Re: Puzzling table scan in a CTE

From: Elliot <yields(dot)falsehood(at)gmail(dot)com>
To: slapo(at)centrum(dot)sk, pgsql-general(at)postgresql(dot)org
Subject: Re: Puzzling table scan in a CTE
Date: 2013-11-22 17:06:25
Message-ID: 528F8F11.40100@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 2013-11-22 11:54, slapo(at)centrum(dot)sk wrote:
>
> Good day,
>
> I have a recursive CTE where a table scan occurs, even though there
> doesn't seem to be a good reason for it.
>
> It seems the planner came to the conclusion that columns that are not
> actually used in the output, joins or a where clause are a part of the
> output.
>
> It's not a performance problem now and the query runs quickly (which
> is why I haven't posted it on the performance mailing list). What
> bothers me is that the scan seems to be there without being really
> necessary and I would like to avoid any extra I/O. I would expect the
> plan to be close the simplified query without CTE posted after the CTE
> query below.
>
> I'm also worried that this could get worse over time and that it might
> influence the query's performance.
>
> The question:
>
> Am I missing something and the table scan is necessary?
>
> If not, is there a way to avoid it?
>
> This is the query:
>
> WITH RECURSIVE user_subordinates
>
> AS
>
> (
>
> SELECT
>
> ur1."id" AS var_id,
>
> ur1.id_user_parent AS var_id_user_parent,
>
> ur1.login AS var_login,
>
> ur1_1.login AS var_login_parent,
>
> 1::smallint AS var_sort_order
>
> FROM
>
> "user" ur1
>
> JOIN
>
> "user" ur1_1
>
> ON(ur1.id_user_parent=ur1_1."id")
>
> WHERE
>
> ur1.id = 3970
>
> AND ur1.disabled=false
>
> UNION ALL
>
> SELECT
>
> ur2."id" AS var_id,
>
> ur2.id_user_parent AS var_id_user_parent,
>
> ur2.login AS var_login,
>
> ups1.var_login AS var_login_parent,
>
> (ups1.var_sort_order + 1)::smallint AS var_sort_order
>
> FROM
>
> user_subordinates ups1
>
> JOIN
>
> "user" ur2
>
> ON(ups1.var_id = ur2.id_user_parent
>
> AND ups1.var_id <> ur2.id
>
> )
>
> )
>
> SELECT
>
> var_id,
>
> var_id_user_parent,
>
> var_login,
>
> var_login_parent,
>
> var_sort_order
>
> FROM
>
> user_subordinates
>
> ORDER BY
>
> var_sort_order,
>
> var_id,
>
> var_id_user_parent
>
> ;
>
> This is its execution plan:
>
> http://explain.depesz.com/s/4hY
>
> This is a simplified version of the query without CTE:
>
> SELECT
>
> ur2."id" AS var_id,
>
> ur2.id_user_parent AS var_id_user_parent,
>
> ur2.login AS var_login,
>
> ups1.var_login AS var_login_parent,
>
> (ups1.var_sort_order + 1)::smallint AS var_sort_order
>
> FROM
>
> (
>
> SELECT
>
> ur1."id" AS var_id,
>
> ur1.id_user_parent AS var_id_user_parent,
>
> ur1.login AS var_login,
>
> ur1_1.login AS var_login_parent,
>
> 1::smallint AS var_sort_order
>
> FROM
>
> "user" ur1
>
> JOIN
>
> "user" ur1_1
>
> ON(ur1.id_user_parent=ur1_1."id")
>
> WHERE
>
> ur1.id = 3970
>
> AND ur1.disabled=false
>
> ) ups1
>
> JOIN
>
> "user" ur2
>
> ON(ups1.var_id = ur2.id_user_parent
>
> AND ups1.var_id <> ur2.id
>
> )
>
> ;
>
> Its plan is here:
>
> http://explain.depesz.com/s/Ak3
>
> Here's the table's DDL:
>
> CREATE TABLE "user"
>
> (
>
> id bigserial NOT NULL, -- Unique row identifier
>
> id_user_parent bigint NOT NULL DEFAULT 3, -- Identifier of user's
> parent user
>
> id_address bigint NOT NULL,
>
> login character varying NOT NULL, -- Login name of a user
>
> password character varying NOT NULL, -- Login password of a user
>
> date_created timestamp without time zone NOT NULL DEFAULT now(), --
> Date and time at which the record was created
>
> id_user_created bigint NOT NULL, -- Identifier of a user who created
> the row
>
> date_modified timestamp without time zone, -- Date and time at which
> the record was modified
>
> id_user_modified bigint, -- Identifier of a user who modified the row
>
> disabled boolean NOT NULL DEFAULT true, -- Its value is set to true
> when user account is disabled, false when enabled.
>
> activation_hash character varying NOT NULL, -- Activation hash that a
> hashed string sent to the user matches against during activation.
>
> invoiced boolean NOT NULL DEFAULT false, -- True is user wishes to be
> invoiced, false otherwise.
>
> CONSTRAINT user_pkey PRIMARY KEY (id),
>
> CONSTRAINT user_id_address_fkey FOREIGN KEY (id_address) REFERENCES
> address (id),
>
> CONSTRAINT user_id_user_created_fkey FOREIGN KEY (id_user_created)
> REFERENCES "user" (id),
>
> CONSTRAINT user_id_user_modified_fkey FOREIGN KEY (id_user_modified)
> REFERENCES "user" (id),
>
> CONSTRAINT u_login UNIQUE (login)
>
> );
>
> CREATE INDEX fki_user_id_address_fkey ON "user" USING btree(id_address);
>
> CREATE INDEX ix__user__id_user_parent ON "user" USING
> btree(id_user_parent);
>
> CREATE UNIQUE INDEX ix__user__login ON "user" USING btree
> (lower(login::text) COLLATE pg_catalog."default");
>
> CREATE UNIQUE INDEX ix__user__login__varchar_pattern_ops ON "user"
> USING btree (lower(login::text) COLLATE pg_catalog."default"
> varchar_pattern_ops);
>
> Environment: Windows 7 Professional x64, PostgreSQL 9.3.1. The table
> has been analysed before executing the query and getting the explain
> result. It's on my workstation and nobody else uses the database but me.
>
> Any thought on this are appreciated.
>
> Thank you.
>
> Peter Slapansky
>
You can "set enable_seqscan = off" to see what the planner thinks of
using the index on id_user_parent.
Does every user have a parent? Or a child? Those will affect how many
matches you'll find, and if it's a lot then it'll be faster doing the
table scan.
Also, have you changed any configuration settings? random_page_cost
might need some tuning.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message slapo 2013-11-22 17:49:59 Re: [GENERAL] Puzzling table scan in a CTE
Previous Message Alvaro Herrera 2013-11-22 17:05:44 Re: include all the postgres libraries (C)