Re: query planner placement of sort/limit w.r.t. joins

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Dave Vitek <dvitek(at)grammatech(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: query planner placement of sort/limit w.r.t. joins
Date: 2017-04-29 01:53:31
Message-ID: CAKJS1f_Y8OqN2G_nhjqDXFNMPsC=Qc6KZMANJq5ct_TkPz5udg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 29 April 2017 at 11:37, David G. Johnston <david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
>> > Perhaps there are reasons why this optimization is not safe that I
>> > haven't
>> > thought about?
>>
>> Yeah, I think so. What happens if an A row cannot find a match in B or
>> C? This version of the query will end up returning fewer rows due to
>> that, but the original version would consider other rows with a higher
>> rank.
>>
>> We've danced around a bit with using foreign keys as proofs that rows
>> will exist for other optimisations in the past, but it's tricky ground
>> since foreign keys are not updated immediately, so there are windows
>> where they may not actually hold true to their word.
>
>
> I read this query as having a relation cardinality of one-to-one mandatory -
> which precludes the scenario described.

What makes you say so?

It's pretty easy to show how the queries are not the same.

create table a (
id int primary key,
b_id int not null,
val int not null,
rank int not null
);

create table b (
id int primary key,
c_id int not null,
val int not null
);

create table c (
id int primary key,
val int not null
);
insert into a select x,x,x,x from generate_series(1,150) x;
insert into b select x,x,x from generate_series(51,150) x;
insert into c select x,x from generate_series(51,150) x;

SELECT A.val, B.val, C.val FROM A
JOIN B ON A.b_id = B.id
JOIN C ON B.c_id = C.id
ORDER BY A.rank
LIMIT 100; -- returns 100 rows

SELECT D.val, B.val, C.val FROM
(SELECT * FROM A ORDER BY A.rank LIMIT 100) AS D
JOIN B ON D.b_id = B.id
JOIN C ON B.c_id = C.id
LIMIT 100; -- returns 50 rows

> Is the above saying that, today, there is no planning benefit to setting up
> two deferrable references constraints to enforce the non-optional
> requirement?

There is no place in the planner where a foreign key is used as a
proof that a joined row must exist, with the exception of row
estimations for statistics.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Güttler 2017-04-30 11:37:02 Can PG replace redis, amqp, s3 in the future?
Previous Message Tomas Vondra 2017-04-29 00:41:19 Re: Questionaire: Common WAL write rates on busy servers.