Re: BUG #14399: Order by id DESC causing bad query plan

From: Jamie Koceniak <jkoceniak(at)mediamath(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14399: Order by id DESC causing bad query plan
Date: 2016-11-02 15:31:12
Message-ID: E1ECB3A1-ECD9-4694-AE32-DD13C9CC9E26@mediamath.com
Lists: pgsql-bugs

Hi David,

Thanks for the suggestion on rewriting the query.
Unfortunately, it yields the same performance problem.

https://explain.depesz.com/s/fab

Rewritten query (sorry, I left the function call out of the original email):
select * FROM
orders t1
JOIN customer t2 ON (t1.customer_id = t2.id)
WHERE
EXISTS (select 1 from valid_customers(15348) t3 where t3.customer_id = t2.id)
ORDER BY t1.id DESC
LIMIT 10 ;

We also paginate, and if you add LIMIT 10 OFFSET 90, for example, performance is 10 times worse.
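
For reference, the paginated variant described above would look roughly like the following. This is only a sketch that assumes the same tables and the same valid_customers(15348) function as the rewritten query; nothing else is changed.

```sql
-- Tenth page of the rewritten query: skip 90 rows, return the next 10.
-- Tables and the valid_customers(15348) call are taken from the query above.
SELECT *
FROM orders t1
JOIN customer t2 ON (t1.customer_id = t2.id)
WHERE EXISTS (SELECT 1
              FROM valid_customers(15348) t3
              WHERE t3.customer_id = t2.id)
ORDER BY t1.id DESC
LIMIT 10 OFFSET 90;
```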

If you actually sort by a non-indexed field, then the query runs in 37ms.
Here is the query plan using non-indexed field:
https://explain.depesz.com/s/BtF2

The query sorted by a non-indexed field looks like:

select * FROM
orders t1
JOIN customer t2 ON (t1.customer_id = t2.id)
WHERE
EXISTS (select 1 from valid_customers(15348) t3 where t3.customer_id = t2.id)
ORDER BY t1.created_on DESC
LIMIT 10 ;
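
The two linked plans can be reproduced side by side by prefixing each variant with EXPLAIN ANALYZE. The sketch below assumes the same tables and valid_customers(15348) function as the queries in this message; only the ORDER BY column differs. Note that EXPLAIN ANALYZE actually executes the statement, so the slow variant will take its full run time.

```sql
-- Variant 1: sort on the indexed column (the slow plan reported above).
EXPLAIN ANALYZE
SELECT *
FROM orders t1
JOIN customer t2 ON (t1.customer_id = t2.id)
WHERE EXISTS (SELECT 1
              FROM valid_customers(15348) t3
              WHERE t3.customer_id = t2.id)
ORDER BY t1.id DESC
LIMIT 10;

-- Variant 2: sort on the non-indexed column (reported to run in 37ms).
EXPLAIN ANALYZE
SELECT *
FROM orders t1
JOIN customer t2 ON (t1.customer_id = t2.id)
WHERE EXISTS (SELECT 1
              FROM valid_customers(15348) t3
              WHERE t3.customer_id = t2.id)
ORDER BY t1.created_on DESC
LIMIT 10;
```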

Thanks,
Jamie

From: David Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
Date: Tuesday, November 1, 2016 at 4:41 PM
To: Jamie Koceniak <jkoceniak(at)mediamath(dot)com>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #14399: Order by id DESC causing bad query plan

On Thu, Oct 27, 2016 at 5:16 PM, <jkoceniak(at)mediamath(dot)com> wrote:
The following bug has been logged on the website:

Bug reference: 14399
Logged by: Jamie Koceniak
Email address: jkoceniak(at)mediamath(dot)com
PostgreSQL version: 9.4.6
Operating system: Linux
Description:

One table has 2M records (orders) joining to another table with 75K records
(customers).

Query:
select * FROM
orders t1
JOIN customer t2 ON (t1.customer_id = t2.id)
WHERE
t2.id IN (select distinct customer_id from valid_customers)
ORDER BY t1.id
LIMIT 10 ;

Bug potential aside, the better way to write that is to use a proper semi-join (i.e., EXISTS):

SELECT *
FROM orders t1
JOIN customer t2 ON (t1.customer_id = t2.id)
WHERE EXISTS (SELECT 1 FROM valid_customers t3 WHERE t3.customer_id = t2.id)
ORDER BY t1.id
LIMIT 10;

Note too that your query plan has a "function scan" node unlike what your query implies...

Sorry I can't be of more help with the information you've provided.

David J.
