Re: Index is not used for "IN (non-correlated subquery)"

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: George <pinkisntwell(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Index is not used for "IN (non-correlated subquery)"
Date: 2016-11-30 17:08:39
Message-ID: CAHyXU0wb7Bte5SNT+f1WxeZbTMMjoBVyJvjLZZQTq4KaoG7Bxw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Nov 30, 2016 at 11:05 AM, George <pinkisntwell(at)gmail(dot)com> wrote:
> On Wed, Nov 30, 2016 at 6:45 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> On Wed, Nov 30, 2016 at 10:42 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> George <pinkisntwell(at)gmail(dot)com> writes:
>>>> explain select * from wg3ppbm_transaction where partner_uuid in (
>>>> select p.uuid
>>>> from wg3ppbm_userpartner up
>>>> join wg3ppbm_partner p on p.id = up.partner_id
>>>> );
>>>
>>>> "Hash Semi Join (cost=2.07..425.72 rows=2960 width=482)"
>>>> " Hash Cond: ((wg3ppbm_transaction.partner_uuid)::text = (p.uuid)::text)"
>>>> " -> Seq Scan on wg3ppbm_transaction (cost=0.00..375.19 rows=5919 width=482)"
>>>> " -> Hash (cost=2.06..2.06 rows=1 width=37)"
>>>> " -> Nested Loop (cost=0.00..2.06 rows=1 width=37)"
>>>> " Join Filter: (up.partner_id = p.id)"
>>>> " -> Seq Scan on wg3ppbm_userpartner up
>>>> (cost=0.00..1.01 rows=1 width=4)"
>>>> " -> Seq Scan on wg3ppbm_partner p (cost=0.00..1.02
>>>> rows=2 width=41)"
>>>
>>> This plan is expecting to have to return about half of the rows in
>>> wg3ppbm_transaction, a situation for which an indexscan would NOT
>>> be a better choice. The usual rule of thumb is that you need to be
>>> retrieving at most one or two percent of a table's rows for an indexscan
>>> on it to be faster than a seqscan.
>>>
>>> I think however that the "half" may be a default estimate occasioned
>>> by the other tables being empty and therefore not having any statistics.
>>> Another rule of thumb is that the plans you get for tiny tables have
>>> little to do with what happens once there's lots of data.
>>
>> Yeah, don't make query plan assumptions against empty or nearly empty
>> tables. As the data grows, the plans will suitably change. Perhaps
>> OP just recently loaded a bunch of data and the tables haven't been
>> analyzed yet?
>
> I just added a significant number of rows to the table. I now have
> 1.3M rows in total but only 8K rows that contain the value I am
> seeking. I also ran ANALYZE after loading the data. The query plans
> for the two queries did not change. Also, the simple query returns in
> 45 ms while the one with the subquery needs 1.5 s, i.e. it is about
> 30x slower.
>
> So there is definitely something wrong here. This situation makes many
> row-level security use cases cumbersome since you need to have
> almost the same WHERE clause both in the row-level security policy and
> in every SELECT query in order for the index to be used.

can you give EXPLAIN ANALYZE for the 'good' query and the 'bad' query?

merlin

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David G. Johnston 2016-11-30 18:15:58 Re: select function alias
Previous Message George 2016-11-30 17:05:18 Re: Index is not used for "IN (non-correlated subquery)"