Quick Links

Re: Index is not used for "IN (non-correlated subquery)"

From:	Merlin Moncure <mmoncure(at)gmail(dot)com>
To:	George <pinkisntwell(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Index is not used for "IN (non-correlated subquery)"
Date:	2016-11-30 17:08:39
Message-ID:	CAHyXU0wb7Bte5SNT+f1WxeZbTMMjoBVyJvjLZZQTq4KaoG7Bxw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Wed, Nov 30, 2016 at 11:05 AM, George <pinkisntwell(at)gmail(dot)com> wrote:
> On Wed, Nov 30, 2016 at 6:45 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> On Wed, Nov 30, 2016 at 10:42 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> George <pinkisntwell(at)gmail(dot)com> writes:
>>>> explain select * from wg3ppbm_transaction where partner_uuid in (
>>>> select p.uuid
>>>> from wg3ppbm_userpartner up
>>>> join wg3ppbm_partner p on p.id = up.partner_id
>>>> );
>>>
>>>> "Hash Semi Join (cost=2.07..425.72 rows=2960 width=482)"
>>>> " Hash Cond: ((wg3ppbm_transaction.partner_uuid)::text = (p.uuid)::text)"
>>>> " -> Seq Scan on wg3ppbm_transaction (cost=0.00..375.19 rows=5919 width=482)"
>>>> " -> Hash (cost=2.06..2.06 rows=1 width=37)"
>>>> " -> Nested Loop (cost=0.00..2.06 rows=1 width=37)"
>>>> " Join Filter: (up.partner_id = p.id)"
>>>> " -> Seq Scan on wg3ppbm_userpartner up
>>>> (cost=0.00..1.01 rows=1 width=4)"
>>>> " -> Seq Scan on wg3ppbm_partner p (cost=0.00..1.02
>>>> rows=2 width=41)"
>>>
>>> This plan is expecting to have to return about half of the rows in
>>> wg3ppbm_transaction, a situation for which an indexscan would NOT
>>> be a better choice. The usual rule of thumb is that you need to be
>>> retrieving at most one or two percent of a table's rows for an indexscan
>>> on it to be faster than a seqscan.
>>>
>>> I think however that the "half" may be a default estimate occasioned
>>> by the other tables being empty and therefore not having any statistics.
>>> Another rule of thumb is that the plans you get for tiny tables have
>>> little to do with what happens once there's lots of data.
>>
>> Yeah, don't make query plan assumptions against empty or nearly empty
>> tables. As the data grows, the plans will suitably change. Perhaps
>> OP just recently loaded a bunch of data and the tables haven't been
>> analyzed yet?
>
> I just added a significant number of rows to the table. I now have
> 1.3M rows in total but only 8K rows that contain the value I am
> seeking. I also ran ANALYZE after loading the data. The query plans
> for the two queries did not change. Also, the simple query returns in
> 45 ms while the one with the subquery needs 1.5 s, i.e. it is about
> 30x slower.
>
> So there is definitely something wrong here. This situation makes many
> row-level security use cases cumbersome since you need to have
> almost the same WHERE clause both in the row-level security policy and
> in every SELECT query in order for the index to be used.

can you give EXPLAIN ANALYZE for the 'good' query and the 'bad' query?

merlin

In response to

Re: Index is not used for "IN (non-correlated subquery)" at 2016-11-30 17:05:18 from George

Responses

Re: Index is not used for "IN (non-correlated subquery)" at 2016-11-30 18:44:33 from Tom Lane

Browse pgsql-general by date

	From	Date	Subject
Next Message	David G. Johnston	2016-11-30 18:15:58	Re: select function alias
Previous Message	George	2016-11-30 17:05:18	Re: Index is not used for "IN (non-correlated subquery)"