Re: PG choosing nested loop for set membership?

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Brian Crowell <brian(at)fluggo(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org >> PG-General Mailing List" <pgsql-general(at)postgresql(dot)org>
Subject: Re: PG choosing nested loop for set membership?
Date: 2014-03-28 20:02:02
Message-ID: CAMkU=1wRaHLC-MKBn7bpHX+Xt=cFbZ35DW9R_+4YdR7Ud6Y7Gg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Mar 25, 2014 at 2:00 PM, Brian Crowell <brian(at)fluggo(dot)com> wrote:

> Hello, it's me, a Postgres n00b again. I'm dealing with a query that
> scans a rather large table (94,000,000 tuples or so) and just picks
> out certain rows and sums them:
>
> select dci.snapshot_time as "time", round(sum(dci.todays_pl)::numeric,0)
> as pl
> from dbo._pl_data_cache_intraday dci
> where dci.snapshot_time between '2014-03-25
> 11:32:40.004552-05'::timestamptz and '2014-03-25
> 12:02:40.015177-05'::timestamptz
> and dci.symbol in (select sec.symbol from dbo.security_underliers
> sec where sec.ultimate_underlier = 'SPY')
> and dci.manager = 'BJC'
> and dci.account in (select account from pl2.visible_accounts where
> is_fund)
> group by dci.snapshot_time
> order by dci.snapshot_time;
>
> For the most part, Postgres is doing the right thing: snapshot_time is
> the lead column in all of the table's indexes, so it's able to pick up
> the source rows fairly quickly in its index scan. It's also enforcing
> "dci.manager = 'BJC'" in the same scan, and does a Hash Semi Join for
> "dci.symbol in (...)".
>
> The trouble comes when enforcing the "dci.account in (...)" search
> condition: pl2.visible_accounts is a view that determines which
> accounts the current user can see, which, depending on who you are,
> can be several hundred or none at all. Postgres estimates the output
> of this query as two rows, but in my case, it's actually 240.
>
> Unfortunately, that leads the query planner to try to think a nested
> loop is cheap enough to enforce this, when actually it's really
> expensive.
>

Can you show the explain plan for that? I can't get it to use anything but
a hash join for this type of thing even when the estimated rows in the
in-list are 2, unless I disable hash joins altogether. So I'm curious how
your plan differs from the ones I've dummied up.

Cheers,

Jeff

In response to

Browse pgsql-general by date

  From Date Subject
Next Message David W Noon 2014-03-29 00:15:28 Re: EBCDIC conversion
Previous Message Sandeep Gupta 2014-03-28 19:59:25 Re: what is sublinks in query planner?