From: | Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Dani Oderbolz <oderbolz(at)ecologic(dot)de>, <pgsql-hackers(at)postgreSQL(dot)org> |
Subject: | Re: [SQL] "SELECT IN" Still Broken in 7.4b |
Date: | 2003-08-21 21:07:08 |
Message-ID: | 20030821140139.C57728-100000@megazone.bigpanda.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-sql |
On Thu, 21 Aug 2003, Tom Lane wrote:
> Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com> writes:
> > Within the scope of the new hashed IN stuff I believe so in at least some
> > cases. I have a few million row table of integers where searching for
> > values IN (~10000 values) takes longer than creating the temp table,
> > copying into it and doing the in subquery.
>
> I did some profiling and soon realized that the main problem is the
> executor's trick for not returning the same row multiple times in a
> multiple-indexscan plan node. The point is that given
> WHERE a = 1 OR b = 1
> you could create a plan that first indexscans on a, then indexscans on
> b --- but you mustn't return any tuples in the second scan that you
> already returned in the first. IndexNext solves this by evaluating the
> prior-scan index conditions to see if they are true. Which is okay if
> there aren't too many of them. But when you have an N-element IN list
> this means you are going to do O(N^2) index expression evaluations.
> In the 10000-element IN-list test case, ExecQual gets called almost
> 50 million times :-(
>
> I'm toying with the notion of ripping out that logic and instead
> building an in-memory hashtable of already-returned TIDs. This could
> theoretically run out of memory if the multiple indexscan returns enough
> tuples, but I think in practice that wouldn't happen because the planner
> wouldn't choose an indexscan when very large numbers of tuples are being
> selected.
Well, if you want to be safer, I guess you could (at runtime) decide that
the table's gotten too big and fall back to the old method if you didn't
entirely rip it out. I'm not sure if that'd be too ugly though, but it
would mean that you wouldn't have to worry about it returning too many
tuples.
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2003-08-21 21:16:50 | Re: Decent VACUUM (was: Buglist) |
Previous Message | Tom Lane | 2003-08-21 21:07:07 | Re: postgresql 7.3.2 bug on date '1901-12-13' and '1901-12 |
From | Date | Subject | |
---|---|---|---|
Next Message | Manfred Koizar | 2003-08-21 21:18:04 | Re: [SQL] "SELECT IN" Still Broken in 7.4b |
Previous Message | Tom Lane | 2003-08-21 20:42:20 | Re: [SQL] "SELECT IN" Still Broken in 7.4b |