From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au> |
Cc: | "Stephan Szabo" <sszabo(at)megazone23(dot)bigpanda(dot)com>, "MindTerm" <mindterm(at)yahoo(dot)com>, pgsql-sql(at)postgresql(dot)org |
Subject: | Re: performance tuning in large function / transaction |
Date: | 2001-12-17 06:21:16 |
Message-ID: | 29547.1008570076@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
"Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au> writes:
> Is it true that the IN command is implemented sort of as a linked list
> linear time search? Is there any plan for a super-fast implementation of
> 'IN'?
This deserves a somewhat long-winded answer.
Postgres presently supports two kinds of IN (I'm not sure whether SQL92
allows any additional kinds):
1. Scalar-list IN: foo IN ('bar', 'baz', 'quux', ...)
2. Sub-select IN: foo IN (SELECT bar FROM ...)
In the scalar-list form, a variable is compared to an explicit list of
constants or expressions. This form is exactly equivalent to
foo = 'bar' OR foo = 'baz' OR foo = 'quux' OR ...
and is converted into that form by the parser. The planner is capable
of converting a WHERE clause of this kind into multiple passes of
indexscan, when foo is an indexed column and all the IN-list elements
are constants. Whether it actually will make that conversion depends
on the usual vagaries of pg_statistic entries, etc. But if it's a
unique or fairly-selective index, and there aren't a huge number of
entries in the IN list, a multiple indexscan should be a good plan.
In the sub-select form, we pretty much suck: for each tuple in the outer
query, we run the inner query until we find a matching value or the
inner query ends. This is basically a nested-loop scenario, with the
only (minimally) redeeming social value being that the planner realizes
it should pick a fast-start plan for the inner query. I think it should
be possible to convert this form into a modified kind of join (sort of
the reverse of an outer join: rather than at least one result per
lefthand row, at most one result per lefthand row), and then we could
use join methods that are more efficient than nested-loop. But no one's
tried to make that happen yet.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Christopher Kings-Lynne | 2001-12-17 06:33:40 | 'IN' performance |
Previous Message | Bruce Momjian | 2001-12-17 05:39:27 | Re: performance tuning in large function / transaction |