Re: performance tuning in large function / transaction

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au>
Cc: "Stephan Szabo" <sszabo(at)megazone23(dot)bigpanda(dot)com>, "MindTerm" <mindterm(at)yahoo(dot)com>, pgsql-sql(at)postgresql(dot)org
Subject: Re: performance tuning in large function / transaction
Date: 2001-12-17 06:21:16
Message-ID: 29547.1008570076@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

"Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au> writes:
> Is it true that the IN command is implemented sort of as a linked list
> linear time search? Is there any plan for a super-fast implementation of
> 'IN'?

This deserves a somewhat long-winded answer.

Postgres presently supports two kinds of IN (I'm not sure whether SQL92
allows any additional kinds):

1. Scalar-list IN: foo IN ('bar', 'baz', 'quux', ...)

2. Sub-select IN: foo IN (SELECT bar FROM ...)

In the scalar-list form, a variable is compared to an explicit list of
constants or expressions. This form is exactly equivalent to
foo = 'bar' OR foo = 'baz' OR foo = 'quux' OR ...
and is converted into that form by the parser. The planner is capable
of converting a WHERE clause of this kind into multiple passes of
indexscan, when foo is an indexed column and all the IN-list elements
are constants. Whether it actually will make that conversion depends
on the usual vagaries of pg_statistic entries, etc. But if it's a
unique or fairly-selective index, and there aren't a huge number of
entries in the IN list, a multiple indexscan should be a good plan.

In the sub-select form, we pretty much suck: for each tuple in the outer
query, we run the inner query until we find a matching value or the
inner query ends. This is basically a nested-loop scenario, with the
only (minimally) redeeming social value being that the planner realizes
it should pick a fast-start plan for the inner query. I think it should
be possible to convert this form into a modified kind of join (sort of
the reverse of an outer join: rather than at least one result per
lefthand row, at most one result per lefthand row), and then we could
use join methods that are more efficient than nested-loop. But no one's
tried to make that happen yet.

regards, tom lane

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Christopher Kings-Lynne 2001-12-17 06:33:40 'IN' performance
Previous Message Bruce Momjian 2001-12-17 05:39:27 Re: performance tuning in large function / transaction