7.3 support for IN

From: "Gregory Wood" <gregw(at)com-stock(dot)com>
To: "PostgreSQL-General" <pgsql-general(at)postgresql(dot)org>
Subject: 7.3 support for IN
Date: 2002-07-18 17:52:44
Message-ID: 009601c22e85$7e9335a0$7889ffcc@comstock.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Before anyone starts thinking it, yes, I am aware that INs are really slow
in PostgreSQL. That being said... I was doing some tests with big IN
statements and various workarounds and was shocked at some of the results.

Here are the test cases I came up with:

1. Single value (16 ms)
2. 2 values using IN (870 ms)
3. 2 values using OR (870 ms)
4. 2 values using EXISTS (840 ms)
5. 2 values using JOIN (760 ms)
6. 2 values using UNION (31 ms)
7. 44 values using IN (960 ms)
8. 44 values using OR (960 ms)
9. 44 values using EXISTS (880 ms)
10. 44 values using JOIN (930 ms)
11. 44 values using UNION (670 ms)

Note that the EXISTS and the JOIN do not count the time needed to execute
the CREATE TABLE or COPY statements needed to populate a separate dataset to
query across.

Now then, it concerns me that doing 44 separate SELECTs and UNIONing the
results together is faster than simply doing an IN or an OR. Heck, it's
significantly faster to do 44 UNIONs than it is to do (Field=Value1 OR
Field=Value2) with just *2* values. Is that really to be expected or am I
doing a really poor job of optimizing/querying? Is this going to be improved
in 7.3?

I've attached the schema and the queries used to do my very rough
benchmarking. The two tables have 591655 and 3044478 rows. If you would like
to see anything else, please let me know.

Greg

Attachment Content-Type Size
benchmark_schema.sql application/octet-stream 6.8 KB
city_benchmark.sql application/octet-stream 18.7 KB

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Christian Lbeck 2002-07-18 17:53:51 References for PostgreSQL
Previous Message Marc G. Fournier 2002-07-18 17:44:50 Re: Bad HTML in Mail List Archives