From: | Elliott Bennett <pgmail(at)dhtns(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | select distinct on varchar -- wild performance differences! |
Date: | 2005-06-23 03:19:53 |
Message-ID: | 42BA2A59.60504@dhtns.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hey, all. I've bounced this around in #postgres for an hour or so, and
it was suggested that I post it here as well. Hopefully someone can
help me out.
I have three machines. All have 512MB of ram.
Machine A is a 2.0ghz celeron, running debian, pg verison 7.4.6.
Machine B is a 1.8ghz celeron, running centos 3.4, pg verison 8.0.3.
(7.3.9 also exhibited the behaviour below, by the way)
Machine C is a 1.0ghz athlon, running centos 4.0, pg verison 7.4.7.
The SAME data and schema is loaded (from a pg_dump, default parameters)
onto all three machines. With the same query: "select distinct model
from exif_common", machines A and C return results quickly (1/4
second). Machine B chews on it for 30ish seconds! Note, this column is
a VARCHAR(40).
Here's an explain analyze for it.
Machine A (fast):
photos=# explain analyze select distinct model from exif_common;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Unique (cost=2629.74..2732.11 rows=5 width=15) (actual time=211.358..265.049 rows=6 loops=1)
-> Sort (cost=2629.74..2680.93 rows=20473 width=15) (actual time=211.351..242.296 rows=20473 loops=1)
Sort Key: model
-> Seq Scan on exif_common (cost=0.00..1163.73 rows=20473 width=15) (actual time=0.022..58.635 rows=20473 loops=1)
Total runtime: 265.928 ms
(5 rows)
Machine B (slow):
photos=# explain analyze select distinct model from exif_common;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Unique (cost=2640.74..2743.11 rows=6 width=15) (actual time=27939.231..32914.134 rows=6 loops=1)
-> Sort (cost=2640.74..2691.93 rows=20473 width=15) (actual time=27939.222..27983.784 rows=20473 loops=1)
Sort Key: model
-> Seq Scan on exif_common (cost=0.00..1174.73 rows=20473 width=15) (actual time=0.071..97.772 rows=20473 loops=1)
Total runtime: 32915.031 ms
(5 rows)
( yes, i know, six distinct rows out of 20,000.... But holy moly! 1/4
sec vs 32.9 sec?!?! )
Now, if I do a similar query against an INT column, the speeds are more
in line with each other:
Machine A:
photos=# explain analyze select distinct imagewidth from exif_common;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Unique (cost=2629.74..2732.11 rows=36 width=4) (actual time=179.899..225.934 rows=107 loops=1)
-> Sort (cost=2629.74..2680.93 rows=20473 width=4) (actual time=179.891..207.632 rows=20473 loops=1)
Sort Key: imagewidth
-> Seq Scan on exif_common (cost=0.00..1163.73 rows=20473 width=4) (actual time=0.024..62.946 rows=20473 loops=1)
Total runtime: 226.707 ms
(5 rows)
Machine B:
photos=# explain analyze select distinct imagewidth from exif_common;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Unique (cost=2640.74..2743.11 rows=24 width=4) (actual time=209.394..287.131 rows=107 loops=1)
-> Sort (cost=2640.74..2691.93 rows=20473 width=4) (actual time=209.384..251.693 rows=20473 loops=1)
Sort Key: imagewidth
-> Seq Scan on exif_common (cost=0.00..1174.73 rows=20473 width=4) (actual time=0.074..94.574 rows=20473 loops=1)
Total runtime: 288.411 ms
(5 rows)
Machine C exhibits the same behaviour as A for all queries.
This weird slow behaviour on machine B also appeared in 7.3.9.
Upgrading didn't seem to help.
neilc from irc thought it may be a qsort(2) quirk, but a sample C
program I whipped up testing different sized data sets with a similar
distribution gave very similar sort timings between the three
machines.. Therefore, I don't think it's qsort(2) to blame...
Anyone have any ideas as to what may be up with machine B?
Thanks,
-Elliott
From | Date | Subject | |
---|---|---|---|
Next Message | Kurt De Grave | 2005-06-23 08:33:18 | Re: parameterized LIKE does not use index |
Previous Message | Keith Worthington | 2005-06-23 02:31:29 | Re: Performance Tuning Article |