From: | Ants Aasma <ants(at)cybertec(dot)at> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Francois Deliege <fdeliege(at)gmail(dot)com> |
Subject: | [PATCH] Lazy hashaggregate when no aggregation is needed |
Date: | 2012-03-28 02:37:25 |
Message-ID: | CA+CSw_uE-RCyQd_bXJNe=usrXkq+keFrQrahkc+8ou+Ws4Y=Vw@mail.gmail.com |
Lists: | pgsql-hackers |
A user complained on pgsql-performance that SELECT col FROM table
GROUP BY col LIMIT 2; performs a full table scan. ISTM that it's safe
to return tuples from a hash aggregate as soon as they are found when
no aggregate functions are in use. Attached is a first shot at that.
In the planner, when the optimization applies, the hash table size
check is made against the limit and the startup cost comes from the
input. In the executor, when the optimization applies and the hash
table has not been filled yet, tuples are returned immediately.
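To illustrate the idea only (this is not the patch, which is C code in nodeAgg.c), here is a minimal Python sketch of a lazy hash aggregate with no aggregate functions. The names lazy_distinct and counting_scan are hypothetical, and a plain set stands in for the hash table. It assumes the parent node simply stops pulling once the limit is satisfied.

```python
from itertools import islice
from typing import Hashable, Iterable, Iterator, List


def lazy_distinct(rows: Iterable[Hashable]) -> Iterator[Hashable]:
    """Yield each group key the first time it is seen (hypothetical helper)."""
    seen = set()  # stands in for the hash table
    for key in rows:
        if key not in seen:
            seen.add(key)
            # No aggregate state to finalize, so the group can be
            # emitted right away instead of after the full scan.
            yield key


def counting_scan(values: Iterable[Hashable], counter: List[int]) -> Iterator[Hashable]:
    """Hypothetical input scan that records how many rows were read."""
    for value in values:
        counter[0] += 1
        yield value


rows_read = [0]
table = ["a", "a", "b", "c", "a", "d"]

# Rough equivalent of: SELECT col FROM table GROUP BY col LIMIT 2
result = list(islice(lazy_distinct(counting_scan(table, rows_read)), 2))
```

In this sketch the scan stops as soon as the second distinct value has been produced, so only the first three input rows are read rather than all six.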
Can somebody poke holes in this? The patch definitely needs some code
cleanup in nodeAgg.c, but otherwise it passes the regression tests and
seems to work as intended. It also optimizes the SELECT DISTINCT col
FROM table LIMIT 2; case, but not SELECT DISTINCT ON (col) col FROM
table LIMIT 2; because DISTINCT ON is explicitly forced to use sorted
aggregation.
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
Attachment | Content-Type | Size |
---|---|---|
lazy-hashaggregate.patch | text/x-patch | 8.2 KB |