From: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Use quick select instead of qsort to get median |
Date: | 2021-07-22 12:07:06 |
Message-ID: | OS0PR01MB5716A0A4B9CFC4D8A662C64994E49@OS0PR01MB5716.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
When I was writing an extension which need to get the median of an array, I
tried to find if postgres provide some api that can do that. I found all the
places in postgres invoke qsort() and then get the median. I was thinking can
we do better by using "quick select" and is it worth it.
Currently, there are some places[1] in the code that need the median and can
use "quick select" instead. And some of them(spg_box_quad_picksplit /
spg_range_quad_picksplit) are invoked frequently when INSERT or CREATE INDEX.
So, Peronally, It's acceptable to introduce a quick select api to improve these
places.
Since most of the logic of "quick select" is similar to quick sort, I think
we can reuse the code in sort_template.h. We only need to let the sort stop
when find the target top Kth element.
Attach a POC patch about this idea. I did some simple performance tests, I can
see about 10% performance gain in this test[2].
Thoughts ?
[1]
1.
entry_dealloc
...
/* Record the (approximate) median usage */
if (i > 0)
pgss->cur_median_usage = entries[i / 2]->counters.usage;
2.
spg_box_quad_picksplit
qsort(lowXs, in->nTuples, sizeof(float8), compareDoubles);
...
centroid->low.x = lowXs[median];
3.
spg_range_quad_picksplit
qsort(lowerBounds, nonEmptyCount, sizeof(RangeBound),
...
centroid = range_serialize(typcache, &lowerBounds[median],
4.
spg_quad_picksplit
qsort(sorted, in->nTuples, sizeof(*sorted), x_cmp);
...
centroid->x = sorted[median]->x;
[2]
drop table quad_box_tbl;
CREATE unlogged TABLE quad_box_tbl (id int, b box);
truncate quad_box_tbl ;
explain (verbose, analyze)INSERT INTO quad_box_tbl
SELECT (x - 1) * 10 + x, box(point(x * 10, x * 20), point(x * 10, x * 20 + 5))
FROM generate_series(1, 1000000) x order by random();
-----test create index
drop index quad_box_tbl_idx;
CREATE INDEX quad_box_tbl_idx ON quad_box_tbl USING spgist(b);
-------test results
PATCH:
Time: 2609.664 ms (00:02.610)
HEAD:
Time: 2903.765 ms (00:02.944)
Best regards,
Houzj
Attachment | Content-Type | Size |
---|---|---|
0001-test-use-quick-select-to-get-median.patch | application/octet-stream | 10.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Pavel Stehule | 2021-07-22 12:11:24 | Re: window build doesn't apply PG_CPPFLAGS correctly |
Previous Message | Andrew Dunstan | 2021-07-22 12:04:34 | Re: window build doesn't apply PG_CPPFLAGS correctly |