From: | Peter Geoghegan <peter(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: sortsupport for text |
Date: | 2012-06-16 21:50:27 |
Message-ID: | CAEYLb_Wud3m49qKh_4Ki+7AAkabYqmVTM0Zw18bM9jZkT-nprg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 18 March 2012 15:08, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> However, it occurred to me that we could pretty easily jury-rig
> something that would give us an idea about the actual benefit available
> here. To wit: make a C function that wraps strxfrm, basically
> strxfrm(text) returns bytea. Then compare the performance of
> ORDER BY text_col to ORDER BY strxfrm(text_col).
>
> (You would need to have either both or neither of text and bytea
> using the sortsupport code paths for this to be a fair comparison.)
I thought this was an interesting idea, so decided to try it out for
myself. I tried this out against master (not Robert's patch, per Tom's
direction). The results were interesting:
[peter(at)peterlaptop strxfrm_test]$ pgbench postgres -T 60 -f sort_strxfrm.sql -n
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 2795
tps = 46.563970 (including connections establishing)
tps = 46.568234 (excluding connections establishing)
[peter(at)peterlaptop strxfrm_test]$ pgbench postgres -T 60 -f sort_reg.sql -n
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 2079
tps = 34.638838 (including connections establishing)
tps = 34.640665 (excluding connections establishing)
The first test executed the following query against the dellstore database:
select * from products order by strxfrm_test(actor) offset 10001;
The second:
select * from products order by actor offset 10001;
So, this was pretty good - an improvement that is completely
independent of Robert's. Bear in mind, this simple demonstration adds
additional fmgr overhead, which we have plenty of reason to believe
could hurt things, besides which each call must allocate memory that
could perhaps be avoided. In addition, I don't know enough about
locale-aware sorting and related algorithms to have devised a test
that would stress strxfrm()/ strcoll() - these were all strings that
could be represented as ASCII.
In light of this, I think there is a pretty strong case to be made for
pre-processing text via strxfrm() as part of this patch.
Thoughts?
--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Attachment | Content-Type | Size |
---|---|---|
strxfrm_test.tar.gz | application/x-gzip | 2.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Janes | 2012-06-16 22:25:15 | Re: measuring spinning |
Previous Message | Tom Lane | 2012-06-16 21:38:18 | s/UNSPECIFIED/SIMPLE/ in foreign key code? |