From: Andy Colson <andy(at)squeakycode(dot)net>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: texteq/byteaeq: avoid detoast [REVIEW]
Date: 2011-01-16 19:05:11
Message-ID: 4D334167.7060108@squeakycode.net
Lists: pgsql-hackers
This is a review of:
https://commitfest.postgresql.org/action/patch_view?id=468
Purpose:
========
Equal and not-equal _may_ be determined quickly when the two values' lengths differ. This _may_ be a huge speed-up if we don't have to detoast.
The Patch:
==========
I was able to read and understand the patch; it's a simple change and looked correct to me (a non-PG hacker).
It applies cleanly to git head, and compiles and runs fine with debug enabled.
make check passes.
Usability:
==========
I used _may_ above. The benchmark included with the patch, which shows huge speedups, is really contrived: it uses a where clause with a thousand-character constant (where c = 'long...long...long...long...ConstantText...etc'). In my opinion this is very uncommon (the author does note it is a "best case"). If you have a field large enough to be toasted, you are not going to search on it; you are going to have an indexed ID field for that (select c where id = 7).
This also only touches = and <>; >, <, and LIKE won't be affected. So I think the scope of this is limited.
THAT being said, the patch is simple, and if you do happen to hit the code, it will speed things up. As a user of PG I'd like to have it included. It's a corner case, but a big corner, and it's a small, simple change that won't slow anything else down.
Performance:
============
I created a more real-world test for myself: a table with indexes, IDs, and a large toasted field.
create table junk(id serial primary key, xgroup integer, c text);
create index junk_group on junk(xgroup);
I filled it full of junk:
do $$
declare i integer;
declare j integer;
begin
  for i in 1..100 loop
    for j in 1..500 loop
      insert into junk(xgroup, c) values (j, 'c'||i);
      insert into junk(xgroup, c) select j, repeat('abc', 2000) || n from generate_series(1, 5) n;
    end loop;
  end loop;
end$$;
This makes about 600 records within each xgroup, as well as a simple 'c15'-style value in c that we can search for. My thinking is you may not know the exact unique id, but you do know what group it's in, so that'll cut out the vast majority of the records, and then you'll have to add " and c = 'c15'" to get the exact one you want.
I still saw a nice performance boost.
Old PG:
$ psql < bench3.sql
Timing is on.
DO
Time: 2010.412 ms
Patched:
$ psql < bench3.sql
Timing is on.
DO
Time: 184.602 ms
bench3.sql:
do $$
declare i integer;
begin
  for i in 1..400 loop
    perform count(*) from junk where xgroup = i and c like 'c' || i;
  end loop;
end$$;
Summary:
========
Performance speed-up: Oh yeah! That is, if you just happen to hit it; and if you do hit it often, you might want to re-think your layout a little bit.
Do I want it? Yes please.