From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Andy Colson <andy(at)squeakycode(dot)net> |
Cc: | Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: texteq/byteaeq: avoid detoast [REVIEW] |
Date: | 2011-01-16 21:07:13 |
Message-ID: | AANLkTimBy4QCXodj7XHqQhEqCqWZhme=WirzNa1vAEbF@mail.gmail.com |
Lists: | pgsql-hackers |
Hello
I looked at this patch too.
It's a good idea.
I think we could have a function or macro that compares the sizes of two
varlenas, something like
Datum texteq(..)
{
    if (!datumsHasSameLength(PG_GETARG_DATUM(0), PG_GETARG_DATUM(1)))
        PG_RETURN_BOOL(false);
    ... actual code ...
}
Regards
Pavel Stehule
2011/1/16 Andy Colson <andy(at)squeakycode(dot)net>:
> This is a review of:
> https://commitfest.postgresql.org/action/patch_view?id=468
>
> Purpose:
> ========
> Equal and not-equal _may_ be quickly determined if the lengths are
> different. This _may_ be a huge speed-up if we don't have to detoast.
>
>
> The Patch:
> ==========
> I was able to read and understand the patch; it's a simple change and looks
> correct to me (a non-PG hacker).
> It applies cleanly to git head, and compiles and runs fine with debug enabled.
>
> make check passes
>
>
> Usability:
> ==========
> I used _may_ above. The benchmark included with the patch, showing huge
> speedups, is really contrived. It uses a WHERE clause with a
> thousand-character constant: (where c =
> 'long...long...long...long...ConstantText...etc'). In my opinion this is
> very uncommon (the author does note this is a "best case"). If you have a
> field large enough to be toasted, you are not going to search on it
> directly; you are going to have an indexed ID field instead. (select c
> where id = 7)
>
> This also only touches = and <>; the operators >, <, and LIKE won't
> benefit. So I think the scope of this is limited.
>
> THAT being said, the patch is simple, and if you do happen to hit the code,
> it will speed things up. As a user of PG I'd like to have this included.
> It's a corner case, but a big corner, and it's a small, simple change that
> won't slow anything else down.
>
>
> Performance:
> ============
> I created a more real-world test for myself, with a table with an index, an
> ID, and a large toasted field.
>
> create table junk(id serial primary key, xgroup integer, c text);
> create index junk_group on junk(xgroup);
>
>
> I filled it full of junk:
>
> do $$
> declare i integer;
> declare j integer;
> begin
>   for i in 1..100 loop
>     for j in 1..500 loop
>       insert into junk(xgroup, c) values (j, 'c'||i);
>       insert into junk(xgroup, c)
>         select j, repeat('abc', 2000) || n from generate_series(1, 5) n;
>     end loop;
>   end loop;
> end$$;
>
>
> This will make about 600 records within the same xgroup, as well as a
> simple 'c15'-style value in c that we can search for. My thinking is you
> may not know the exact unique id, but you do know what group it's in, so
> that'll cut out 90% of the records, and then you'll have to add
> " and c = 'c15'" to get the exact one you want.
>
> I still saw a nice performance boost.
>
> Old PG:
> $ psql < bench3.sql
> Timing is on.
> DO
> Time: 2010.412 ms
>
> Patched:
> $ psql < bench3.sql
> Timing is on.
> DO
> Time: 184.602 ms
>
>
> bench3.sql:
> do $$
> declare i integer;
> begin
> for i in 1..400 loop
> perform count(*) from junk where xgroup = i and c like 'c' ||
> i;
> end loop;
> end$$;
>
>
>
> Summary:
> ========
> Performance speed-up: Oh yeah, if you happen to hit it. Though if you do
> hit it, you might want to re-think your layout a little bit.
>
> Do I want it? Yes please.
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>