From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
---|---|
To: | Heikki Linnakangas <heikki(at)enterprisedb(dot)com> |
Cc: | Patches <pgsql-patches(at)postgresql(dot)org> |
Subject: | Re: tsearch refactorings |
Date: | 2007-09-05 17:13:45 |
Message-ID: | 46DEE3C9.8060805@sigaev.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-patches |
Heikki Linnakangas wrote:
> Teodor Sigaev wrote:
>>> Ok. Probably easiest to do that by changing the palloc to palloc0 in
>>> parse_tsquery.
>> and change sizeof to sizeof(QueryItem)
>
> Do you mean the sizeofs in the memcpys in parse_tsquery? You can't
Oops, I meant pallocs in push* function. palloc0 in parse_tsquery is another way.
>
> BTW, can you explain what the CRC-32 of a value is used for? It looks
> like it's used to speed up some operations, by comparing the CRCs before
> comparing the values, but I didn't quite figure out how it works. How
It's mostly used in GiST indexes - recalculating crc32 every time for each index
tuple to be checked is rather expensive.
> much of a performance difference does it make? Would hash_any do a
> better/cheaper job?
crc32 was chosen after testing a lot of hash function. Perl's hash was the
fastest, but crc32 makes much less number of collisions. That's interesting for
ASCII a lot of functions produce rather small number of collision, but for upper
part of table (0x7f-0xff) crc32 was the best. CRC32 has evenly distributed
collisions over characters, others - not.
> In any case, I think we need to calculate the CRC/hash in tsqueryrecv,
> instead of trusting the client.
Agreed.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2007-09-05 17:25:46 | Re: HOT patch - version 15 |
Previous Message | Kris Jurka | 2007-09-05 17:05:23 | Re: GSS warnings |