Re: pg_trgm Memory Allocation logic

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_trgm Memory Allocation logic
Date: 2015-03-09 13:28:39
Message-ID: 54FDA007.1060303@iki.fi
Lists: pgsql-hackers

On 03/09/2015 02:54 PM, Alvaro Herrera wrote:
> Beena Emerson wrote:
>> In the pg_trgm module, within function generate_trgm, the memory for trigrams
>> is allocated as follows:
>>
>> trg = (TRGM *) palloc(TRGMHDRSIZE + sizeof(trgm) * (slen / 2 + 1) *3);
>>
>> I have been trying to understand why this is so, because it seems to be
>> allocating more space than is required.
>
> Maybe it's considering a worst-case for multibyte characters? I don't
> really know if trgm supports multibyte, but I assume it does. If it
> does, then probably the trigrams consist of chars, not bytes.

Nope. Trigrams are always three bytes, even ones containing multibyte
characters. If there are any multibyte characters in the trigram, we
store a 3-byte checksum of the three characters instead. That loses some
information: you can have a collision where one multibyte trigram
incorrectly matches another one, but the trigram algorithms are
generally not too concerned about exact results.
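
To illustrate, here is a rough sketch of that idea, not the actual pg_trgm
code. The names trgm_sketch, fnv1a and compact_trigram_sketch are made up for
this example, and FNV-1a is only a stand-in for whatever checksum the module
really uses. The point is that the stored trigram is three bytes either way.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type for this sketch: a stored trigram is always 3 bytes. */
typedef unsigned char trgm_sketch[3];

/* FNV-1a hash, used here only as a stand-in checksum (an assumption). */
static uint32_t
fnv1a(const unsigned char *buf, size_t len)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++)
    {
        h ^= buf[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Store three characters, which occupy 'bytelen' bytes in total, as a
 * 3-byte trigram.  If all three characters are single-byte, the bytes are
 * copied as they are.  Otherwise the low 3 bytes of a checksum are stored,
 * so two different multibyte trigrams can end up with the same value.
 */
static void
compact_trigram_sketch(trgm_sketch out, const unsigned char *str, size_t bytelen)
{
    if (bytelen == 3)
    {
        memcpy(out, str, 3);
    }
    else
    {
        uint32_t h = fnv1a(str, bytelen);

        out[0] = (unsigned char) (h & 0xff);
        out[1] = (unsigned char) ((h >> 8) & 0xff);
        out[2] = (unsigned char) ((h >> 16) & 0xff);
    }
}
```

With something like this, "abc" is stored verbatim, while a trigram such as
"éab" (4 bytes in UTF-8) is reduced to 3 checksum bytes. Either way the
per-trigram storage does not grow with the byte length of the characters.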

- Heikki
