From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com> |
Subject: | Re: Yet another fast GiST build |
Date: | 2021-04-07 13:18:53 |
Message-ID: | 7386285b-0e2f-e89e-81f4-f63775becb2e@iki.fi |
Lists: | pgsql-hackers |
On 07/04/2021 15:12, Andrey Borodin wrote:
>> On 7 Apr 2021, at 14:56, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
>> wrote:
>>
>> Ok, I think I understand that now. In btree_gist, the *_cmp()
>> function operates on non-leaf values, and *_lt(), *_gt() et al
>> operate on leaf values. For all other datatypes, the leaf and
>> non-leaf representation is the same, but for bit/varbit, the
>> non-leaf representation is different. The leaf representation is
>> VarBit, and non-leaf is just the bits without the 'bit_len' field.
>> That's why it is indeed correct for gbt_bitcmp() to just use
>> byteacmp(), whereas gbt_bitlt() et al compares the 'bit_len' field
>> separately. That's subtle, and 100% uncommented.
>>
>> What that means for this patch is that gbt_bit_sort_build_cmp()
>> should *not* call byteacmp(), but bitcmp(). Because it operates on
>> the original datatype stored in the table.
>
> +1 Thanks for investigating this. If I understand things right,
> adding test values with different lengths of bit sequences would not
> uncover the problem anyway?
That's right; the only consequence of a "wrong" sort order is that the
quality of the tree suffers, and scans have to visit more pages than
necessary.
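For illustration, here is a minimal, self-contained C sketch of the leaf vs. non-leaf distinction quoted above. It is not btree_gist or PostgreSQL code: `SketchVarBit`, `sketch_make()`, `sketch_bitcmp()`, `sketch_bytes_cmp()` and `sketch_sort_cmp()` are hypothetical names, and the comparison rules are modeled on our reading of varbit's `bit_cmp()` (compare the bytes both values have, then break ties on bit length) and of `byteacmp()` (compare the common bytes, then byte length). It shows that B'1' and B'10' tie when only the packed bits are compared, while `bitcmp()`-style ordering puts B'1' first. That is why a sort-build comparator operating on the original leaf values should follow `bitcmp()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a VarBit leaf value: a bit count
 * plus the bits packed MSB-first, with unused trailing bits zeroed. */
#define SKETCH_MAXBYTES 8

typedef struct
{
    int32_t bit_len;               /* number of valid bits */
    uint8_t bits[SKETCH_MAXBYTES]; /* packed bits, zero-padded */
} SketchVarBit;

/* Build a SketchVarBit from a string of '0'/'1' characters. */
static SketchVarBit
sketch_make(const char *s)
{
    SketchVarBit v;
    size_t n = strlen(s);

    assert(n <= 8 * SKETCH_MAXBYTES);
    memset(&v, 0, sizeof(v));
    v.bit_len = (int32_t) n;
    for (size_t i = 0; i < n; i++)
        if (s[i] == '1')
            v.bits[i / 8] |= (uint8_t) (0x80 >> (i % 8));
    return v;
}

/* Number of bytes actually used by the packed bits. */
static size_t
sketch_nbytes(const SketchVarBit *v)
{
    return ((size_t) v->bit_len + 7) / 8;
}

/* Leaf-value order, modeled on bit_cmp(): compare the shared bytes,
 * then the shorter bit string sorts first. */
static int
sketch_bitcmp(const SketchVarBit *a, const SketchVarBit *b)
{
    size_t na = sketch_nbytes(a), nb = sketch_nbytes(b);
    int cmp = memcmp(a->bits, b->bits, na < nb ? na : nb);

    if (cmp != 0)
        return cmp < 0 ? -1 : 1;
    if (a->bit_len != b->bit_len)
        return a->bit_len < b->bit_len ? -1 : 1;
    return 0;
}

/* Non-leaf-style order: only the packed bytes, no bit_len, compared the
 * way byteacmp() compares byte strings. */
static int
sketch_bytes_cmp(const SketchVarBit *a, const SketchVarBit *b)
{
    size_t na = sketch_nbytes(a), nb = sketch_nbytes(b);
    int cmp = memcmp(a->bits, b->bits, na < nb ? na : nb);

    if (cmp != 0)
        return cmp < 0 ? -1 : 1;
    if (na != nb)
        return na < nb ? -1 : 1;
    return 0;
}

/* qsort() adapter: sort leaf values with bitcmp() semantics. */
static int
sketch_sort_cmp(const void *a, const void *b)
{
    return sketch_bitcmp((const SketchVarBit *) a, (const SketchVarBit *) b);
}
```

Sorting { B'10', B'01', B'1' } with `sketch_sort_cmp()` yields B'01', B'1', B'10', whereas `sketch_bytes_cmp()` cannot tell B'1' and B'10' apart. As noted above, a comparator that gets this wrong would only degrade the tree's quality, not the correctness of query results.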
I tried to investigate this by creating a varbit index with and without
sorting and comparing them with pageinspect, but in quick testing I
wasn't able to find cases where the sorted version was badly ordered. I
guess I haven't found the right data set yet.
- Heikki