From: | Олег Самойлов <splarv(at)ya(dot)ru> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | Ravi Krishna <srkrishna1(at)aol(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: GIN Index for low cardinality |
Date: | 2018-11-14 08:45:49 |
Message-ID: | B3532C91-9530-4D54-B9D7-41752E279AC6@ya.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Eh, I checked, you are right. Something terribly wrong with hash index in PostgreSQL. But there are another hash index gin(jsonb_path_ops), may be correctly say gin+hash index. Looked like it is the best for this purpose.
> 26 окт. 2018 г., в 19:27, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> написал(а):
>
> On Thu, Oct 25, 2018 at 9:36 AM Олег Самойлов <splarv(at)ya(dot)ru <mailto:splarv(at)ya(dot)ru>> wrote:
>
>> 17 окт. 2018 г., в 13:46, Ravi Krishna <srkrishna1(at)aol(dot)com <mailto:srkrishna1(at)aol(dot)com>> написал(а):
>>
>> In https://www.cybertec-postgresql.com/en/ideas-for-scaling-postgresql-to-multi-terabyte-and-beyond/ <https://www.cybertec-postgresql.com/en/ideas-for-scaling-postgresql-to-multi-terabyte-and-beyond/>
>>
>> it is mentioned:
>>
>> "GIN, the most know non-default index type perhaps, has been actually around for ages (full-text search) and in short is perfect for indexing columns where there are lot of repeating values – think all kinds of statuses or good old Mr/Mrs/Miss. GIN only stores every unique column value only once as for the default B-tree you’ll have e.g. 1 millon leaf nodes with the integer “1” in it."
>>
>>
>> Does it mean that GIN is a very good choice for low cardinality columns.
>
> Not necessary. There is other index which also don’t keep column value in an every leaf. Hash, for instance.
>
> For smallish values (which low cardinality columns tend to be) the per-tuple overhead and the pointer itself is probably much larger than the value, so hash won't save you much if any space. The GIN index removes not just the value, but the per-tuple overhead. And also compresses the point list to further save space.
>
> Here is a real-world example from one of my databases where each value is about 17 characters long, and is present about 20 times:
>
> gin: 411 MB
> btree: 2167 MB
> hash: 2159 MB
>
> Cheers,
>
> Jeff
From | Date | Subject | |
---|---|---|---|
Next Message | Ravi Krishna | 2018-11-14 10:33:09 | Re: Impact on PostgreSQL due to Redhat acquisition by IBM |
Previous Message | Chris Travers | 2018-11-14 07:26:51 | Re: Impact on PostgreSQL due to Redhat acquisition by IBM |