From: | Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index. |
Date: | 2019-09-27 16:43:08 |
Message-ID: | f5069d7e-91e6-635b-5bfe-dce4e18714e2@postgrespro.ru |
Lists: | pgsql-hackers |
25.09.2019 22:14, Peter Geoghegan wrote:
>
>>> We still haven't added an "off" switch to deduplication, which seems
>>> necessary. I suppose that this should look like GIN's "fastupdate"
>>> storage parameter.
>> Why is it necessary to save this information somewhere but rel->rd_options,
>> while we can easily access this field from _bt_findinsertloc() and
>> _bt_load().
> Maybe, but we also need to access a flag that says it's safe to use
> deduplication. Obviously deduplication is not safe for datatypes like
> numeric and text with a nondeterministic collation. The "is
> deduplication safe for this index?" mechanism will probably work by
> doing several catalog lookups. This doesn't seem like something we
> want to do very often, especially with a buffer lock held -- ideally
> it will be somewhere that's convenient to access.
>
> Do we want to do that separately, and have a storage parameter that
> says "I would like to use deduplication in principle, if it's safe"?
> Or, do we store both pieces of information together, and forbid
> setting the storage parameter to on when it's known to be unsafe for
> the underlying opclasses used by the index? I don't know.
>
> I think that you can start working on this without knowing exactly how
> we'll do those catalog lookups. What you come up with has to work with
> that before the patch can be committed, though.
>
Attached is v19.
* It adds a new btree reloption, "deduplication".
I decided to refactor the code and move BtreeOptions into a separate
structure, rather than adding a new btree-specific value to StdRelOptions.
For now the option can be set even on indexes that do not support
deduplication; in that case it is ignored. Should we add this check to
option validation?
* By default, deduplication is on for non-unique indexes and off for
unique ones.
* The new function _bt_dedup_is_possible() is intended to be the single
place to perform all the checks. For now it's just a stub to ensure that
it works. Is there a way to extract this from existing opclass information,
or do we need to add a new opclass field? Have you already started this work?
I recall there was another thread, but I didn't manage to find it.
* I also integrated your latest patch, which enables deduplication on
unique indexes, into this version, since it can now be easily switched
on/off.
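
To make the rules in the list above concrete, here is a minimal standalone sketch of how the decision could be resolved. It is not code from the patch: SketchBtreeOptions, DedupSetting, sketch_dedup_is_possible() and sketch_use_dedup() are hypothetical names, and the "safe" flag simply stands in for whatever _bt_dedup_is_possible() ends up checking.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tri-state for the "deduplication" reloption (not from the patch). */
typedef enum DedupSetting
{
    DEDUP_UNSET,                /* user did not specify the reloption */
    DEDUP_OFF,
    DEDUP_ON
} DedupSetting;

/* Hypothetical stand-in for a btree-specific options struct. */
typedef struct SketchBtreeOptions
{
    int          fillfactor;
    DedupSetting deduplication;
} SketchBtreeOptions;

/*
 * Stand-in for the role of _bt_dedup_is_possible(). The real checks
 * (opclass, collation, ...) are still undecided, so the caller supplies
 * the answer here.
 */
static bool
sketch_dedup_is_possible(bool opclasses_safe)
{
    return opclasses_safe;
}

/*
 * Resolve whether deduplication is used for an index.
 * opts may be NULL, meaning no reloptions were set at all.
 */
static bool
sketch_use_dedup(const SketchBtreeOptions *opts, bool is_unique,
                 bool opclasses_safe)
{
    /* If the index cannot support deduplication, the option is ignored. */
    if (!sketch_dedup_is_possible(opclasses_safe))
        return false;

    /* An explicit user setting wins, including "on" for unique indexes. */
    if (opts != NULL && opts->deduplication != DEDUP_UNSET)
        return opts->deduplication == DEDUP_ON;

    /* Default: on for non-unique indexes, off for unique ones. */
    return !is_unique;
}
```

With this shape, sketch_use_dedup(NULL, false, true) yields true and sketch_use_dedup(NULL, true, true) yields false, while an explicit setting overrides the default only when the safety check passes.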
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment | Content-Type | Size |
---|---|---|
v19-0001-Add-deduplication-to-nbtree.patch | text/x-patch | 152.3 KB |