From: | "Moon, Insung" <tsukiwamoon(dot)pgsql(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Cc: | keisuke(dot)kuroda(dot)3862(at)gmail(dot)com |
Subject: | Wrong value in metapage of GIN INDEX. |
Date: | 2019-08-29 08:19:51 |
Message-ID: | CAEMmqBuH_O-oXL+3_ArQ6F5cJ7kXVow2SGQB3HRacku_T+xkmA@mail.gmail.com |
Lists: | pgsql-hackers |
Dear Hackers.
Kuroda-san and I are interested in the GIN index and have been testing
various things.
While testing, we found a small bug.
In some cases, nEntries in the metapage is set to a wrong value.
The following steps reproduce the bug.
=# SET maintenance_work_mem TO '1MB';
=# CREATE TABLE foo(i jsonb);
=# INSERT INTO foo(i) select jsonb_build_object('foobar001', i) FROM
generate_series(1, 10000) AS i;
# Insert the same values again.
=# INSERT INTO foo(i) select jsonb_build_object('foobar001', i) FROM
generate_series(1, 10000) AS i;
# Create the GIN index.
=# CREATE INDEX foo_idx ON foo USING gin (i jsonb_ops) WITH
(fastupdate=off);
=# SELECT * FROM gin_metapage_info(get_raw_page('foo_idx', 0));
-[ RECORD 1 ]----+-----------
pending_head | 4294967295
pending_tail | 4294967295
tail_free_size | 0
n_pending_pages | 0
n_pending_tuples | 0
n_total_pages | 74
n_entry_pages | 69
n_data_pages | 4
n_entries | 20004 <--★
version | 2
In this example, nEntries should be 10001, because a GIN index stores
duplicate values in a single leaf entry (as a posting tree or posting
list).
But the nEntries value in the metapage, as shown by pageinspect, is
20004.
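The expected figure can be checked with simple arithmetic. The sketch
below is a toy model, not PostgreSQL code. It assumes that jsonb_ops
creates one index item for each key and one for each value, and that
equal items share a single entry. The function name expected_entries
is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Toy model (not PostgreSQL code): count the distinct GIN entries produced
 * by inserting {"foobar001": i} for i = 1..max_value, `repeats` times.
 * Assumes one entry for the key plus one entry per distinct value.
 */
static long
expected_entries(int repeats, int max_value)
{
    assert(repeats > 0 && max_value > 0);

    bool *seen = calloc((size_t) max_value + 1, sizeof(bool));
    assert(seen != NULL);

    long entries = 1;           /* the key "foobar001" is one shared entry */

    for (int r = 0; r < repeats; r++)
        for (int i = 1; i <= max_value; i++)
            if (!seen[i])       /* a repeated value adds no new entry */
            {
                seen[i] = true;
                entries++;
            }

    free(seen);
    return entries;
}
```

Under these assumptions, expected_entries(2, 10000) gives 10001: one
entry for the key and 10000 for the values, regardless of how many
times the rows are repeated.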
So, let's run VACUUM.
=# VACUUM foo;
=# SELECT * FROM gin_metapage_info(get_raw_page('foo_idx', 0));
-[ RECORD 1 ]----+-----------
pending_head | 4294967295
pending_tail | 4294967295
tail_free_size | 0
n_pending_pages | 0
n_pending_tuples | 0
n_total_pages | 74
n_entry_pages | 69
n_data_pages | 4
n_entries | 10001 <--★
version | 2
After running VACUUM, nEntries changes to the correct value.
The problem is in the ginEntryInsert function. While the index build
scans the table, the ginBuildCallback function accumulates the heap
values in the GinBuildState struct.
Then, when the memory used by that accumulated state reaches or
exceeds maintenance_work_mem, the accumulated entries are written into
the GIN index.
This is done by calling ginEntryInsert.
Each such call assumes that a new entry is being added and increments
nEntries.
However, ginEntryInsert currently increments nEntries first, and only
afterwards checks whether the same entry already exists in the GIN
index.
That causes the bug.
The patch is very simple.
It increments nEntries only when a new, non-duplicate leaf entry is
added to the GIN index.
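The difference between the two counting orders can be sketched with a
small toy model. This is not the PostgreSQL source and not the
attached patch: ToyIndex, toy_entry_insert and toy_build are
hypothetical names, and the fixed batch size merely stands in for the
flushes triggered by maintenance_work_mem.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_VALUE 10000

/* Toy stand-in for the index plus its build statistics (hypothetical). */
typedef struct ToyIndex
{
    bool key_present;                   /* entry for the key "foobar001" */
    bool value_present[MAX_VALUE + 1];  /* entries for the values 1..MAX_VALUE */
    long n_entries_buggy;               /* counted before the duplicate check */
    long n_entries_fixed;               /* counted only for new entries */
} ToyIndex;

/* Insert one entry; `slot` points at its presence flag in the toy index. */
static void
toy_entry_insert(ToyIndex *idx, bool *slot)
{
    idx->n_entries_buggy++;     /* behaviour described in the report */
    if (*slot)
        return;                 /* entry already exists: nothing new to count */
    *slot = true;
    idx->n_entries_fixed++;     /* proposed behaviour: count new entries only */
}

/*
 * Load rows {"foobar001": i}, i = 1..MAX_VALUE, `repeats` times, flushing
 * the accumulated entries every rows_per_batch rows. Each flush inserts the
 * key once and every value of the batch once.
 */
static void
toy_build(ToyIndex *idx, int repeats, int rows_per_batch)
{
    assert(repeats > 0 && rows_per_batch > 0);
    assert(MAX_VALUE % rows_per_batch == 0);
    memset(idx, 0, sizeof(*idx));

    for (int r = 0; r < repeats; r++)
        for (int start = 1; start <= MAX_VALUE; start += rows_per_batch)
        {
            toy_entry_insert(idx, &idx->key_present);
            for (int i = start; i < start + rows_per_batch; i++)
                toy_entry_insert(idx, &idx->value_present[i]);
        }
}
```

Calling toy_build(&idx, 2, 5000) leaves n_entries_buggy at 20004 and
n_entries_fixed at 10001. The batch size of 5000 rows is chosen only
so that the toy numbers line up with the report. The real flush points
depend on how the build accounts for memory.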
Kuroda-san and I found this bug and wrote the fix together.
Best Regards.
Moon.
Attachment | Content-Type | Size |
---|---|---|
GIN_Metapage_bugfix.patch | application/octet-stream | 897 bytes |