From: | keisuke kuroda <keisuke(dot)kuroda(dot)3862(at)gmail(dot)com> |
---|---|
To: | "Moon, Insung" <tsukiwamoon(dot)pgsql(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Wrong value in metapage of GIN INDEX. |
Date: | 2019-08-30 02:48:47 |
Message-ID: | CANDwggLfKsT28YhcoR6k3N-k1Aw887vT5-1cUXZ0XKPBvHw4Rw@mail.gmail.com |
Lists: | pgsql-hackers |
Hi Moon-san.
Thank you for posting.
We are testing GIN indexes on the JSONB type.
The default maintenance_work_mem (64MB) was fine in the usual cases.
However, this problem occurs when indexing very large JSONB data.
Best regards,
Keisuke Kuroda
On Thu, Aug 29, 2019 at 17:20, Moon, Insung <tsukiwamoon(dot)pgsql(at)gmail(dot)com> wrote:
> Dear Hackers.
>
> Kuroda-san and I are interested in the GIN index and have been testing
> various things.
> While testing, we found a small bug.
> In some cases, the value of nEntries in the metapage is set to the wrong
> value.
>
> The following reproduces the bug.
> =# SET maintenance_work_mem TO '1MB';
> =# CREATE TABLE foo(i jsonb);
> =# INSERT INTO foo(i) select jsonb_build_object('foobar001', i) FROM
> generate_series(1, 10000) AS i;
>
> # Insert the same values again.
> =# INSERT INTO foo(i) select jsonb_build_object('foobar001', i) FROM
> generate_series(1, 10000) AS i;
> # Create the GIN index.
> =# CREATE INDEX foo_idx ON foo USING gin (i jsonb_ops) WITH
> (fastupdate=off);
>
>
> =# SELECT * FROM gin_metapage_info(get_raw_page('foo_idx', 0));
> -[ RECORD 1 ]----+-----------
> pending_head | 4294967295
> pending_tail | 4294967295
> tail_free_size | 0
> n_pending_pages | 0
> n_pending_tuples | 0
> n_total_pages | 74
> n_entry_pages | 69
> n_data_pages | 4
> n_entries | 20004 <--★
> version | 2
>
> In this example, the nEntries value should be 10001 because a GIN
> index stores duplicate values in one leaf entry (posting tree or
> posting list).
> But if we look at the nEntries value of the metapage using pageinspect,
> it is stored as 20004.
> So, let's run VACUUM.
>
>
> =# VACUUM foo;
> =# SELECT * FROM gin_metapage_info(get_raw_page('foo_idx', 0));
> -[ RECORD 1 ]----+-----------
> pending_head | 4294967295
> pending_tail | 4294967295
> tail_free_size | 0
> n_pending_pages | 0
> n_pending_tuples | 0
> n_total_pages | 74
> n_entry_pages | 69
> n_data_pages | 4
> n_entries | 10001 <--★
> version | 2
>
> After running VACUUM, nEntries changes to the correct value.
>
> The problem is in the ginEntryInsert function. During the table scan
> performed when creating a GIN index, the ginBuildCallback function
> stores each new heap value inside the GinBuildState struct.
> Next, if the memory used by the GinBuildState struct is equal to or
> larger than the maintenance_work_mem value, the accumulated values are
> inserted into the GIN index.
> This is done by a function called ginEntryInsert.
> Each time ginEntryInsert is called here, it assumes that a new
> entry is being added and increases the value of nEntries.
> However, ginEntryInsert currently increases the value of nEntries
> first, and only afterwards checks whether the same entry already
> exists in the current GIN index.
> That causes the bug.
>
> The patch is very simple.
> It increases the value of nEntries only when a non-duplicate GIN
> index leaf entry is added.
>
> Kuroda-san and I worked together on finding this bug and fixing the code.
>
> Best Regards.
> Moon.
>
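
The counting problem described in the quoted message can be illustrated with a small toy model. This is only a sketch and not PostgreSQL code: `ToyIndex`, `toy_add`, `toy_simulate_build`, and the batch size `BATCH_ROWS` are hypothetical names and values chosen for illustration, and the real flush points depend on how memory use is measured against maintenance_work_mem. The model inserts the key once per flushed batch plus the values 1..10000 twice, and compares a counter that is incremented before the duplicate check with one that is incremented only for entries that are really new.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DISTINCT_VALUES 10000                 /* values 1..10000, each inserted twice */
#define TOTAL_ROWS      (2 * DISTINCT_VALUES)
#define BATCH_ROWS      5000                  /* hypothetical flush size, in rows */

/* Toy stand-in for the index: remembers which entries already exist. */
typedef struct ToyIndex
{
    bool key_present;                         /* the single key 'foobar001' */
    bool value_present[DISTINCT_VALUES + 1];  /* values 1..DISTINCT_VALUES */
} ToyIndex;

/* Mark an entry as present; return true only if it was not there before. */
static bool
toy_add(bool *slot)
{
    bool is_new = !*slot;

    *slot = true;
    return is_new;
}

/*
 * Flush the rows batch by batch and maintain two counters:
 *   count_before_check: incremented for every flushed entry, before
 *                       looking for an existing entry (the reported behavior)
 *   count_new_only:     incremented only when the entry did not exist yet
 *                       (the behavior the patch description asks for)
 */
static void
toy_simulate_build(long *count_before_check, long *count_new_only)
{
    ToyIndex idx;

    memset(&idx, 0, sizeof(idx));
    *count_before_check = 0;
    *count_new_only = 0;

    for (int start = 0; start < TOTAL_ROWS; start += BATCH_ROWS)
    {
        /* Each flushed batch carries the key once (deduplicated in memory). */
        (*count_before_check)++;
        if (toy_add(&idx.key_present))
            (*count_new_only)++;

        for (int row = start; row < start + BATCH_ROWS; row++)
        {
            int value = (row % DISTINCT_VALUES) + 1;

            assert(value >= 1 && value <= DISTINCT_VALUES);
            (*count_before_check)++;
            if (toy_add(&idx.value_present[value]))
                (*count_new_only)++;
        }
    }
}
```

Under these assumptions, calling `toy_simulate_build` gives 20004 for the counter incremented before the check and 10001 for the counter incremented only for new entries. The match with the reported 20004 comes from the assumed four batches, so it should not be read as the actual flush pattern of the index build.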