From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Kamigishi Rei <iijima(dot)yun(at)koumakan(dot)jp>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Subject: | Re: BUG #17245: Index corruption involving deduplicated entries |
Date: | 2021-11-02 04:24:23 |
Message-ID: | CAD21AoD3b4ahgQ3y-WomokzZe9UPXS8Xtvur6P2aQA6X3B1TeQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Tue, Nov 2, 2021 at 9:14 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Sun, Oct 31, 2021 at 7:21 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > Attached patch adds assertions and comments to
> > heap_page_prune_execute() that document my understanding of things.
> > This passes "make check-world" for me.
>
> Attached is a more worked out fix for the bug. It includes a test case
> that will lead to an assertion failure in nbtdedup.c (when the fix
> itself is absent). The test is loosely based on one from Masahiko's
> patch on the dedicated parallel VACUUM thread.
Thank you for updating the patch. For parallel vacuum fix, the fix
looks good to me.
Regarding regression tests:
+ * XXX: Actually, there might be no statistics at all -- index AMs
+ * sometimes return NULL from amvacuumcleanup. That makes it hard to
+ * assert that we really called amvacuumcleanup in the leader from
+ * here.
that's true. We cannot know whether indexes were actually vacuumed by
parallel workers by checking the shared index stats. While thinking
it's a good idea to use on an assertion in nbtdedup.c I'm concerned a
bit about relying on another module to detect a vacuum bug. If we were
to need to change the code of nbtdedup.c in the future, the premise
that the bug in parallel vacuum leads to the assertion failure might
break. It might not happen in back branches, though.
The problem with this bug is that leader process misses to vacuum
indexes that must be processed by the leader. So, another idea (but
not a comprehensive approach) to detect this issue would be that we
check if the leader processed all indexes that must be processed by
leader at the end of do_serial_processing_for_unsafe_indexes(). That
is, we check if all of both indexes whose entries in
will_parallel_vacuum[] are false and unsafe indexes have been
processed by leader.
> I've also combined my heap pruning assertion patch (from the other
> thread) with the heap_index_delete_tuples() assertions we talked about
> on this thread -- they're very similar, and so might as well just be
> handled as a single commit. I can delay committing this second patch
> until I hear back about the details. Note that there still aren't any
> new defensive ERRORs here, for now.
I'll look at the second patch.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
From | Date | Subject | |
---|---|---|---|
Next Message | arjun shetty | 2021-11-02 10:23:38 | Re: BUG #17241: llvm::install_bad_alloc_error_handler error |
Previous Message | egashira.yusuke@fujitsu.com | 2021-11-02 02:56:57 | RE: BUG #17254: Crash with 0xC0000409 in pg_stat_statements when pg_stat_tmp\pgss_query_texts.stat exceeded 2GB. |