Re: BUG #16833: postgresql 13.1 process crash every hour

From: Alex F <phoedos16(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16833: postgresql 13.1 process crash every hour
Date: 2021-05-14 19:11:55
Message-ID: CAGbr_zUVuWp51Q2KOQ3YEm78Z=Q_+XWCQP3i7UMGuCXYYfCr2w@mail.gmail.com
Lists: pgsql-bugs

Dear Peter,
I honestly don't know whether you expected a response with the amcheck
results, but I'll paste them here anyway:

DEBUG: verifying that tuples from index
"price_model_product_id_latest_idx" are present in "price_model"
DEBUG: finished verifying presence of 5598051 tuples from table
"price_model" with bitset 48.61% set
DEBUG: verifying consistency of tree structure for index
"name_original_idx_s" with cross-level checks
DEBUG: verifying level 3 (true root level)

DEBUG: verifying level 2
ERROR: down-link lower bound invariant violated for index
"name_original_idx_s"
DETAIL: Parent block=64 child index tid=(868,3) parent page
lsn=1D2F/14483F28.
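
For anyone following along, the error above names a single index, so it
can be re-checked on its own rather than re-running the whole scan. The
following is only a sketch: it assumes "name_original_idx_s" resolves
through the current search_path, and it assumes that rebuilding the index
is an acceptable remedy once the check fails again. A rebuild does not
explain why the corruption happened, and other indexes or the table itself
may also be affected.

```sql
-- Show amcheck's DEBUG1 progress messages for this session.
SET client_min_messages = DEBUG1;

-- Re-check only the index reported in the error above.
-- Assumption: the index name resolves via search_path; schema-qualify it otherwise.
SELECT bt_index_parent_check(index => 'name_original_idx_s'::regclass,
                             heapallindexed => true);

-- Assumption: a rebuild is acceptable here. REINDEX ... CONCURRENTLY
-- (available since PostgreSQL 12) avoids blocking writes during the rebuild.
REINDEX INDEX CONCURRENTLY name_original_idx_s;
```

Re-running the check after the rebuild should show whether the invariant
violation is gone for that index.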

Anyway, I will wait for v13.4 and try to re-test this crash case.

Thanks for your support!

On Fri, May 14, 2021 at 20:48, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> On Fri, May 14, 2021 at 7:57 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Hmm, looks like it's time to rope Peter Geoghegan in on this discussion.
>
> I think that this is likely to be a fairly generic symptom of index
> corruption. Ockham's razor does not seem to point to a software bug
> because posting list splits are just not that complicated, and are
> fairly common in the grand scheme of things. Docker is the kind of
> thing that I wouldn't necessarily trust to not do something fishy with
> LVM snapshotting -- I tend to suspect that that is a factor.
>
> There was a very similar bug report and stack trace back in March.
> That case was tied back to generic index corruption using amcheck,
> with indexes corrupted that weren't implicated in the hard crash.
>
> There is a real problem for me to fix here in any case:
> _bt_swap_posting() is unnecessarily trusting of the state of the
> posting list tuple (compared to _bt_split(), say). I still plan on
> adding hardening to _bt_swap_posting() to avoid a hard crash.
> Unfortunately I missed the opportunity to get that into 13.3, but I'll
> get it into 13.4.
>
> Alex should probably run amcheck to see what that throws up. It should
> be possible to run amcheck on your database, which will detect corrupt
> posting list tuples on Postgres 13. It's a contrib extension, so you
> must first run "CREATE EXTENSION amcheck;". From there, you can run a
> query like the following (you may want to customize this):
>
> SELECT bt_index_parent_check(index => c.oid, heapallindexed => true),
> c.relname,
> c.relpages
> FROM pg_index i
> JOIN pg_opclass op ON i.indclass[0] = op.oid
> JOIN pg_am am ON op.opcmethod = am.oid
> JOIN pg_class c ON i.indexrelid = c.oid
> JOIN pg_namespace n ON c.relnamespace = n.oid
> WHERE am.amname = 'btree'
> -- Don't check temp tables, which may be from another session:
> AND c.relpersistence != 't'
> -- Function may throw an error when this is omitted:
> AND c.relkind = 'i' AND i.indisready AND i.indisvalid
> ORDER BY c.relpages DESC;
>
> If this query takes too long to complete, you may find it useful to add
> something to limit the indexes checked, such as: AND n.nspname =
> 'public' -- that change to the SQL will make the query test only
> indexes from the public schema.
>
> Do "SET client_min_messages=DEBUG1" to get a kind of rudimentary
> progress indicator, if that seems useful to you.
>
> The docs have further information on what this bt_index_parent_check
> function does, should you need it:
> https://www.postgresql.org/docs/13/amcheck.html
>
> --
> Peter Geoghegan
>
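
The two suggestions in the quoted message (limiting the check to one
schema and turning on the progress indicator) can be combined into one
session. This is a sketch rather than a definitive recipe: the query is
the one quoted above with only the schema filter added, and 'public' is
just the example schema from the quoted text.

```sql
-- Rudimentary progress indicator: amcheck reports each index at DEBUG1.
SET client_min_messages = DEBUG1;

SELECT bt_index_parent_check(index => c.oid, heapallindexed => true),
       c.relname,
       c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree'
  -- Only test indexes from one schema (example value from the quoted text):
  AND n.nspname = 'public'
  -- Don't check temp tables, which may be from another session:
  AND c.relpersistence != 't'
  -- Function may throw an error when this is omitted:
  AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC;
```

Note that the query stops at the first index that raises an error, so
later indexes in the ordering are not checked in that run.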
