From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Lakhin <exclusion(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum" |
Date: | 2021-06-24 07:52:37 |
Message-ID: | 19190f79-cf37-ff18-1b40-07a1a66a1d9e@iki.fi |
Lists: | pgsql-bugs |
On 23/06/2021 12:45, Thomas Munro wrote:
> On Wed, Jun 23, 2021 at 7:46 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> Let's just add the lock there.
>
> +1, no doubt about that.
Committed that. Thanks for the report, Alexander!
>> ... What about the new kid on the block:
>> Persistent Memory? I found this article:
>> https://lwn.net/Articles/686150/. So at hardware level, Persistent
>> Memory only guarantees atomicity at cache line level (64 bytes). To
>> provide the traditional 512 byte sector atomicity, there's a feature in
>> Linux called BTT. Perhaps we should add a note to the docs that you
>> should enable that.
>
> Right, also called sector mode. I don't know enough about that to
> comment really, but... if my google-fu is serving me, you can't
> actually use interesting sector sizes like 8KB (you have to choose 512
> or 4096 bytes), so you'll have to pay for *two* synthetic atomic page
> schemes: BTT and our full page writes. That makes me wonder... if you
> need to leave full page writes on anyway, maybe it would be a better
> trade-off to do double writes of our special atomic files (relmapper
> files and control file) so that we could safely turn BTT off and avoid
> double-taxation for relation data. Just a thought. No pmem
> experience here, I could be way off.
Yeah, you wouldn't want to turn on BTT for anything other than the
pg_control file. That's the only place where we rely on sector
atomicity, I believe. For everything else, it just adds overhead. Not
sure how much overhead; maybe it doesn't matter in practice.
>> We haven't heard of broken control files from the field, so that doesn't
>> seem to be a problem in practice, at least not yet. Still, I would sleep
>> better if the control file had more redundancy. For example, have two
>> copies of it on disk. At startup, read both copies, and if they're both
>> valid, ignore the one with older timestamp. When updating it, write over
>> the older copy. That way, if you crash in the middle of updating it, the
>> old copy is still intact.
>
> +1, with a flush in between so that only one can be borked no matter
> how the storage works. It is interesting how few reports there are on
> the mailing list of control file CRC check failures though, if I'm
> searching for the right thing[1].
>
> [1] https://www.postgresql.org/search/?m=1&q=calculated+CRC+checksum+does+not+match+value+stored+in+file&l=&d=-1&s=r
If anyone wants to write a patch for that, I'd be happy to review it. And
if anyone has access to a system with pmem hardware, it would be
interesting to try to reproduce a torn sector and broken control file by
pulling the power plug.
- Heikki