From: | "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com> |
---|---|
To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Add a new pg_walinspect function to extract FPIs from WAL records |
Date: | 2023-01-10 08:29:03 |
Message-ID: | dced46f2-1f10-7ac7-2f5c-f11be5d49990@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 1/6/23 6:41 PM, Bharath Rupireddy wrote:
> On Fri, Jan 6, 2023 at 11:47 AM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>>
>> On Thu, Jan 5, 2023 at 6:51 PM Bharath Rupireddy
>> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>>>
>>>> I'm also wondering if it would make sense to extend the test coverage of it (and pg_waldump) to "validate" that both
>>>> extracted images are the same and matches the one modified right after the checkpoint.
>>>>
>>>> What do you think? (could be done later in another patch though).
>>>
>>> I think pageinspect can be used here. We can fetch the raw page from
>>> the table after the checkpoint and raw FPI from the WAL record logged
>>> as part of the update. I've tried to do so [1], but I see a slight
>>> difference in the raw output. The expectation is that they both be the
>>> same. It might be that the update operation logs the FPI with some
>>> more info set (prune_xid). I'll try to see why it is so.
>>>
>>> I'm attaching the v2 patch for further review.
>>>
>>> [1]
>>> SELECT * FROM page_header(:'page_from_table');
>>> lsn | checksum | flags | lower | upper | special | pagesize |
>>> version | prune_xid
>>> -----------+----------+-------+-------+-------+---------+----------+---------+-----------
>>> 0/1891D78 | 0 | 0 | 40 | 8064 | 8192 | 8192 |
>>> 4 | 0
>>> (1 row)
>>>
>>> SELECT * FROM page_header(:'page_from_wal');
>>> lsn | checksum | flags | lower | upper | special | pagesize |
>>> version | prune_xid
>>> -----------+----------+-------+-------+-------+---------+----------+---------+-----------
>>> 0/1891D78 | 0 | 0 | 44 | 8032 | 8192 | 8192 |
>>> 4 | 735
>>> (1 row)
>>
>> Ugh, v2 patch missed the new file added, I'm attaching v3 patch for
>> further review. Sorry for the noise.
>
> I took a stab at how and what gets logged as FPI in WAL records:
>
> Option 1:
> WAL record with FPI contains both the unmodified table page from the
> disk after checkpoint and new tuple (not applied to the unmodified
> page) and the recovery (redo) applies the new tuple to the unmodified
> page as part of recovery. A bit more WAL is needed to store both
> unmodified page and new tuple data in the WAL record and recovery can
> get slower a bit too as it needs to stitch the modified page.
>
> Option 2:
> WAL record with FPI contains only the modified page (new tuple applied
> to the unmodified page from the disk after checkpoint) and the
> recovery (redo) just returns the applied block as BLK_RESTORED.
> Recovery can get faster with this approach and less WAL is needed to
> store just the modified page.
>
> My earlier understanding was that postgres does option (1), however, I
> was wrong, option (2) is what actually postgres has implemented for
> the obvious advantages specified.
>
> I now made the tests a bit stricter in checking the FPI contents
> (tuple values) pulled from the WAL record with raw page contents
> pulled from the table using the pageinspect extension. Please see the
> attached v4 patch.
>
Thanks for updating the patch!
+-- Compare FPI from WAL record and page from table, they must be same
I think "must be the same" or "must be identical" sounds better (but not 100% sure).
Except this nit, V4 looks good to me.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Ankit Kumar Pandey | 2023-01-10 08:31:52 | Re: Todo: Teach planner to evaluate multiple windows in the optimal order |
Previous Message | Amit Kapila | 2023-01-10 08:25:56 | Re: typos |