From: | Erik Nordström <erik(at)timescale(dot)com> |
---|---|
To: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Subject: | Re: Changed behavior in rewriteheap |
Date: | 2024-11-22 08:11:21 |
Message-ID: | CACAa4VLT1GB+BR8kgrFbF6GEUNbAAWEb+6NpMUPzYSAA-PL1vg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Nov 22, 2024 at 12:30 AM Matthias van de Meent <
boekewurm+postgres(at)gmail(dot)com> wrote:
> On Thu, 21 Nov 2024, 17:18 Erik Nordström, <erik(at)timescale(dot)com> wrote:
>
>> Hello,
>>
>> I've noticed a change in behavior of the heap rewrite functionality in
>> PostgreSQL 17, used by, e.g., CLUSTER. I've been experimenting with the
>> functionality to implement a way to merge partitions in TimescaleDB. I am
>> using table_relation_copy_for_cluster() to write the data of several tables
>> to a single merged table, and then I do a heap swap on one of the original
>> tables while dropping the others. So, if you have 3 partitions and want to
>> merge them to one, then I write all three partitions to a temporary heap,
>> swap the new heap on partition 1 and then drop partitions 2 and 3.
>>
>> Now, this approach worked fine for PostgreSQL 15 and 16, but 17
>> introduced some changes that altered the behavior so that I only see data
>> from one of the partitions after merge (the last one written).
>>
>> The commit that I think is responsible is the following:
>> https://github.com/postgres/postgres/commit/8af256524893987a3e534c6578dd60edfb782a77
>>
>> Now, I realize that this is not a problem for PostgreSQL itself since the
>> rewrite functionality isn't used for the purpose I am using it. To my
>> defense, the rewrite code seems to imply that it should be possible to
>> write more data to an existing heap according to this comment in
>> begin_heap_rewrite: /* new_heap needn't be empty, just locked */.
>>
>> I've also tried recompiling PG17 with the rewriteheap.c file from PG16
>> and then it works again. I haven't yet been able to figure out exactly what
>> is different but I will continue to try to narrow it down. In the meantime,
>> maybe someone on the mailing list has some insight on what could be the
>> issue and whether my approach is viable?
>>
>
> I think the origin of the issue is at bulk_write.c, in smgr_bulk_flush
> (specifically after line 282), which assumes the bulk write system is used
> exclusively on empty relations.
>
> If you use a separate pair of begin/end_heap_rewrite for every relation
> you merge into that heap, it'll re-initialize the bulk writer, which will
> thus overwrite the previous rewrites' pages. The pre-PG17 rewriteheap.c
> doesn't use that API, and thus isn't affected.
>
> I've CC-ed Heikki as author of that patch; maybe a new API to indicate
> bulk writes into an existing fork would make sense, to solve Timescale's
> issue and fix rewriteheap's behavior?
>
>
> Kind regards,
>
> Matthias van de Meent
> Neon (https://neon.tech)
>
Yes, looks like it is that code that zeros out the initial pages if the
write doesn't start at block 0. And it looks like I don't have access to
the bulk write state to set pages_written to trick it to believe those
pages have already been written. Maybe it is possible to add an option to
the bulkwriter to skip zero-initialization? The comment for that code seems
to indicate it isn't strictly necessary anyway.
Regards,
Erik
From | Date | Subject | |
---|---|---|---|
Next Message | Andrei Lepikhov | 2024-11-22 08:39:02 | Re: Reordering DISTINCT keys to match input path's pathkeys |
Previous Message | Bertrand Drouvot | 2024-11-22 07:49:58 | Re: per backend I/O statistics |