From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Shlok Kyal' <shlok(dot)kyal(dot)oss(at)gmail(dot)com> |
Cc: | "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, RECHTÉ Marc <marc(dot)rechte(at)meteo(dot)fr> |
Subject: | RE: Logical replication timeout |
Date: | 2024-12-23 12:29:28 |
Message-ID: | OSCPR01MB14966B646506E0C9B81B3A4CFF5022@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Dear Shlok,
>
> Thanks for sharing the analysis.
>
> I tested the patch on my machine as well and it has worse performance
> for me as well.
> I came up with an alternate approach. In this approach we keep track
> of wal segment the transaction is part of. This helps to iterate
> through only required files during clean up.
>
> On my machine, I am running the testcase provided by you in [1]. It is
> generating ~1.9 million spill files. For me the transaction completed
> in 56sec.
> Cleanup (deletion of spill files) took around following time:
> With HEAD : ~ 5min
> With latest patch (attached here) : ~2min
>
> Can you test if this improves performance for you?
I'm also not sure the performance, but I can post my comments.
I'm not sure your patch can properly handle the list operations.
```
+ oldcontext = MemoryContextSwitchTo(rb->context);
+ txn->walsgmts = lappend(txn->walsgmts, curOpenSegNo);
+ MemoryContextSwitchTo(oldcontext);
+
```
IIUC lappend() accepts a point of a Datum, but here a normal value is passed.
Should we define a new struct which represents a node of list and append it
after it is palloc()'d?
Or your code is enough for some reasons?
```
/* iterate over all possible filenames, and delete them */
- for (cur = first; cur <= last; cur++)
+ foreach(cell, txn->walsgmts)
{
+ XLogSegNo curr_segno = (XLogSegNo) lfirst(cell);
char path[MAXPGPATH];
- ReorderBufferSerializedPath(path, MyReplicationSlot, txn->xid, cur);
+ ReorderBufferSerializedPath(path, MyReplicationSlot, txn->xid, curr_segno);
if (unlink(path) != 0 && errno != ENOENT)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not remove file \"%s\": %m", path)));
}
+
+ if(txn->walsgmts != NIL)
+ {
+ pfree(txn->walsgmts);
+ txn->walsgmts = NIL;
+ }
```
If above comment is accepted, I feel you can use foreach_delete_current().
=======
Also, even when we optimize the truncation of files, there is a possibility that
replication is timed out. Can you also create a patch which implements [1]?
Best regards,
Hayato Kuroda
FUJITSU LIMITED
From | Date | Subject | |
---|---|---|---|
Next Message | Alexander Lakhin | 2024-12-23 13:00:00 | Re: Make tuple deformation faster |
Previous Message | Vladlen Popolitov | 2024-12-23 12:16:50 | Re: Exporting float_to_shortest_decimal_buf(n) with Postgres 17 on Windows |