pgarchives: Bug report + Patches: loader can't handle message in multiple lists

From: Célestin Matte <celestin(dot)matte(at)cmatte(dot)me>
To: PostgreSQL WWW <pgsql-www(at)lists(dot)postgresql(dot)org>
Subject: pgarchives: Bug report + Patches: loader can't handle message in multiple lists
Date: 2023-03-22 15:13:18
Message-ID: ed72a307-e40f-98b9-d08a-e5bf331525ae@cmatte.me
Lists: pgsql-www

The message loader in pgarchives may crash when importing two mailing lists that have messages in common. In other words, the import script will fail when importing list1 and list2 if a message sent to list1 has list2 in CC:.

I attach a patch (0001-loader-attempt-to-handle-message-in-multiple-lists.patch) that starts addressing the issue but does not fully fix it: the script can still crash later (in storage.py, line 234) because a message cannot be imported twice. Fixing this would require changing the way messages are stored in the database, either by using (messageid, listid) as the primary key instead of messageid, or by allowing a message to belong to several threads through a message_thread table instead of a threadid column in messages.
This patch is only for discussion.
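
To make the first storage option concrete, here is a minimal sketch of a composite (messageid, listid) primary key. It is an illustration only, not the attached patch: it uses Python's built-in sqlite3 module in place of PostgreSQL so that it runs standalone, and the table layout, the store() helper and the sample values are hypothetical simplifications rather than the actual pgarchives schema.

```python
import sqlite3


def make_composite_key_db():
    """Create an in-memory database with a simplified, hypothetical messages table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        " messageid TEXT NOT NULL,"
        " listid INTEGER NOT NULL,"
        " threadid INTEGER NOT NULL,"
        # Uniqueness is per (message, list) pair rather than per message.
        " PRIMARY KEY (messageid, listid))"
    )
    return conn


def store(conn, messageid, listid, threadid):
    """Insert one row; commits on success, rolls back and re-raises on error."""
    with conn:
        conn.execute(
            "INSERT INTO messages (messageid, listid, threadid) VALUES (?, ?, ?)",
            (messageid, listid, threadid),
        )


ck_conn = make_composite_key_db()
# The same message, sent to list 1 with list 2 in CC, is stored once per list.
store(ck_conn, "b@example", 1, 10)
store(ck_conn, "b@example", 2, 20)

lists_for_message = [
    row[0]
    for row in ck_conn.execute(
        "SELECT listid FROM messages WHERE messageid = ? ORDER BY listid",
        ("b@example",),
    )
]
```

With this layout the same Message-ID can be stored once per list, while importing it a second time into the same list is still rejected by the primary key.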

I also attach patch 0001-load_message-catch-postgres-UniqueViolation-errors-w.patch as a workaround for this issue: it catches and logs such errors so that importing an mbox continues instead of crashing (the message will then only appear in the first imported list).
This patch can be used/applied.
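
The general shape of this workaround can be sketched as follows. This is not the content of the attached patch: it uses Python's built-in sqlite3 module so that it runs standalone, and sqlite3.IntegrityError stands in for the psycopg2.errors.UniqueViolation exception that a PostgreSQL-backed loader would catch. The table layout, the load_mbox() helper and the sample messages are hypothetical.

```python
import logging
import sqlite3

log = logging.getLogger("loader_sketch")


def make_db():
    """Create an in-memory database where messageid alone is the primary key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        " messageid TEXT PRIMARY KEY,"
        " listid INTEGER NOT NULL,"
        " subject TEXT NOT NULL)"
    )
    return conn


def load_mbox(conn, listid, messages):
    """Import (messageid, subject) pairs, skipping messages already stored."""
    imported, skipped = [], []
    for messageid, subject in messages:
        try:
            # One transaction per message: a failure rolls back only this insert.
            with conn:
                conn.execute(
                    "INSERT INTO messages (messageid, listid, subject)"
                    " VALUES (?, ?, ?)",
                    (messageid, listid, subject),
                )
        except sqlite3.IntegrityError as exc:
            # Log the duplicate and keep going instead of aborting the import.
            log.warning("skipping %s for list %s: %s", messageid, listid, exc)
            skipped.append(messageid)
            continue
        imported.append(messageid)
    return imported, skipped


conn = make_db()
first = load_mbox(conn, 1, [("a@example", "hello"), ("b@example", "sent to both")])
second = load_mbox(conn, 2, [("b@example", "sent to both"), ("c@example", "bye")])
```

In this sketch the second import skips the shared message and still loads the rest of the mbox, so the shared message stays attached to list 1 only, which matches the limitation described above.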
--
Célestin Matte

Attachment                                                        Content-Type  Size
0001-load_message-catch-postgres-UniqueViolation-errors-w.patch   text/x-patch  1.4 KB
0001-loader-attempt-to-handle-message-in-multiple-lists.patch     text/x-patch  2.0 KB
