From: | Andreas Joseph Krogh <andreas(at)visena(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Storing LOs outside the database and having a proper cleanup-mechanism to prevent dangling files |
Date: | 2017-07-17 10:34:04 |
Message-ID: | VisenaEmail.4e.a8c89cfec01d15d.15d5001bcc2@tc7-visena |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi.
After struggeling with storing LOs in the database and making efficient
backups/restore-routines I've decided to move LOs out of the DB and use a
"filename"-column containing an UUID referencing the file one the filesystem.
Having this schema I need a robust book-keeping mechanism so I know when I can
delete files on the filesystem when no longer referenced from the database.
In an email-program I store the whole email-body as a binary in a file and
store all header-info in an email_message table (for this example's sake).
I have a book-keeping table :
create table origo_email_message_file_operation( filename VARCHAR not null,
operationVARCHAR not NULL, PRIMARY KEY (filename, operation) );
And a email_message table holding the messages, with a "filename"-column
referencing the BLOB.
For DELETE I use this routine:
deleteData(fileName, messageId) {
startTX()
deleteEmail() // Deletes the entry in email_message table
INSERT INTO origo_email_message_file_operation(filename, operation) VALUES(
'3b1d18ae-7b54-c055-1016-d928daec7294','DELETE'); deleteEmail(messageId)
commitTX()
}
So for DELETE-operations the file on disk isn't deleted by the main-program,
but by a cleanup-job which runs as a cron-job and the
inspects origo_email_message_file_operation for DELETE-entries and then delete
the referenced files, then removes the DELETE-entries from
origo_email_message_file_operation.
INSERT is like this:
insertData(fileName) {
startTX()
// First, register the INSERT in case it fails
INSERT INTO origo_email_message_file_operation(filename, operation)
VALUES('3b1d18ae-7b54-c055-1016-d928daec7294','INSERT');
commitTX()
startTX()
insertEmail() // Inserts the entry in email_message table
DELETE FROM origo_email_message_file_operation WHERE filename =
'3b1d18ae-7b54-c055-1016-d928daec7294'AND operation = 'INSERT';
// If this commits there is no entry left
in origo_email_message_file_operation and we're all good
commitTX()
}
UPDATE is implemented as INSERT + DELETE.
The challenge with this is what to do if INSERT rolls back. If INSERT rolls
back then we end up with an INSERT-entry in origo_email_message_file_operation
with no corresponding "filename"-entry in email_message. But I fail to se how a
cleanup job can know the difference between such an INSERT-entry
in origo_email_message_file_operation caused by ROLLBACK and and INSERT-entry
caused by an in-progress insertEmail() operation.
Does anyone have a robust mechanism for cleaning up files in such scenarios?
Thanks.
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas(at)visena(dot)com <mailto:andreas(at)visena(dot)com>
www.visena.com <https://www.visena.com>
<https://www.visena.com>
From | Date | Subject | |
---|---|---|---|
Next Message | Clodoaldo Neto | 2017-07-17 13:34:31 | Re: Failed DNF dependency in Fedora |
Previous Message | Michael Paquier | 2017-07-16 15:47:20 | Re: Monitoring of a hot standby with a largely idle master |