creating a subset DB efficiently ?

From: David Gauthier <dfgpostgres(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: creating a subset DB efficiently ?
Date: 2024-03-08 16:22:17
Message-ID: CAEs=6DmuXckUagHG2-mjfXWcjXhU6U7ZqpwB9WxxGdrGKi3DyQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Here's the situation....

- The DB contains data for several projects.
- The tables of the DB contain data for all projects (data is not
partitioned on project name or anything like that)
- The "project" identifier (table column) exists in a few "parent" tables
with many child... grandchild,... tables under them connected with foreign
keys defined with "on delete cascade". So if a record in one of the parent
table records is deleted, all of its underlying, dependent records get
deleted too.
- New projects come in, and old ones need to be removed and "archived" in
DBs of their own. So there's a DB called "active_projects" and there's a
DB called "project_a_archive" (identical metadata).
- The idea is to copy the data for project "a" that's in "active_projects"
to the "project_a_arhchive" DB AND delete the project a data out of
"active_projects".
- Leave "project_a_archive" up and running if someone needs to attach to
that and get some old/archived data.

The brute-force method I've been using is...
1) pg_dump "active_projects" to a (huge) file then populate
"project_a_archive" using that (I don't have the privs to create database,
IT creates an empty one for me, so this is how I do it).
2) go into the "project_a_archive" DB and run... "delete from par_tbl_1
where project <> 'a' ", "delete from par_tbl_2 where project <> 'a' ",
etc... leaving only project "a" data in the DB.
3) go into the "active_projects" DB and "delete from par_tbl_1 where
project = 'a' ", etc... removing project "a" from the "active_projects DB.

Ya, not very elegant, it takes a long time and it takes a lot of
resources. So I'm looking for ideas on how to do this better.

Related question...
The "delete from par_tbl_a where project <> 'a' " is taking forever. I
fear it's because it's trying to journal everything in case I want to
rollback. But this is just in the archive DB and I don't mind taking the
risk if I can speed this up outside of a transaction. How can I run a
delete command like this without the rollback recovery overhead ?

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Yogesh Sharma 2024-03-08 16:28:20 Re: v11.5- v15.3 upgrade (linux)
Previous Message Adrian Klaver 2024-03-08 16:06:43 Re: Windows service randomly stops with no indication why