From: | Jack Cushman <jcushman(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Duplicating data folder without tablespace, for read access |
Date: | 2018-08-14 15:57:38 |
Message-ID: | CAEv_OHUyxVEL+yt89dLYV5gGPxAkSSNCr=Afxg6S9RuYnags8g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi --
I'm wondering whether, in my specific situation, it would be safe to copy a
database cluster's data folder, and bring up the copy for read access,
without copying a tablespace linked from it. My situation (described below)
involves a database with a 100GB table and a 600GB table where I want to
routinely clone just the 100GB table for web access.
---
For context, the last discussion I've found is from 2013, in this blog post
from Christophe Pettus and response from Tom Lane:
https://thebuild.com/blog/2013/03/10/you-cannot-recover-from-the-loss-of-a-tablespace/
https://www.postgresql.org/message-id/19786.1367378009@sss.pgh.pa.us
In that discussion, Christophe summarized the situation this way:
> I would not count on it. And if it works 100% reliably now, it might not
on a future version of PostgreSQL.
> As Josh Berkus pointed out to my off-list, there are two competing
definitions of the term "recover" in use here:
> 1. In my blog post, the definition of "recover" was "bring up the
database without having unusually extensive knowledge of PostgreSQL's
internals."
> 2. For Tom, the definition of "recover" is "bring up the database if you
have appropriate knowledge of PostgreSQL's internals."
> You can't recover from the lost of a tablespace per definition #1. You
can per definition #2.
> I'd strongly suggest that relying on definition #2, while absolutely
correct, is a poor operational decision for most users.
https://www.postgresql.org/message-id/FABAC7F1-3172-4B5D-8E56-0B3C579980EC%40thebuild.com
---
Now here's the situation where I want to do what Christophe said not to do:
:)
I have a large database of text, with a 600GB table and a 100GB table
connected by a join table. They both see occasional updates throughout the
week. Once a week I want to "cut a release," meaning I will clone just the
100GB table and copy it to a "release" server for read-only web access.
My procedure would be:
- keep the 600GB table on a separate tablespace
- cleanly stop postgres on both servers
- copy the data folder to the release server
- delete pg_tblspc/* on the release server
- start postgres on both servers
In local testing this seems to work -- the release server works fine, and I
only get an error message if I try to access the missing tables, which is
expected. But are there reasons this is going to bite me if I try it in
production? I'm hoping it helps that (a) I'm only doing read access, (b) I
can cleanly stop both servers before cutting a release, and (c) I'm not
worried about losing data, since it's just an access copy.
Alternatives I've considered:
- I could pg_dump and restore, but the 100GB table has lots of indexes and
I'd rather not have to reindex on the release server each week.
- I could replicate with pglogical and use some sort of blue-green setup on
the release server to cut a release, but this adds a lot of moving parts,
especially to deal with schema migrations.
Thanks for any advice you might have!
-Jack
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2018-08-14 16:14:59 | Re: pg_basebackup failed to read a file |
Previous Message | Tom Lane | 2018-08-14 15:07:13 | Re: Best Practices for Extensions, limitations and recommended use for monitoring |