Adding primary key on table with 3 billion records.

From: Irene Yeh <ireney(at)fair(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Subject: Adding primary key on table with 3 billion records.
Date: 2019-10-30 04:17:20
Message-ID: CAFA_4L8J8M5tQgFsGA7xWL+b8s7=Ud1mvUky6mWm3zEUABGNmA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Hi there,

We've been trying to figure this out but has no luck. Hopefully someone
there would have an answer.

What we've done so far:
1. We make sure the ID column has no null value
2. We created a unique index on the ID column concurrently to avoid locking

What we are planning on doing next
1. Add a not null constraint on the ID column with NOT VALID option so that
it will run very quickly
2. Run the validate constraint check. It takes a really long time but at
least there's no locking
3. Run the "alter table add primary key with index" command to promote the
index into primary key.

Here are some issues we're having right now
1. The DB started autovacuum after we created a unique index. It's been
running for a really really long time and it won't let us alter the table
to add null constraint. I tried to cancel the autovacuum job but it just
starts backup.
2. I created a duplicated database and was able to get to add primary key
with index step. However, it seems like even though I've already validated
not null constraint, the alter table command still took a really long time
to run. This alter table command has a read/write lock, which is
problematic for us on a production database.

My questions are:
1. Can we safely cancel autovacuum job multiple times? I've canceled the
job twice without knowing that it'll come back up. What's the consequence
of doing that?
2. Is there anyway to make the alter table add primary key step faster? I'm
not super familiar with this but from I've read in the documentation,
nullable constraint is the only difference between primary key and a unique
index. If I've verify in the pg_constraint table that the not null
constraint has convalidated column = True, why does it still take so long
to add the primary key?

My postgres version is 9.5.15 and we're running this on AWS. Really
appreciate any help in advance!!!

Here are some commands we've ran so far:

CREATE UNIQUE INDEX CONCURRENTLY table_pkey ON table USING btree (id);
ALTER TABLE table ADD CONSTRAINT table_pkey CHECK (id IS NOT NULL) NOT
VALID;
ALTER TABLE table VALIDATE CONSTRAINT table_pkey;
ALTER TABLE table ADD PRIMARY KEY USING INDEX table_pkey;

Best,

--
<https://www.fair.com/s/c88e180e>
Irene Yeh
Senior Software Engineer
T 800 584 5000
ireney(at)fair(dot)com

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Laurenz Albe 2019-10-30 08:03:28 Re: Adding primary key on table with 3 billion records.
Previous Message Stephen Froehlich 2019-10-23 16:37:58 RE: Match against a column of regexes?