Re: Modification of data in base folder and very large tables

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Ogden Brash <info(at)litika(dot)com>
Cc: Jerry Sievers <gsievers19(at)comcast(dot)net>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Modification of data in base folder and very large tables
Date: 2019-10-10 12:05:24
Message-ID: 877e5czua4.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

>>>>> "Ogden" == Ogden Brash <info(at)litika(dot)com> writes:

Ogden> I did the restore as a data only restore so that it would not
Ogden> try to recreate any tables.

Doing data-only restores is almost always a mistake.

pg_dump/pg_restore are very careful to create things in an order that
allows the data part of the restore to run quickly: tables are created
first without any indexes or constraints, then data is loaded, then
indexes and constraints are created in bulk afterwards.

If you do a data-only restore into an existing table, then it's up to
you to avoid performance problems.

Ogden> As an experiment, I am in the process of clustering the source
Ogden> database tables by the primary key constraint. I am hoping that
Ogden> if I redo the pg_dump after that, it will contain the records in
Ogden> more-or-less primary key order and on the subsequent pg_restore
Ogden> it should not have to spend the vast majority of the time on
Ogden> reading and seeking.

This is a waste of time; just restore the data without the primary key
in place and then create it at the end.

Ogden> It is surprising to me that the cluster operations (which also
Ogden> have to churn through the entire index and all the records) are
Ogden> going *much* faster than pg_restore.

That's because cluster, as with creation of a fresh index, can do a bulk
index build: sequentially read the table, sort the values (spilling to
temporary files if need be, but these are also read and written
sequentially), and write out the index data in one sequential pass.

--
Andrew (irc:RhodiumToad)

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2019-10-11 01:08:54 Re: Modification of data in base folder and very large tables
Previous Message Pavel Stehule 2019-10-10 09:41:10 Re: [HACKERS] proposal: schema variables