From: | Pierre Ducroquet <p(dot)psql(at)pinaraf(dot)info> |
---|---|
To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Improvements in pg_dump/pg_restore toc format and performances |
Date: | 2023-09-18 22:41:24 |
Message-ID: | 8312205.NyiUUSuA9g@peanuts2 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Monday, September 18, 2023 11:52:47 PM CEST Nathan Bossart wrote:
> On Thu, Jul 27, 2023 at 10:51:11AM +0200, Pierre Ducroquet wrote:
> > I ended up writing several patches that shaved some time for pg_restore
> > -l,
> > and reduced the toc.dat size.
>
> I've only just started taking a look at these patches, and I intend to do a
> more thorough review in the hopefully-not-too-distant future.
Thank you very much.
> Since cfbot is failing on some pg_upgrade and pg_dump tests, I've set this
> to waiting-on-author.
I did not notice anything running meson test -v, I'll look further into it in
the next days.
> > First patch is "finishing" the job of removing has oids support. When this
> > support was removed, instead of dropping the field from the dumps and
> > increasing the dump versions, the field was kept as is. This field stores
> > a
> > boolean as a string, "true" or "false". This is not free, and requires 10
> > bytes per toc entry.
>
> This sounds reasonable to me. I wonder why this wasn't done when WITH OIDS
> was removed in v12.
I suppose it is an oversight, or not wanting to increase the dump version for
no reason.
> > The second patch removes calls to sscanf and replaces them with strtoul.
> > This was the biggest speedup for pg_restore -l.
>
> Nice.
>
> > The third patch changes the dump format further to remove these strtoul
> > calls and store the integers as is instead.
>
> Do we need to worry about endianness here?
I used the ReadInt/WriteInt functions already defined in pg_dump that take care
of this issue, so there should be no need to worry.
> > The fourth patch is dirtier and does more changes to the dump format.
> > Instead of storing the owner, tablespace, table access method and schema
> > of each object as a string, pg_dump builds an array of these, stores them
> > at the beginning of the file and replaces the strings with integer fields
> > in the dump. This reduces the file size further, and removes a lot of
> > calls to ReadStr, thus saving quite some time.
>
> This sounds promising.
>
> > Patch Toc size Dump -s duration pg_restore -l duration
> > HEAD 214M 23.1s 1.27s
> > #1 (has oid) 210M 22.9s 1.26s
> > #2 (scanf) 210M 22.9s 1.07s
> > #3 (no strtoul) 202M 22.8s 0.94s
> > #4 (string list) 181M 23.1s 0.87s
>
> At a glance, the size improvements in 0004 look the most interesting to me.
Yes it is, and the speed benefits are interesting too (at least for my usecase)
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2023-09-18 22:48:14 | Re: Index range search optimization |
Previous Message | Nathan Bossart | 2023-09-18 21:54:42 | Re: Improvements in pg_dump/pg_restore toc format and performances |