Quick Links

Re: improve performance of pg_dump with many sequences

From:	Nathan Bossart <nathandbossart(at)gmail(dot)com>
To:	Euler Taveira <euler(at)eulerto(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: improve performance of pg_dump with many sequences
Date:	2024-07-10 22:05:11
Message-ID:	Zo8Fl53vBVS1Cnmt@nathan
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Jul 10, 2024 at 05:08:56PM -0300, Euler Taveira wrote:
> Nice improvement. The numbers for a realistic scenario (10k sequences) are

Thanks for taking a look!

> You are changing internal representation from char to int64. Is the main goal to
> validate catalog data? What if there is a new sequence data type whose
> representation is not an integer?

IIRC 0001 was primarily intended to reduce the amount of memory needed for
the sorted table. Regarding a new sequence data type, I'm assuming we'll
have much bigger fish to fry if we do that (e.g., pg_sequence uses int8 for
the values), and I'd hope that adjusting this code wouldn't be too
difficult, anyway.

> This code path is adding zero byte to the last position of the fixed string. I
> suggest that the zero byte is added to the position after the string length.

I'm not following why that would be a better approach. strncpy() will add
a NUL to the end of the string unless it doesn't fit in the buffer, in
which case we'll add our own via "seqtype[sizeof(seqtype) - 1] = '\0'".
Furthermore, the compiler can determine the position where the NUL should
be placed, whereas placing it at the end of the copied string requires a
runtime strlen().

> l = strlen(PQgetvalue(res, 0, 0));
> Assert(l < sizeof(seqtype));
> strncpy(seqtype, PQgetvalue(res, 0, 0), l);
> seqtype[l] = '\0';

I think the strncpy() should really be limited to the size of the seqtype
buffer. IMHO an Assert is not sufficient.

> If you are not planning to apply 0003, make sure you fix collectSequences() to
> avoid versions less than 10. Move this part to 0002.

Yeah, no need to create the table if we aren't going to use it.

> Since you apply a fix for pg_sequence_last_value function, you can simplify the
> query in 0003. CASE is not required.

Unfortunately, I think we have to keep this workaround since older minor
releases of PostgreSQL don't have the fix.

> patched (0001 and 0002):
> real 0m0,290s
> user 0m0,038s
> sys 0m0,104s
>
> I'm not sure if 0003 is worth. Maybe if you have another table like you
> suggested.

What pg_dump command did you test here? Did you dump the sequence data, or
was this --schema-only?

--
nathan

In response to

Re: improve performance of pg_dump with many sequences at 2024-07-10 20:08:56 from Euler Taveira

Responses

Re: improve performance of pg_dump with many sequences at 2024-07-11 02:52:33 from Euler Taveira

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Alena Rybakina	2024-07-10 23:30:22	Re: POC, WIP: OR-clause support for indexes
Previous Message	Nathan Bossart	2024-07-10 21:41:01	Re: improve predefined roles documentation