From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Speeding up ruleutils' name de-duplication code, redux
Date: 2024-09-10 09:57:21
Message-ID: CAApHDvriAQ4W=2B-t7R+N77Bmq+HZLAf3ReQiyGXeZMke7yibQ@mail.gmail.com
Lists: pgsql-hackers
On Tue, 30 Jul 2024 at 10:14, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> -----
> for (my $i = 0; $i < 100; $i++)
> {
> print "CREATE TABLE test_inh_check$i (\n";
> for (my $j = 0; $j < 1000; $j++)
> {
> print "a$j float check (a$j > 10.2),\n";
> }
> print "b float);\n";
> print "CREATE TABLE test_inh_check_child$i() INHERITS(test_inh_check$i);\n";
> }
> -----
>
> On my development machine, it takes over 14 minutes to pg_upgrade
> this, and it turns out that that time is largely spent in column
> name de-duplication while deparsing the CHECK constraints. The
> attached patch reduces that to about 3m45s.
I think this is worth doing. Reducing the --schema-only time in
pg_dump is a worthy goal, since it reduces downtime during upgrades.
I looked at the patch and tried it out. I wondered about the choice of
32 as the cut-off point, so I decided to benchmark using the attached
script. Here's an extract from the attached results:
Patched with 10 child tables
pg_dump for 16 columns real 0m0.068s
pg_dump for 31 columns real 0m0.080s
pg_dump for 32 columns real 0m0.083s
This gives me what I'd expect to see. I wanted to ensure the point
where you switch to the hashing method was about the right place, and
it seems to be, at least for my test.
The performance looks good too:
10 tables:
master: pg_dump for 1024 columns real 0m23.053s
patched: pg_dump for 1024 columns real 0m1.573s
100 tables:
master: pg_dump for 1024 columns real 3m29.857s
patched: pg_dump for 1024 columns real 0m23.053s
Perhaps you don't think it's worth the additional complexity, but I
see that in both locations where you call build_colinfo_names_hash(),
it's done just after a call to expand_colnames_array_to(). I wondered
if it would be worthwhile unifying those two functions, perhaps under
a new name, so that you don't need to loop over the always-NULL
elements of the colnames[] array when building the hash table. This is
likely quite a small overhead compared to the quadratic search you've
removed, so it might not move the needle any. I just wanted to point
it out, as I've little else I can find to comment on.
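For what it's worth, a unified helper might look roughly like the sketch
below. To be clear, this is a toy stand-in, not the patch's code: the
struct, the function name expand_colnames_and_build_hash(), and the
little chained hash table are all hypothetical (the real code would use
palloc and the backend's hash infrastructure). The only point it's
making is that the hash-build loop can stop at the old length, since
every slot added by the expansion is NULL by construction and could
never be inserted anyway.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64

typedef struct NameEntry
{
	const char *name;
	struct NameEntry *next;
} NameEntry;

/* Cut-down, hypothetical stand-in for ruleutils.c's deparse_columns */
typedef struct deparse_columns
{
	int			num_cols;
	char	  **colnames;
	NameEntry  *buckets[NBUCKETS];	/* toy chained hash table */
} deparse_columns;

static unsigned
hash_name(const char *s)
{
	unsigned	h = 5381;

	while (*s)
		h = h * 33 + (unsigned char) *s++;
	return h % NBUCKETS;
}

static void
hash_insert(deparse_columns *colinfo, const char *name)
{
	NameEntry  *e = malloc(sizeof(NameEntry));
	unsigned	b = hash_name(name);

	e->name = name;
	e->next = colinfo->buckets[b];
	colinfo->buckets[b] = e;
}

static int
hash_contains(deparse_columns *colinfo, const char *name)
{
	for (NameEntry *e = colinfo->buckets[hash_name(name)]; e; e = e->next)
		if (strcmp(e->name, name) == 0)
			return 1;
	return 0;
}

/*
 * Grow colnames[] to newlen, NULL-filling the new slots, and build the
 * name hash from the pre-existing entries only.  The hash-build loop
 * runs to oldlen, not newlen: the slots we just added are always NULL,
 * so scanning them would be wasted work.
 */
static void
expand_colnames_and_build_hash(deparse_columns *colinfo, int newlen)
{
	int			oldlen = colinfo->num_cols;

	colinfo->colnames = realloc(colinfo->colnames,
								newlen * sizeof(char *));
	for (int i = oldlen; i < newlen; i++)
		colinfo->colnames[i] = NULL;
	colinfo->num_cols = newlen;

	for (int i = 0; i < oldlen; i++)	/* note: i < oldlen, not newlen */
		if (colinfo->colnames[i] != NULL)
			hash_insert(colinfo, colinfo->colnames[i]);
}
```

Whether saving that one pass over the tail of the array is measurable is
another question; as above, it's probably noise next to the quadratic
search the patch removes.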
David
Attachments:
  pgdump_bench.sh.txt (text/plain, 561 bytes)
  pgdump_bench_results.txt (text/plain, 2.0 KB)