From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Speeding up ruleutils' name de-duplication code, redux
Date: 2024-09-10 09:57:21
Message-ID: CAApHDvriAQ4W=2B-t7R+N77Bmq+HZLAf3ReQiyGXeZMke7yibQ@mail.gmail.com
Lists: pgsql-hackers
On Tue, 30 Jul 2024 at 10:14, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> -----
> for (my $i = 0; $i < 100; $i++)
> {
> print "CREATE TABLE test_inh_check$i (\n";
> for (my $j = 0; $j < 1000; $j++)
> {
> print "a$j float check (a$j > 10.2),\n";
> }
> print "b float);\n";
> print "CREATE TABLE test_inh_check_child$i() INHERITS(test_inh_check$i);\n";
> }
> -----
>
> On my development machine, it takes over 14 minutes to pg_upgrade
> this, and it turns out that that time is largely spent in column
> name de-duplication while deparsing the CHECK constraints. The
> attached patch reduces that to about 3m45s.
I think this is worth doing. Reducing the --schema-only time in
pg_dump is a worthy goal, since it reduces downtime during upgrades.
I looked at the patch and tried it out. I wondered about the choice of
32 as the cut-off point, so I decided to benchmark using the attached
script. Here's an extract from the attached results:
Patched with 10 child tables
pg_dump for 16 columns real 0m0.068s
pg_dump for 31 columns real 0m0.080s
pg_dump for 32 columns real 0m0.083s
This gives me what I'd expect to see. I wanted to ensure the point
where you switch to the hashing method was about the right place, and
it seems to be, at least for my test.
The performance looks good too:
10 tables:
master: pg_dump for 1024 columns real 0m23.053s
patched: pg_dump for 1024 columns real 0m1.573s
100 tables:
master: pg_dump for 1024 columns real 3m29.857s
patched: pg_dump for 1024 columns real 0m23.053s
Perhaps you don't think it's worth the additional complexity, but I
see that in both locations where you call build_colinfo_names_hash(),
it's done just after a call to expand_colnames_array_to(). I wondered
if it would be worthwhile unifying those two functions, perhaps under
a new name, so that you don't need to loop over the always-NULL
elements of the colnames[] array when building the hash table. This is
likely quite a small overhead compared to the quadratic search you've
removed, so it might not move the needle any. I just wanted to point
it out, as I've little else I can find to comment on.
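For what it's worth, a unified helper might look roughly like the sketch
below. To be clear, this is a toy stand-in, not the patch's code: the
struct, the function name expand_colnames_and_build_hash(), and the
little chained hash table are all hypothetical (the real code would use
palloc and the backend's hash infrastructure). The only point it's
making is that the hash-build loop can stop at the old length, since
every slot added by the expansion is NULL by construction and could
never be inserted anyway.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64

typedef struct NameEntry
{
	const char *name;
	struct NameEntry *next;
} NameEntry;

/* Cut-down, hypothetical stand-in for ruleutils.c's deparse_columns */
typedef struct deparse_columns
{
	int			num_cols;
	char	  **colnames;
	NameEntry  *buckets[NBUCKETS];	/* toy chained hash table */
} deparse_columns;

static unsigned
hash_name(const char *s)
{
	unsigned	h = 5381;

	while (*s)
		h = h * 33 + (unsigned char) *s++;
	return h % NBUCKETS;
}

static void
hash_insert(deparse_columns *colinfo, const char *name)
{
	NameEntry  *e = malloc(sizeof(NameEntry));
	unsigned	b = hash_name(name);

	e->name = name;
	e->next = colinfo->buckets[b];
	colinfo->buckets[b] = e;
}

static int
hash_contains(deparse_columns *colinfo, const char *name)
{
	for (NameEntry *e = colinfo->buckets[hash_name(name)]; e; e = e->next)
		if (strcmp(e->name, name) == 0)
			return 1;
	return 0;
}

/*
 * Grow colnames[] to newlen, NULL-filling the new slots, and build the
 * name hash from the pre-existing entries only.  The hash-build loop
 * runs to oldlen, not newlen: the slots we just added are always NULL,
 * so scanning them would be wasted work.
 */
static void
expand_colnames_and_build_hash(deparse_columns *colinfo, int newlen)
{
	int			oldlen = colinfo->num_cols;

	colinfo->colnames = realloc(colinfo->colnames,
								newlen * sizeof(char *));
	for (int i = oldlen; i < newlen; i++)
		colinfo->colnames[i] = NULL;
	colinfo->num_cols = newlen;

	for (int i = 0; i < oldlen; i++)	/* note: i < oldlen, not newlen */
		if (colinfo->colnames[i] != NULL)
			hash_insert(colinfo, colinfo->colnames[i]);
}
```

Whether saving that one pass over the tail of the array is measurable is
another question; as above, it's probably noise next to the quadratic
search the patch removes.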
David
Attachments:
  pgdump_bench.sh.txt (text/plain, 561 bytes)
  pgdump_bench_results.txt (text/plain, 2.0 KB)