Re: Speeding up ruleutils' name de-duplication code, redux

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Speeding up ruleutils' name de-duplication code, redux
Date: 2024-09-10 15:06:46
Message-ID: 3472644.1725980806@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> On Tue, 30 Jul 2024 at 10:14, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> On my development machine, it takes over 14 minutes to pg_upgrade
>> this, and it turns out that that time is largely spent in column
>> name de-duplication while deparsing the CHECK constraints. The
>> attached patch reduces that to about 3m45s.

> I looked at the patch and tried it out.

Thanks for looking!

> This gives me what I'd expect to see. I wanted to ensure the point
> where you're switching to the hashing method was about the right
> place. It seems to be, at least for my test.

Yeah, I was just going by gut feel there. It's good to have some
numbers showing it's not a totally silly choice.

> Perhaps you don't think it's worth the additional complexity, but I
> see that in both locations you're calling build_colinfo_names_hash(),
> it's done just after a call to expand_colnames_array_to(). I wondered
> if it was worthwhile unifying both of those functions maybe with a new
> name so that you don't need to loop over the always NULL element of
> the colnames[] array when building the hash table. This is likely
> quite a small overhead compared to the quadratic search you've
> removed, so it might not move the needle any. I just wanted to point
> it out as I've little else I can find to comment on.

Hmm, but there are quite a few expand_colnames_array_to calls that
are not associated with build_colinfo_names_hash. On the whole it
feels like those are separate concerns that are better kept separate.

We could accomplish what you suggest by re-ordering the calls so that
we build the hash table before enlarging the array. 0001 attached
is the same as before (modulo line number changes from being rebased
up to HEAD) and then 0002 implements this idea on top. On the whole
though I find 0002 fairly ugly and would prefer to stick to 0001.
I really doubt that scanning any newly-created column positions is
going to take long enough to justify intertwining things like this.

regards, tom lane

Attachment Content-Type Size
v2-0001-speed-up-column-name-deduplication.patch text/x-diff 9.9 KB
v2-0002-avoid-useless-scanning.patch text/x-diff 3.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-09-10 15:12:22 Re: Proposal to Enable/Disable Index using ALTER INDEX
Previous Message Greg Sabino Mullane 2024-09-10 14:41:18 Re: Jargon and acronyms on this mailing list