Re: Speeding up ruleutils' name de-duplication code, redux

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	David Rowley <dgrowleyml(at)gmail(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Speeding up ruleutils' name de-duplication code, redux
Date:	2024-09-10 15:06:46
Message-ID:	3472644.1725980806@sss.pgh.pa.us
Lists:	pgsql-hackers

David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> On Tue, 30 Jul 2024 at 10:14, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> On my development machine, it takes over 14 minutes to pg_upgrade
>> this, and it turns out that that time is largely spent in column
>> name de-duplication while deparsing the CHECK constraints. The
>> attached patch reduces that to about 3m45s.

> I looked at the patch and tried it out.

Thanks for looking!

> This gives me what I'd expect to see. I wanted to ensure the point
> where you're switching to the hashing method was about the right
> place. It seems to be, at least for my test.

Yeah, I was just going by gut feel there. It's good to have some
numbers showing it's not a totally silly choice.
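
For readers following along, here is a minimal, self-contained C sketch of the general technique under discussion: check new names with a plain linear scan while the set is small, and switch to a hash table once it grows past a cutoff. This is not the patch's code. NameSet, nameset_add_unique, the FNV-1a hash, the "_N" suffix scheme, and the HASH_THRESHOLD value of 32 are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cutoff (a power of two); not the value used by the patch. */
#define HASH_THRESHOLD 32

typedef struct NameSet
{
    char  **names;   /* owned copies, in insertion order */
    int     count;   /* names stored */
    int     cap;     /* allocated length of names[] */
    int    *slots;   /* index into names[] or -1; NULL until hash is built */
    int     nslots;  /* power of two once slots != NULL */
} NameSet;

static uint32_t hash_name(const char *s)
{
    uint32_t h = 2166136261u;   /* FNV-1a, chosen only for brevity */

    for (; *s; s++)
        h = (h ^ (unsigned char) *s) * 16777619u;
    return h;
}

/* Return the slot holding name, or the empty slot where it would go. */
static uint32_t probe(const NameSet *set, const char *name)
{
    uint32_t mask = (uint32_t) set->nslots - 1;
    uint32_t pos = hash_name(name) & mask;

    while (set->slots[pos] >= 0 &&
           strcmp(set->names[set->slots[pos]], name) != 0)
        pos = (pos + 1) & mask;     /* linear probing */
    return pos;
}

/* (Re)build the slot table from every name currently stored. */
static void rebuild_hash(NameSet *set, int nslots)
{
    free(set->slots);
    set->slots = malloc(sizeof(int) * (size_t) nslots);
    assert(set->slots != NULL);
    set->nslots = nslots;
    for (int i = 0; i < nslots; i++)
        set->slots[i] = -1;
    for (int i = 0; i < set->count; i++)
        set->slots[probe(set, set->names[i])] = i;
}

bool nameset_contains(const NameSet *set, const char *name)
{
    if (set->slots != NULL)     /* large set: expected O(1) lookup */
        return set->slots[probe(set, name)] >= 0;
    for (int i = 0; i < set->count; i++)    /* small set: linear scan */
        if (strcmp(set->names[i], name) == 0)
            return true;
    return false;
}

/* Add a copy of name; the caller guarantees it is not already present. */
void nameset_add(NameSet *set, const char *name)
{
    if (set->count == set->cap)
    {
        set->cap = set->cap ? set->cap * 2 : 8;
        set->names = realloc(set->names, sizeof(char *) * (size_t) set->cap);
        assert(set->names != NULL);
    }
    set->names[set->count] = malloc(strlen(name) + 1);
    assert(set->names[set->count] != NULL);
    strcpy(set->names[set->count], name);
    set->count++;

    if (set->slots == NULL && set->count >= HASH_THRESHOLD)
        rebuild_hash(set, 4 * HASH_THRESHOLD);  /* first build */
    else if (set->slots != NULL && set->count * 2 > set->nslots)
        rebuild_hash(set, set->nslots * 2);     /* keep load at most 50% */
    else if (set->slots != NULL)
        set->slots[probe(set, name)] = set->count - 1;
}

/* Store base, or base_1, base_2, ... if taken; return the stored name. */
const char *nameset_add_unique(NameSet *set, const char *base)
{
    size_t  len = strlen(base) + 16;    /* room for "_" plus an int */
    char   *buf = malloc(len);

    assert(buf != NULL);
    snprintf(buf, len, "%s", base);
    for (int i = 1; nameset_contains(set, buf); i++)
        snprintf(buf, len, "%s_%d", base, i);
    nameset_add(set, buf);
    free(buf);
    return set->names[set->count - 1];
}
```

Starting from a zero-initialized NameSet, repeated calls to nameset_add_unique() with the same base name yield distinct names. Below the cutoff, no hash table is allocated at all, which is the reason for having a switch-over point in a sketch like this.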

> Perhaps you don't think it's worth the additional complexity, but I
> see that in both locations you're calling build_colinfo_names_hash(),
> it's done just after a call to expand_colnames_array_to(). I wondered
> if it was worthwhile unifying both of those functions maybe with a new
> name so that you don't need to loop over the always-NULL elements of
> the colnames[] array when building the hash table. This is likely
> quite a small overhead compared to the quadratic search you've
> removed, so it might not move the needle any. I just wanted to point
> it out as I've little else I can find to comment on.

Hmm, but there are quite a few expand_colnames_array_to calls that
are not associated with build_colinfo_names_hash. On the whole it
feels like those are separate concerns that are better kept separate.

We could accomplish what you suggest by re-ordering the calls so that
we build the hash table before enlarging the array. 0001 attached
is the same as before (modulo line number changes from being rebased
up to HEAD) and then 0002 implements this idea on top. On the whole
though I find 0002 fairly ugly and would prefer to stick to 0001.
I really doubt that scanning any newly-created column positions is
going to take long enough to justify intertwining things like this.
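
The ordering question can be pictured with a small C sketch. It assumes that a newly enlarged name array has its new positions set to NULL and that an index build walks every position, skipping the NULL ones. ColNames, colnames_expand_to, and colnames_index are hypothetical stand-ins written for this illustration, not the functions named above.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ColNames
{
    char  **names;  /* an entry is NULL until a name is assigned */
    int     num;    /* current array length */
} ColNames;

/* Grow the array to n entries; newly created entries start out NULL. */
void colnames_expand_to(ColNames *c, int n)
{
    if (n > c->num)
    {
        char **p = realloc(c->names, sizeof(char *) * (size_t) n);

        assert(p != NULL);
        c->names = p;
        for (int i = c->num; i < n; i++)
            c->names[i] = NULL;
        c->num = n;
    }
}

/*
 * Stand-in for building a hash table over the array: every position is
 * visited, but only non-NULL names would be entered.  Returns the number
 * of positions visited and stores the number of names entered in *indexed.
 */
int colnames_index(const ColNames *c, int *indexed)
{
    int visited = 0;

    *indexed = 0;
    for (int i = 0; i < c->num; i++)
    {
        visited++;
        if (c->names[i] != NULL)
            (*indexed)++;
    }
    return visited;
}
```

Indexing before the expansion visits only the old positions, while indexing afterwards also walks the new NULL ones. Both orders enter the same names, so the difference is only the cost of stepping over empty positions.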

regards, tom lane

Attachment	Content-Type	Size
v2-0001-speed-up-column-name-deduplication.patch	text/x-diff	9.9 KB
v2-0002-avoid-useless-scanning.patch	text/x-diff	3.1 KB

In response to

Re: Speeding up ruleutils' name de-duplication code, redux at 2024-09-10 09:57:21 from David Rowley

Responses

Re: Speeding up ruleutils' name de-duplication code, redux at 2024-09-10 20:33:59 from David Rowley
