SearchCatCacheList()/SearchSysCacheList() is O(n)

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: SearchCatCacheList()/SearchSysCacheList() is O(n)
Date: 2021-05-12 21:07:55
Message-ID: 20210512210755.wwexawhfflromp5m@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

When working on the shared memory stats patch I needed to manufacture
having a lot of stats entries. It seemed cheaper to create functions
than relations, for fairly obvious reasons. That required calling the
functions too get those entries.

My first attempt ran into the following issue:

-- create 100k functions
DO $d$BEGIN FOR i IN 1..100000 LOOP EXECUTE format('CREATE OR REPLACE FUNCTION func_%1$s() RETURNS VOID LANGUAGE SQL AS $f$SELECT 1$f$;', i);END LOOP;END;$d$;
Time: 14853.428 ms (00:14.853)

-- call them to create stats
DO $d$BEGIN FOR i IN 1..100000 LOOP EXECUTE format('SELECT func_%1$s();', i);END LOOP;END;$d$;
Time: 106291.238 ms (01:46.291)

I had started with 1M functions, but the calls never finished.

It turns out to work more normally if you create *and* call the
functions after each other:

DO $d$BEGIN FOR i IN 1..100000 LOOP EXECUTE format('CREATE OR REPLACE FUNCTION func_%1$s() RETURNS VOID LANGUAGE SQL AS $f$SELECT 1$f$; SELECT func_%1$s();', i);END LOOP;END;$d$;
Time: 20043.375 ms (00:20.043)

The problem is that SearchCatCacheList() is not actually a hash table -
there are no buckets, in contrast to SearchCatCacheList(). The hash
values SearchCatCacheList() computes are only used to make the
comparison cheaper.

The only reason that the combined creation / call works out OK
performance wise, is that CatCacheInvalidate() destroys *all* lists, so
there only ever is one entry to match against.

This seems like a pretty large trap?

It's been that way since SearchCatCacheList() was introduced:

commit 0332d65ac4a1c843e1812755db1afc1b1109d0ea
Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Date: 2002-04-06 06:59:25 +0000

Implement partial-key searching of syscaches, per recent suggestion
to pghackers. Use this to do searching for ambiguous functions ---
it will get more uses soon.

Tom, any chance you remember if this was an oversight, or whether you
just considered this to be OK, given the likely numbers of objects?

I mainly wrote this email because I just remembered this by accident as
part of another discussion, and thought it'd be good to have a record of
the problem...

Greetings,

Andres Freund

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-05-12 21:26:28 Re: SearchCatCacheList()/SearchSysCacheList() is O(n)
Previous Message Robert Haas 2021-05-12 20:56:14 Re: [Patch] ALTER SYSTEM READ ONLY