Re: Fastest way to clone schema ~1000x

From: Emiel Mols <emiel(at)crisp(dot)nl>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: daniel(at)yesql(dot)se, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Fastest way to clone schema ~1000x
Date: 2024-02-29 08:43:51
Message-ID: CAF5w50638UxTfEkMLsyg_8vBbb=ymR6V1J7U4ia=kSy=+_B6+g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

An update on this for anyone else who has similar issues/considerations:
- As suggested, we are sticking to 1 database-per-test, and not a
schema-per-test.
- We moved our testing setup to maintain at most 1 connection per backend
worker. If a request for a different test comes in, we close and reconnect
to the different DB (in the future, we might test some affinity based
routing here). This does empirically seem to improve performance around
~5-10% over maintaining 1000+ connections. It probably helps that the
connection is local over a unix domain socket. We now set max_connections
in testing to 512, where we usually max out at ~200.
- We tried PgBouncer, which seems to work as well, but doesn't improve
performance much compared to the previous point. (Also it adds quite some
complexity, we still need to maintain the 1000+ connections to the pooler
as it doesn't support changing databases and the user names need to be
known/configured ahead of time which is annoying for us)
- We looked more into the `native_queued_spin_lock_slowpath` issue,
distilling a flamegraph to identify the actual caller(s): our latest
thinking now is that this is an artifact of measurement with perf/bpf (!!),
so the concerns with this can be ignored otherwise. Flame graph:
https://ibb.co/sW34mgq. Reason to think so is that
`bpf_trampoline_6442485415_0` is in its call path. (Please do reach out if
you think this is wrong)

Thanks again for the quick help.

- Emiel

On Mon, Feb 26, 2024 at 5:10 PM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:

> On 2/26/24 01:06, Emiel Mols wrote:
> > On Mon, Feb 26, 2024 at 3:50 PM Daniel Gustafsson <daniel(at)yesql(dot)se
> > <mailto:daniel(at)yesql(dot)se>> wrote:
> >
> > There is a measurable overhead in connections, regardless of if they
> > are used
> > or not. If you are looking to squeeze out performance then doing
> > more over
> > already established connections, and reducing max_connections, is a
> > good place
> > to start.
> >
> >
> > Clear, but with database-per-test (and our backend setup), it would have
> > been *great* if we could have switched database on the same connection
> > (similar to "USE xxx" in mysql). That would limit the connections to the
> > amount of workers, not multiplied by tests.
>
> That is because:
>
> https://dev.mysql.com/doc/refman/8.3/en/glossary.html#glos_schema
>
> "In MySQL, physically, a schema is synonymous with a database. You can
> substitute the keyword SCHEMA instead of DATABASE in MySQL SQL syntax,
> for example using CREATE SCHEMA instead of CREATE DATABASE. "
>
>
> >
> > Even with a pooler, we're still going to be maintaining 1000s of
> > connections from the backend workers to the pooler. I would expect this
> > to be rather efficient, but still unnecessary. Also, both
> > pgbouncer/pgpool don't seem to support switching database in-connection
> > (they could have implemented the aforementioned "USE" statement I
> > think!). [Additionally we're using PHP that doesn't seem to have a good
> > shared memory pool implementation -- pg_pconnect is pretty buggy].
> >
> > I'll continue with some more testing. Thanks for now!
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Laurenz Albe 2024-02-29 09:03:41 Re: Non-Stored Generated Columns
Previous Message Matthew Dennison 2024-02-29 08:18:31 RE: Unable to get PostgreSQL 15 with Kerberos (GSS) working