From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Emiel Mols <emiel(at)crisp(dot)nl>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Fastest way to clone schema ~1000x
Date: 2024-02-26 06:36:18
Message-ID: CAFj8pRC8rGpVOZYkcsv7cx6i4MmqiD5eGF=GdCKRODKXjVSb4Q@mail.gmail.com
Lists: pgsql-general
Hi
On Mon, 26 Feb 2024 at 07:28, Emiel Mols <emiel(at)crisp(dot)nl> wrote:
> Hello,
>
> To improve our unit and end-to-end testing performance, we are looking to
> optimize initialization of around 500-1000 database *schemas* from a
> schema.sql file.
>
> Background: in Postgres, you cannot change databases on
> existing connections, and using schemas would allow us to minimize the
> number of active connections needed from our test runners, since we can
> reuse connections by updating search_path. In our current
> database-per-test setup we see around ~30% (!!) total CPU overhead in
> native_queued_spin_lock_slowpath (as profiled with Linux perf), presumably
> because of the high connection count. We run ~200 tests in parallel to
> saturate a 128-core machine.
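
For reference, reusing one connection across schema-per-test runs is just a search_path change; a minimal sketch (the schema names test001/test002 are only examples):

```sql
-- one long-lived connection per test runner
SET search_path TO test001;   -- point the session at this test's schema
-- ... run the test's queries ...
SET search_path TO test002;   -- hand the same connection to the next test
```
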
>
> In the schema-per-test setup, however, it becomes harder to cheaply create
> the schema. Before we could `create database test001 with template
> testbase` to set up the database for a test, and this was reasonably fast.
> Re-inserting a schema with ~500 table/index definitions across 500 test
> schemas is prohibitively expensive (around 1000 ms per schema insertion
> means we're wasting 500 CPU-seconds, and there seems to be quite some lock
> contention too). Linux perf shows that in this setup we're reducing the
> native_queued_spin_lock_slowpath overhead to around 10%, but total test
> time is still slower because of all the schema initialization. It also
> feels quite elegant that functions and types can be reused between tests.
>
> Does anyone have some insights or great ideas :)? We're also pretty
> curious about the fundamental reason why high/changing connection counts
> to Postgres result in this much (spin)lock contention (or perhaps we're
> doing something wrong in either our configuration or measurements?).
>
> An alternative we haven't explored yet is to see if we can use pgbouncer
> or another connection pooler to mitigate the 30% issue (set limits so
> there are only ever X connections to postgres in total, and perhaps max Y
> per test/database). This does add another piece of
> infrastructure/complexity, so it's not really preferred.
>
For testing:

a) use templates - CREATE DATABASE test TEMPLATE some;

b) disable fsync (only for testing!!!)
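
Concretely, (a) and (b) could look like this (the database names are examples; fsync=off must never be used on data you care about):

```sql
-- (a) load schema.sql into a template database once ...
CREATE DATABASE testbase;
-- ... then clone it per test: a file-level copy, no SQL replay
-- (the template must have no active connections during the clone)
CREATE DATABASE test001 TEMPLATE testbase;

-- (b) disable durability for the whole test cluster (a reload is enough):
ALTER SYSTEM SET fsync = off;               -- testing only!
ALTER SYSTEM SET full_page_writes = off;    -- commonly paired with fsync=off
SELECT pg_reload_conf();
```
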
Regards
Pavel
> Best,
>
> Emiel
>