Re: Query planning read a large amount of buffers for partitioned tables

From: bruno vieira da silva <brunogiovs(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Query planning read a large amount of buffers for partitioned tables
Date: 2025-02-17 20:25:37
Message-ID: CAB+Nuk-KUZ5Z+P_N6U6EvnHV+V2M+RSqGcZu1hqpK1H-Na-Z4A@mail.gmail.com
Lists: pgsql-performance

Well, the query plans were generated with PG 17.3, and the buffer usage was
about half.
Did PG 17.3 have any fixes to reduce the planning buffer usage?

On Mon, Feb 17, 2025 at 3:18 PM bruno vieira da silva <brunogiovs(at)gmail(dot)com>
wrote:

> Hello, I ran a more comprehensive test with different numbers of
> partitions, and this is what I found:
>
> Summary of buffer usage: first call vs. second call in the same session.
>
> First call in a new session (200, 100, 50, and 10 partitions):
> 200 Partitions: 12,828 (100MB)
> 100 Partitions: 9,329 (72MB)
> 50 Partitions: 3,305 (25MB)
> 10 Partitions: 875 (7MB)
>
> Second call, same query in the same session:
> 200 Partitions: 205 (1.6MB)
> 100 Partitions: 5 (40KB)
> 50 Partitions: 5 (40KB)
> 10 Partitions: 5 (40KB)
>
> I also tested on PG 17.3; no relevant changes.
>
> The question is: does this behavior make sense?
>
> *These are the steps to reproduce it:*
>
> docker pull postgres:17.2
> docker run -itd -e POSTGRES_USER=bruno -e POSTGRES_PASSWORD=bruno -p
> 5500:5432 -v /home/bruno/pgdata17:/var/lib/postgresql/data --name
> postgresql postgres:17.2
> export PGHOST="localhost"
> export PGPORT=5500
> export PGDATABASE="postgres"
> export PGUSER="bruno"
> export PGPASSWORD="bruno"
>
> CREATE EXTENSION IF NOT EXISTS "pgcrypto"; -- provides gen_random_uuid();
> -- built in since PG 13, so the extension is optional here
>
> CREATE TABLE dicom_series (
> series_uid UUID DEFAULT gen_random_uuid(),
> series_description VARCHAR(255),
> modality VARCHAR(16),
> body_part_examined VARCHAR(64),
> patient_id VARCHAR(64),
> study_uid UUID DEFAULT gen_random_uuid(),
> created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
> );
>
> -- Create the parent table
> CREATE TABLE dicom_sops_100_part (
> sop_uid UUID NOT NULL,
> series_uid UUID NOT NULL,
> instance_number INT,
> image_position_patient TEXT,
> image_orientation_patient TEXT,
> slice_thickness DECIMAL(10, 2),
> slice_location DECIMAL(10, 2),
> pixel_spacing TEXT,
> rows INT,
> columns INT,
> acquisition_date DATE,
> acquisition_time TIME,
> created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
> ) PARTITION BY HASH (sop_uid);
>
> -- Create 100 partitions
> DO $$
> DECLARE
> partition_number INT;
> BEGIN
> FOR partition_number IN 0..99 LOOP
> EXECUTE format(
> 'CREATE TABLE dicom_sops_100_p%1$s PARTITION OF
> dicom_sops_100_part FOR VALUES WITH (MODULUS 100, REMAINDER %1$s);',
> partition_number
> );
> END LOOP;
> END $$;
>
> *Data population:*
>
> DO $$
> DECLARE
> series_count INT := 1000000; -- Number of series to create
> sops_per_series INT := 20;
> i INT;
> j INT;
> series_id UUID;
> sop_id UUID;
> BEGIN
> FOR i IN 1..series_count LOOP
> -- Insert into dicom_series table with a generated UUID
> INSERT INTO dicom_series (
> series_description,
> modality,
> body_part_examined,
> patient_id
> ) VALUES (
> 'Series Description ' || i,
> 'CT',
> 'Chest',
> 'PATIENT-' || i
> )
> RETURNING series_uid INTO series_id;
>
> FOR j IN 1..sops_per_series LOOP
> -- Insert into dicom_sops_100_part table with a generated UUID
> INSERT INTO dicom_sops_100_part (
> sop_uid,
> series_uid,
> instance_number,
> image_position_patient,
> image_orientation_patient,
> slice_thickness,
> slice_location,
> pixel_spacing,
> rows,
> columns,
> acquisition_date,
> acquisition_time
> ) VALUES (
> gen_random_uuid(),
> series_id,
> j,
> '(0.0, 0.0, ' || j || ')',
> '(1.0, 0.0, 0.0, 0.0, 1.0, 0.0)',
> 1.0,
> j * 5.0,
> '1.0\\1.0',
> 512,
> 512,
> CURRENT_DATE,
> CURRENT_TIME
> );
> END LOOP;
> END LOOP;
> END $$;
>
> *Add indexes and vacuum analyze:*
>
> CREATE UNIQUE INDEX idx_series_uid ON dicom_series(series_uid);
> CREATE INDEX dicom_sops_100_part_sop_uid_idx ON
> dicom_sops_100_part(sop_uid);
> CREATE INDEX dicom_sops_100_part_series_uid_idx ON
> dicom_sops_100_part(series_uid);
>
> vacuum freeze;
> analyze;
>
> *Testing:*
> Disconnect and reconnect to the db with psql, so the session caches start cold.
>
> Query used for test:
>
> DROP TABLE IF EXISTS temp_series_id;
> CREATE TEMPORARY TABLE temp_series_id AS
>   SELECT series_uid FROM dicom_series ORDER BY random() LIMIT 1;
> ANALYZE temp_series_id;
> explain (analyze,buffers) select * from dicom_sops_100_part where
> series_uid = (select series_uid from temp_series_id);
>
> Query plan:
>
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Append (cost=1.43..423.26 rows=50 width=128) (actual time=2.565..27.216
> rows=20 loops=1)
> Buffers: shared hit=50 read=118, local hit=1
> InitPlan 1
> -> Seq Scan on temp_series_id (cost=0.00..1.01 rows=1 width=16)
> (actual time=0.006..0.007 rows=1 loops=1)
> Buffers: local hit=1
> -> Index Scan using dicom_sops_100_p0_series_uid_idx on
> dicom_sops_100_p0 dicom_sops_100_part_1 (cost=0.42..8.44 rows=1 width=128)
> (actual time=0.846..0.846 rows=0 loops=1)
> Index Cond: (series_uid = (InitPlan 1).col1)
> ....
> -> Index Scan using dicom_sops_100_p49_series_uid_idx on
> dicom_sops_100_p49 dicom_sops_100_part_50 (cost=0.42..8.44 rows=1
> width=128) (actual time=0.302..0.303 rows=0 loops=1)
> Index Cond: (series_uid = (InitPlan 1).col1)
> Buffers: shared hit=1 read=2
> Planning:
> Buffers: shared hit=4180
> Planning Time: 4.941 ms
> Execution Time: 27.682 ms
> (159 rows)
>
> Second query on the same session:
>
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Append (cost=1.43..423.26 rows=50 width=128) (actual time=9.759..9.770
> rows=0 loops=1)
> Buffers: shared hit=100 read=50, local hit=1
> InitPlan 1
> -> Seq Scan on temp_series_id (cost=0.00..1.01 rows=1 width=16)
> (actual time=0.003..0.004 rows=1 loops=1)
> Buffers: local hit=1
> -> Index Scan using dicom_sops_100_p0_series_uid_idx on
> dicom_sops_100_p0 dicom_sops_100_part_1 (cost=0.42..8.44 rows=1 width=128)
> (actual time=0.212..0.213 rows=0 loops=1)
> Index Cond: (series_uid = (InitPlan 1).col1)
> ...
> -> Index Scan using dicom_sops_100_p49_series_uid_idx on
> dicom_sops_100_p49 dicom_sops_100_part_50 (cost=0.42..8.44 rows=1
> width=128) (actual time=0.236..0.236 rows=0 loops=1)
> Index Cond: (series_uid = (InitPlan 1).col1)
> Buffers: shared hit=2 read=1
> Planning:
> Buffers: shared hit=5
> Planning Time: 0.604 ms
> Execution Time: 10.011 ms
> (159 rows)
>
>
> On Thu, Jan 16, 2025 at 9:56 AM bruno vieira da silva <
> brunogiovs(at)gmail(dot)com> wrote:
>
>> Hello, Thanks David.
>>
>> This is a PG test deployment. Anyway, I did a VACUUM FULL on the db, and
>> the number of buffers read increased a bit.
>>
>>
>> On Wed, Jan 15, 2025 at 3:01 PM David Rowley <dgrowleyml(at)gmail(dot)com>
>> wrote:
>>
>>> On Thu, 16 Jan 2025 at 07:29, bruno vieira da silva
>>> <brunogiovs(at)gmail(dot)com> wrote:
>>> > On pg 17 now we have better visibility on the I/O required during
>>> query planning.
>>> > so, as part of an ongoing design work for table partitioning I was
>>> analyzing the performance implications of having more or less partitions.
>>> > In one of my tests of a table with 200 partitions using explain showed
>>> a large amount of buffers read during planning. around 12k buffers.
>>>
>>> That's a suspiciously high number of buffers.
>>>
>>> > I observed that query planning seems to have a caching mechanism as
>>> subsequent similar queries require only a fraction of buffers read during
>>> query planning.
>>> > However, this "caching" seems to be per session as if I end the client
>>> session and I reconnect the same query execution will require again to read
>>> 12k buffer for query planning.
>>> >
>>> > Does pg have any mechanism to mitigate this issue ( new sessions need
>>> to read a large amount of buffers for query planning) ? or should I
>>> mitigate this issue by the use of connection pooling.
>>> > How is this caching done? Is there a way to have viability on its
>>> usage? Where is it stored?
>>>
>>> The caching is for relation meta-data and for various catalogue data.
>>> This is stored in local session hash tables. The caching is done
>>> lazily the first time something is looked up after the session starts.
>>> If you're doing very little work before ending the session, then
>>> you'll pay this overhead much more often than you would if you were to
>>> do more work in each session. A connection pooler would help you do
>>> that, otherwise it would need to be a redesign of how you're
>>> connecting to Postgres from your application.
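>>>
>>> As a sketch, a minimal pgbouncer.ini with session pooling might look
>>> like this (untested; addresses, paths, and sizes are only placeholders):
>>>
>>> [databases]
>>> postgres = host=localhost port=5500 dbname=postgres
>>>
>>> [pgbouncer]
>>> listen_addr = 127.0.0.1
>>> listen_port = 6432
>>> auth_type = md5
>>> auth_file = /etc/pgbouncer/userlist.txt
>>> ; session pooling keeps a server backend (and its relcache/catcache)
>>> ; alive in the pool across client connects, so the caches get reused
>>> pool_mode = session
>>> default_pool_size = 10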
>>>
>>> There's no easy way from EXPLAIN to see which tables or catalogue
>>> tables the IO is occurring on, however, you might want to try looking
>>> at pg_statio_all_tables directly before and after the query that's
>>> causing the 12k buffer accesses and then look at what's changed.
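>>>
>>> Something like this (untested) should show the delta; the stats views
>>> can lag a little, so give them a moment between steps:
>>>
>>> CREATE TEMP TABLE statio_before AS
>>> SELECT relid, schemaname, relname,
>>>        heap_blks_read + heap_blks_hit AS heap_blks,
>>>        coalesce(idx_blks_read, 0) + coalesce(idx_blks_hit, 0) AS idx_blks
>>> FROM pg_statio_all_tables;
>>>
>>> -- run the EXPLAIN (ANALYZE, BUFFERS) query here, then:
>>>
>>> SELECT s.schemaname, s.relname,
>>>        s.heap_blks_read + s.heap_blks_hit - b.heap_blks AS heap_delta,
>>>        coalesce(s.idx_blks_read, 0) + coalesce(s.idx_blks_hit, 0)
>>>          - b.idx_blks AS idx_delta
>>> FROM pg_statio_all_tables s
>>> JOIN statio_before b USING (relid)
>>> WHERE s.heap_blks_read + s.heap_blks_hit - b.heap_blks > 0
>>>    OR coalesce(s.idx_blks_read, 0) + coalesce(s.idx_blks_hit, 0)
>>>       - b.idx_blks > 0
>>> ORDER BY 3 DESC, 4 DESC;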
>>>
>>> I suspect if you're accessing 12k buffers to run EXPLAIN that you have
>>> some auto-vacuum starvation issues. Is auto-vacuum enabled and
>>> running? If you look at pg_stat_activity, do you see autovacuum
>>> running? It's possible that it's running but not configured to run
>>> quickly enough to keep up with demand. Alternatively, it may be
>>> keeping up now, but at some point in the past, it might not have been
>>> and you have some bloat either in an index or in a catalogue table as
>>> a result.
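>>>
>>> A couple of (untested) queries along those lines:
>>>
>>> -- is an autovacuum worker running right now?
>>> SELECT pid, state, query
>>> FROM pg_stat_activity
>>> WHERE backend_type = 'autovacuum worker';
>>>
>>> -- which tables carry the most dead tuples, and when were they last
>>> -- autovacuumed/autoanalyzed?
>>> SELECT relname, n_dead_tup, last_autovacuum, last_autoanalyze
>>> FROM pg_stat_all_tables
>>> ORDER BY n_dead_tup DESC
>>> LIMIT 10;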
>>>
>>> David
>>>
>>
>>
>> --
>> Bruno Vieira da Silva
>>
>
>
> --
> Bruno Vieira da Silva
>

--
Bruno Vieira da Silva
