Re: Query planning read a large amount of buffers for partitioned tables

From: bruno vieira da silva <brunogiovs(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Query planning read a large amount of buffers for partitioned tables
Date: 2025-02-17 20:18:05
Message-ID: CAB+Nuk-JOJD3QA4=cTnpqs-AWQrdSWZWH8Ydf1E-z5AWNRUh7Q@mail.gmail.com
Lists: pgsql-performance

Hello, I did a more comprehensive test with different numbers of
partitions and found the following:

Summary of planning buffer usage: first call vs. second call in the same session.

First query in a new session, for 200, 100, 50, and 10 partitions:
200 partitions: 12,828 (100 MB)
100 partitions: 9,329 (72 MB)
50 partitions: 3,305 (25 MB)
10 partitions: 875 (7 MB)

Same query repeated in the same session:
200 partitions: 205 (1.6 MB)
100 partitions: 5 (40 KB)
50 partitions: 5 (40 KB)
10 partitions: 5 (40 KB)
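
(For scale: buffers are 8 kB blocks, so e.g. 12,828 buffers × 8 kB ≈ 100 MB.)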

I also tested on PG 17.3; no relevant changes.

The question is: does this make sense?

*These are the steps to reproduce it:*

docker pull postgres:17.2
docker run -itd -e POSTGRES_USER=bruno -e POSTGRES_PASSWORD=bruno \
  -p 5500:5432 -v /home/bruno/pgdata17:/var/lib/postgresql/data \
  --name postgresql postgres:17.2
export PGHOST="localhost"
export PGPORT=5500
export PGDATABASE="postgres"
export PGUSER="bruno"
export PGPASSWORD="bruno"
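
A quick sanity check that psql picks up the PG* environment variables set above:

psql -c "SELECT version();"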

CREATE EXTENSION IF NOT EXISTS "pgcrypto"; -- provides gen_random_uuid(); built into core since PG 13, so optional here

CREATE TABLE dicom_series (
    series_uid UUID DEFAULT gen_random_uuid(),
    series_description VARCHAR(255),
    modality VARCHAR(16),
    body_part_examined VARCHAR(64),
    patient_id VARCHAR(64),
    study_uid UUID DEFAULT gen_random_uuid(),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Create the parent table
CREATE TABLE dicom_sops_100_part (
    sop_uid UUID NOT NULL,
    series_uid UUID NOT NULL,
    instance_number INT,
    image_position_patient TEXT,
    image_orientation_patient TEXT,
    slice_thickness DECIMAL(10, 2),
    slice_location DECIMAL(10, 2),
    pixel_spacing TEXT,
    rows INT,
    columns INT,
    acquisition_date DATE,
    acquisition_time TIME,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) PARTITION BY HASH (sop_uid);
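
(Note: the partition key is sop_uid, while the test query below filters on
series_uid, so the planner cannot prune any partitions and every partition's
index has to be probed.)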

-- Create 100 partitions
DO $$
DECLARE
    partition_number INT;
BEGIN
    FOR partition_number IN 0..99 LOOP
        EXECUTE format(
            'CREATE TABLE dicom_sops_100_p%1$s PARTITION OF dicom_sops_100_part
             FOR VALUES WITH (MODULUS 100, REMAINDER %1$s);',
            partition_number
        );
    END LOOP;
END $$;
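
A quick check that all 100 partitions exist (pg_inherits lists each partition
under its parent):

SELECT count(*)
FROM pg_inherits
WHERE inhparent = 'dicom_sops_100_part'::regclass;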

*Data population:*

DO $$
DECLARE
    series_count INT := 1000000; -- number of series to create
    sops_per_series INT := 20;
    i INT;
    j INT;
    series_id UUID;
BEGIN
    FOR i IN 1..series_count LOOP
        -- Insert into dicom_series with a generated UUID
        INSERT INTO dicom_series (
            series_description,
            modality,
            body_part_examined,
            patient_id
        ) VALUES (
            'Series Description ' || i,
            'CT',
            'Chest',
            'PATIENT-' || i
        )
        RETURNING series_uid INTO series_id;

        FOR j IN 1..sops_per_series LOOP
            -- Insert into dicom_sops_100_part with a generated UUID
            INSERT INTO dicom_sops_100_part (
                sop_uid,
                series_uid,
                instance_number,
                image_position_patient,
                image_orientation_patient,
                slice_thickness,
                slice_location,
                pixel_spacing,
                rows,
                columns,
                acquisition_date,
                acquisition_time
            ) VALUES (
                gen_random_uuid(),
                series_id,
                j,
                '(0.0, 0.0, ' || j || ')',
                '(1.0, 0.0, 0.0, 0.0, 1.0, 0.0)',
                1.0,
                j * 5.0,
                '1.0\\1.0',
                512,
                512,
                CURRENT_DATE,
                CURRENT_TIME
            );
        END LOOP;
    END LOOP;
END $$;
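
The row-by-row loop is slow for 20M rows; for reference, here is a set-based
sketch that should produce equivalent data in a single statement (untested
here, same column values as the loop above):

WITH s AS (
    INSERT INTO dicom_series (series_description, modality, body_part_examined, patient_id)
    SELECT 'Series Description ' || i, 'CT', 'Chest', 'PATIENT-' || i
    FROM generate_series(1, 1000000) AS i
    RETURNING series_uid
)
INSERT INTO dicom_sops_100_part (
    sop_uid, series_uid, instance_number, image_position_patient,
    image_orientation_patient, slice_thickness, slice_location,
    pixel_spacing, rows, columns, acquisition_date, acquisition_time
)
SELECT gen_random_uuid(), s.series_uid, j, '(0.0, 0.0, ' || j || ')',
       '(1.0, 0.0, 0.0, 0.0, 1.0, 0.0)', 1.0, j * 5.0, '1.0\\1.0',
       512, 512, CURRENT_DATE, CURRENT_TIME
FROM s CROSS JOIN generate_series(1, 20) AS j;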

*Add indexes and vacuum analyze:*

CREATE UNIQUE INDEX idx_series_uid ON dicom_series (series_uid);
CREATE INDEX dicom_sops_100_part_sop_uid_idx ON dicom_sops_100_part (sop_uid);
CREATE INDEX dicom_sops_100_part_series_uid_idx ON dicom_sops_100_part (series_uid);

VACUUM FREEZE;
ANALYZE;
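
Since catalogue bloat is one suspect for the planning reads, it may also be
worth checking the size of the catalogs the planner touches, e.g.:

SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relname IN ('pg_class', 'pg_attribute', 'pg_statistic', 'pg_index')
ORDER BY pg_total_relation_size(oid) DESC;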

*Testing:*
Disconnect and reconnect to the database with psql.
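(In psql, \c re-establishes the connection, which starts a new backend and
therefore discards the per-session relation/catalogue caches.)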

Query used for the test:

DROP TABLE IF EXISTS temp_series_id;
CREATE TEMPORARY TABLE temp_series_id AS
    SELECT series_uid FROM dicom_series ORDER BY random() LIMIT 1;
ANALYZE temp_series_id;

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM dicom_sops_100_part
WHERE series_uid = (SELECT series_uid FROM temp_series_id);

Query plan:

                                   QUERY PLAN
-------------------------------------------------------------------------------------------------
 Append  (cost=1.43..423.26 rows=50 width=128) (actual time=2.565..27.216 rows=20 loops=1)
   Buffers: shared hit=50 read=118, local hit=1
   InitPlan 1
     ->  Seq Scan on temp_series_id  (cost=0.00..1.01 rows=1 width=16) (actual time=0.006..0.007 rows=1 loops=1)
           Buffers: local hit=1
   ->  Index Scan using dicom_sops_100_p0_series_uid_idx on dicom_sops_100_p0 dicom_sops_100_part_1  (cost=0.42..8.44 rows=1 width=128) (actual time=0.846..0.846 rows=0 loops=1)
         Index Cond: (series_uid = (InitPlan 1).col1)
   ....
   ->  Index Scan using dicom_sops_100_p49_series_uid_idx on dicom_sops_100_p49 dicom_sops_100_part_50  (cost=0.42..8.44 rows=1 width=128) (actual time=0.302..0.303 rows=0 loops=1)
         Index Cond: (series_uid = (InitPlan 1).col1)
         Buffers: shared hit=1 read=2
 Planning:
   Buffers: shared hit=4180
 Planning Time: 4.941 ms
 Execution Time: 27.682 ms
(159 rows)

Second query in the same session:

                                   QUERY PLAN
-------------------------------------------------------------------------------------------------
 Append  (cost=1.43..423.26 rows=50 width=128) (actual time=9.759..9.770 rows=0 loops=1)
   Buffers: shared hit=100 read=50, local hit=1
   InitPlan 1
     ->  Seq Scan on temp_series_id  (cost=0.00..1.01 rows=1 width=16) (actual time=0.003..0.004 rows=1 loops=1)
           Buffers: local hit=1
   ->  Index Scan using dicom_sops_100_p0_series_uid_idx on dicom_sops_100_p0 dicom_sops_100_part_1  (cost=0.42..8.44 rows=1 width=128) (actual time=0.212..0.213 rows=0 loops=1)
         Index Cond: (series_uid = (InitPlan 1).col1)
   ...
   ->  Index Scan using dicom_sops_100_p49_series_uid_idx on dicom_sops_100_p49 dicom_sops_100_part_50  (cost=0.42..8.44 rows=1 width=128) (actual time=0.236..0.236 rows=0 loops=1)
         Index Cond: (series_uid = (InitPlan 1).col1)
         Buffers: shared hit=2 read=1
 Planning:
   Buffers: shared hit=5
 Planning Time: 0.604 ms
 Execution Time: 10.011 ms
(159 rows)
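
Following David's suggestion below, one way to see which (catalogue) tables
the first-call planning reads hit is to snapshot pg_statio_all_tables before
and after the EXPLAIN in a fresh session. A rough sketch (the snapshot itself
touches the catalogs, and stats are flushed with a small delay, so the
numbers are approximate):

CREATE TEMP TABLE statio_before AS
    SELECT relid, relname, heap_blks_read, idx_blks_read
    FROM pg_statio_all_tables;

-- ... run the EXPLAIN (ANALYZE, BUFFERS) query here ...

SELECT a.relname,
       a.heap_blks_read - b.heap_blks_read AS heap_read,
       a.idx_blks_read  - b.idx_blks_read  AS idx_read
FROM pg_statio_all_tables a
JOIN statio_before b USING (relid)
WHERE a.heap_blks_read > b.heap_blks_read
   OR a.idx_blks_read  > b.idx_blks_read
ORDER BY (a.heap_blks_read - b.heap_blks_read)
       + (a.idx_blks_read  - b.idx_blks_read) DESC;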

On Thu, Jan 16, 2025 at 9:56 AM bruno vieira da silva <brunogiovs(at)gmail(dot)com>
wrote:

> Hello, Thanks David.
>
> This is a pg test deployment. Anyway, I did a VACUUM FULL on the db, and
> the number of buffers read increased a bit.
>
>
> On Wed, Jan 15, 2025 at 3:01 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
>> On Thu, 16 Jan 2025 at 07:29, bruno vieira da silva
>> <brunogiovs(at)gmail(dot)com> wrote:
>> > On pg 17 we now have better visibility on the I/O required during
>> > query planning.
>> > So, as part of ongoing design work for table partitioning, I was
>> > analyzing the performance implications of having more or fewer
>> > partitions.
>> > In one of my tests, on a table with 200 partitions, EXPLAIN showed
>> > a large amount of buffers read during planning: around 12k buffers.
>>
>> That's a suspiciously high number of buffers.
>>
>> > I observed that query planning seems to have a caching mechanism, as
>> > subsequent similar queries require only a fraction of the buffers read
>> > during query planning.
>> > However, this "caching" seems to be per session: if I end the client
>> > session and reconnect, the same query execution will again require
>> > reading 12k buffers for query planning.
>> >
>> > Does pg have any mechanism to mitigate this issue (new sessions needing
>> > to read a large amount of buffers for query planning)? Or should I
>> > mitigate it by using connection pooling?
>> > How is this caching done? Is there a way to have visibility into its
>> > usage? Where is it stored?
>>
>> The caching is for relation meta-data and for various catalogue data.
>> This is stored in local session hash tables. The caching is done
>> lazily the first time something is looked up after the session starts.
>> If you're doing very little work before ending the session, then
>> you'll pay this overhead much more often than you would if you were to
>> do more work in each session. A connection pooler would help you do
>> that, otherwise it would need to be a redesign of how you're
>> connecting to Postgres from your application.
>>
>> There's no easy way from EXPLAIN to see which tables or catalogue
>> tables the IO is occurring on. However, you might want to try looking
>> at pg_statio_all_tables directly before and after the query that's
>> causing the 12k buffer accesses, and then look at what's changed.
>>
>> I suspect if you're accessing 12k buffers to run EXPLAIN that you have
>> some auto-vacuum starvation issues. Is auto-vacuum enabled and
>> running? If you look at pg_stat_activity, do you see autovacuum
>> running? It's possible that it's running but not configured to run
>> quickly enough to keep up with demand. Alternatively, it may be
>> keeping up now, but at some point in the past, it might not have been
>> and you have some bloat either in an index or in a catalogue table as
>> a result.
>>
>> David
>>
>
>
> --
> Bruno Vieira da Silva
>

--
Bruno Vieira da Silva
