Re: Query planning read a large amount of buffers for partitioned tables

From: bruno vieira da silva <brunogiovs(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Query planning read a large amount of buffers for partitioned tables
Date: 2025-02-17 20:27:44
Message-ID: CAB+Nuk8vSr9ErhJ8GTffV_ht9pS5TBqQTue9NQsdAXwmTW3O6Q@mail.gmail.com
Lists: pgsql-performance

Never mind. The query plan was generated on test data with 50 partitions.
Sorry for the confusion.

On Mon, Feb 17, 2025 at 3:25 PM bruno vieira da silva <brunogiovs(at)gmail(dot)com>
wrote:

> Well, the query plans were generated with pg 17.3, and the buffer usage
> was half.
> Did pg 17.3 have any fixes to reduce the planning buffer usage?
>
> On Mon, Feb 17, 2025 at 3:18 PM bruno vieira da silva <
> brunogiovs(at)gmail(dot)com> wrote:
>
>> Hello, I did a more comprehensive test with different numbers of
>> partitions and found this:
>>
>> Summary of buffer usage for the first call vs. the second call in the same
>> session.
>>
>> First call, with 200, 100, 50, and 10 partitions:
>> 200 Partitions: 12,828 (100MB)
>> 100 Partitions: 9,329 (72MB)
>> 50 Partitions: 3,305 (25MB)
>> 10 Partitions: 875 (7MB)
>>
>> Same query repeated in the same session:
>> 200 Partitions: 205 (1.6MB)
>> 100 Partitions: 5 (40KB)
>> 50 Partitions: 5 (40KB)
>> 10 Partitions: 5 (40KB)
>>
>> I also tested on PG 17.3; no relevant changes.
>>
>> The question is: does this make sense?
>>
>> *These are the steps to reproduce it:*
>>
>> docker pull postgres:17.2
>> docker run -itd -e POSTGRES_USER=bruno -e POSTGRES_PASSWORD=bruno -p
>> 5500:5432 -v /home/bruno/pgdata17:/var/lib/postgresql/data --name
>> postgresql postgres:17.2
>> export PGHOST="localhost"
>> export PGPORT=5500
>> export PGDATABASE="postgres"
>> export PGUSER="bruno"
>> export PGPASSWORD="bruno"
>>
>> CREATE EXTENSION IF NOT EXISTS "pgcrypto"; -- Enables the gen_random_uuid
>> function
>>
>> CREATE TABLE dicom_series (
>> series_uid UUID DEFAULT gen_random_uuid(),
>> series_description VARCHAR(255),
>> modality VARCHAR(16),
>> body_part_examined VARCHAR(64),
>> patient_id VARCHAR(64),
>> study_uid UUID DEFAULT gen_random_uuid(),
>> created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
>> );
>>
>> -- Create the parent table
>> CREATE TABLE dicom_sops_100_part (
>> sop_uid UUID NOT NULL,
>> series_uid UUID NOT NULL,
>> instance_number INT,
>> image_position_patient TEXT,
>> image_orientation_patient TEXT,
>> slice_thickness DECIMAL(10, 2),
>> slice_location DECIMAL(10, 2),
>> pixel_spacing TEXT,
>> rows INT,
>> columns INT,
>> acquisition_date DATE,
>> acquisition_time TIME,
>> created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
>> ) PARTITION BY HASH (sop_uid);
>>
>> -- Create 100 partitions
>> DO $$
>> DECLARE
>> partition_number INT;
>> BEGIN
>> FOR partition_number IN 0..99 LOOP
>> EXECUTE format(
>> 'CREATE TABLE dicom_sops_100_p%1$s PARTITION OF
>> dicom_sops_100_part FOR VALUES WITH (MODULUS 100, REMAINDER %1$s);',
>> partition_number
>> );
>> END LOOP;
>> END $$;
>>
>> *Data population:*
>>
>> DO $$
>> DECLARE
>> series_count INT := 1000000; -- Number of series to create
>> sops_per_series INT := 20;
>> i INT;
>> j INT;
>> series_id UUID;
>> sop_id UUID;
>> BEGIN
>> FOR i IN 1..series_count LOOP
>> -- Insert into dicom_series table with a generated UUID
>> INSERT INTO dicom_series (
>> series_description,
>> modality,
>> body_part_examined,
>> patient_id
>> ) VALUES (
>> 'Series Description ' || i,
>> 'CT',
>> 'Chest',
>> 'PATIENT-' || i
>> )
>> RETURNING series_uid INTO series_id;
>>
>> FOR j IN 1..sops_per_series LOOP
>> -- Insert into dicom_sops_100_part table with a generated UUID
>> INSERT INTO dicom_sops_100_part (
>> sop_uid,
>> series_uid,
>> instance_number,
>> image_position_patient,
>> image_orientation_patient,
>> slice_thickness,
>> slice_location,
>> pixel_spacing,
>> rows,
>> columns,
>> acquisition_date,
>> acquisition_time
>> ) VALUES (
>> gen_random_uuid(),
>> series_id,
>> j,
>> '(0.0, 0.0, ' || j || ')',
>> '(1.0, 0.0, 0.0, 0.0, 1.0, 0.0)',
>> 1.0,
>> j * 5.0,
>> '1.0\\1.0',
>> 512,
>> 512,
>> CURRENT_DATE,
>> CURRENT_TIME
>> );
>> END LOOP;
>> END LOOP;
>> END $$;
>>
>> *Add indexes and vacuum analyze:*
>>
>> CREATE UNIQUE INDEX idx_series_uid ON dicom_series(series_uid);
>> CREATE INDEX dicom_sops_100_part_sop_uid_idx ON
>> dicom_sops_100_part(sop_uid);
>> CREATE INDEX dicom_sops_100_part_series_uid_idx ON
>> dicom_sops_100_part(series_uid);
>>
>> vacuum freeze;
>> analyze;
>>
>> *Testing:*
>> Disconnect and reconnect to the db with psql so that each test starts in a
>> fresh session.
>>
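>> One simple way to do that (just a sketch; the file name run_test.sql is
>> illustrative and would hold the test query shown below) is to force a
>> fresh backend from inside psql with \connect, since a new backend starts
>> with empty relation and catalogue caches:
>>
>> \connect postgres
>> \i run_test.sql
>> \i run_test.sql
>>
>> The first \i in the new session should show the large planning buffer
>> count; the second, in the same session, the much smaller one.
>>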
>> Query used for test:
>>
>> drop table if exists temp_series_id; CREATE TEMPORARY TABLE temp_series_id AS select
>> series_uid from dicom_series order by random() limit 1; analyze
>> temp_series_id;
>> explain (analyze,buffers) select * from dicom_sops_100_part where
>> series_uid = (select series_uid from temp_series_id);
>>
>> Query plan:
>>
>>
>> QUERY PLAN
>>
>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>  Append  (cost=1.43..423.26 rows=50 width=128) (actual time=2.565..27.216 rows=20 loops=1)
>>    Buffers: shared hit=50 read=118, local hit=1
>>    InitPlan 1
>>      ->  Seq Scan on temp_series_id  (cost=0.00..1.01 rows=1 width=16) (actual time=0.006..0.007 rows=1 loops=1)
>>            Buffers: local hit=1
>>    ->  Index Scan using dicom_sops_100_p0_series_uid_idx on dicom_sops_100_p0 dicom_sops_100_part_1  (cost=0.42..8.44 rows=1 width=128) (actual time=0.846..0.846 rows=0 loops=1)
>>          Index Cond: (series_uid = (InitPlan 1).col1)
>>    ....
>>    ->  Index Scan using dicom_sops_100_p49_series_uid_idx on dicom_sops_100_p49 dicom_sops_100_part_50  (cost=0.42..8.44 rows=1 width=128) (actual time=0.302..0.303 rows=0 loops=1)
>>          Index Cond: (series_uid = (InitPlan 1).col1)
>>          Buffers: shared hit=1 read=2
>>  Planning:
>>    Buffers: shared hit=4180
>>  Planning Time: 4.941 ms
>>  Execution Time: 27.682 ms
>> (159 rows)
>>
>> Second run of the same query in the same session:
>>
>> QUERY PLAN
>>
>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>  Append  (cost=1.43..423.26 rows=50 width=128) (actual time=9.759..9.770 rows=0 loops=1)
>>    Buffers: shared hit=100 read=50, local hit=1
>>    InitPlan 1
>>      ->  Seq Scan on temp_series_id  (cost=0.00..1.01 rows=1 width=16) (actual time=0.003..0.004 rows=1 loops=1)
>>            Buffers: local hit=1
>>    ->  Index Scan using dicom_sops_100_p0_series_uid_idx on dicom_sops_100_p0 dicom_sops_100_part_1  (cost=0.42..8.44 rows=1 width=128) (actual time=0.212..0.213 rows=0 loops=1)
>>          Index Cond: (series_uid = (InitPlan 1).col1)
>>    ...
>>    ->  Index Scan using dicom_sops_100_p49_series_uid_idx on dicom_sops_100_p49 dicom_sops_100_part_50  (cost=0.42..8.44 rows=1 width=128) (actual time=0.236..0.236 rows=0 loops=1)
>>          Index Cond: (series_uid = (InitPlan 1).col1)
>>          Buffers: shared hit=2 read=1
>>  Planning:
>>    Buffers: shared hit=5
>>  Planning Time: 0.604 ms
>>  Execution Time: 10.011 ms
>> (159 rows)
>>
>>
>> On Thu, Jan 16, 2025 at 9:56 AM bruno vieira da silva <
>> brunogiovs(at)gmail(dot)com> wrote:
>>
>>> Hello, thanks David.
>>>
>>> This is a pg test deployment. Anyway, I did a VACUUM FULL on the db, and
>>> the number of buffers read increased a bit.
>>>
>>>
>>> On Wed, Jan 15, 2025 at 3:01 PM David Rowley <dgrowleyml(at)gmail(dot)com>
>>> wrote:
>>>
>>>> On Thu, 16 Jan 2025 at 07:29, bruno vieira da silva
>>>> <brunogiovs(at)gmail(dot)com> wrote:
>>>> > On pg 17 we now have better visibility into the I/O required during
>>>> > query planning.
>>>> > So, as part of ongoing design work for table partitioning, I was
>>>> > analyzing the performance implications of having more or fewer partitions.
>>>> > In one of my tests on a table with 200 partitions, explain showed a large
>>>> > number of buffers read during planning, around 12k buffers.
>>>>
>>>> That's a suspiciously high number of buffers.
>>>>
>>>> > I observed that query planning seems to have a caching mechanism, as
>>>> > subsequent similar queries require only a fraction of the buffers read
>>>> > during query planning.
>>>> > However, this "caching" seems to be per session: if I end the client
>>>> > session and reconnect, the same query will again require reading 12k
>>>> > buffers for query planning.
>>>> >
>>>> > Does pg have any mechanism to mitigate this issue (new sessions needing
>>>> > to read a large number of buffers for query planning), or should I
>>>> > mitigate it by using connection pooling?
>>>> > How is this caching done? Is there a way to have visibility into its
>>>> > usage? Where is it stored?
>>>>
>>>> The caching is for relation meta-data and for various catalogue data.
>>>> This is stored in local session hash tables. The caching is done
>>>> lazily the first time something is looked up after the session starts.
>>>> If you're doing very little work before ending the session, then
>>>> you'll pay this overhead much more often than you would if you were to
>>>> do more work in each session. A connection pooler would help you do
>>>> that, otherwise it would need to be a redesign of how you're
>>>> connecting to Postgres from your application.
>>>>
>>>> There's no easy way from EXPLAIN to see which tables or catalogue
>>>> tables the IO is occurring on; however, you might want to try looking
>>>> at pg_statio_all_tables directly before and after the query that's
>>>> causing the 12k buffer accesses and then look at what's changed.
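>>>>
>>>> For example (just a sketch of that approach; the numbers will be
>>>> approximate because the statistics views are cumulative and can lag a
>>>> little), something like this in the same session would show which
>>>> tables, including catalogue tables, accumulated block accesses around
>>>> the EXPLAIN:
>>>>
>>>> CREATE TEMP TABLE statio_before AS
>>>> SELECT relid, schemaname, relname,
>>>>        heap_blks_read + heap_blks_hit + idx_blks_read + idx_blks_hit AS blks
>>>> FROM pg_statio_all_tables;
>>>>
>>>> -- run the EXPLAIN (ANALYZE, BUFFERS) query here, then:
>>>>
>>>> SELECT s.schemaname, s.relname,
>>>>        s.heap_blks_read + s.heap_blks_hit
>>>>        + s.idx_blks_read + s.idx_blks_hit - b.blks AS blocks_since_snapshot
>>>> FROM pg_statio_all_tables s
>>>> JOIN statio_before b USING (relid)
>>>> WHERE s.heap_blks_read + s.heap_blks_hit
>>>>       + s.idx_blks_read + s.idx_blks_hit > b.blks
>>>> ORDER BY blocks_since_snapshot DESC;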
>>>>
>>>> I suspect if you're accessing 12k buffers to run EXPLAIN that you have
>>>> some auto-vacuum starvation issues. Is auto-vacuum enabled and
>>>> running? If you look at pg_stat_activity, do you see autovacuum
>>>> running? It's possible that it's running but not configured to run
>>>> quickly enough to keep up with demand. Alternatively, it may be
>>>> keeping up now, but at some point in the past, it might not have been
>>>> and you have some bloat either in an index or in a catalogue table as
>>>> a result.
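>>>>
>>>> As a quick check (a sketch; interpret the numbers with the usual caveats
>>>> about cumulative statistics), the first query shows whether an
>>>> autovacuum worker is active right now, and the second shows which
>>>> catalogue tables have the most dead tuples and when they were last
>>>> autovacuumed:
>>>>
>>>> SELECT pid, state, query
>>>> FROM pg_stat_activity
>>>> WHERE backend_type = 'autovacuum worker';
>>>>
>>>> SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, last_vacuum
>>>> FROM pg_stat_all_tables
>>>> WHERE schemaname = 'pg_catalog'
>>>> ORDER BY n_dead_tup DESC
>>>> LIMIT 10;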
>>>>
>>>> David
>>>>
>>>
>>>
>>> --
>>> Bruno Vieira da Silva
>>>
>>
>>
>> --
>> Bruno Vieira da Silva
>>
>
>
> --
> Bruno Vieira da Silva
>

--
Bruno Vieira da Silva
