Re: speedup COPY TO for partitioned table.

From: newtglobal postgresql_contributors <postgresql_contributors(at)newtglobalcorp(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Jian He <hejian(dot)mark(at)gmail(dot)com>
Subject: Re: speedup COPY TO for partitioned table.
Date: 2025-03-17 08:02:09
Message-ID: 174219852967.294107.6195385625494034792.pgcf@coridan.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Jian,
Tested this patch with COPY sales TO STDOUT; ~ 1.909ms, improving performance over the older COPY (SELECT * FROM sales) TO STDOUT; ~ 3.80ms method. This eliminates query planning overhead and significantly speeds up data export from partitioned tables.
Our test setup involved creating a partitioned table(sales), inserted 500 records, and comparing execution times.

-- Step 1: Create Partitioned Parent Table
CREATE TABLE sales (
id SERIAL NOT NULL,
sale_date DATE NOT NULL,
region TEXT NOT NULL,
amount NUMERIC(10,2) NOT NULL,
category TEXT NOT NULL,
PRIMARY KEY (id, sale_date,region)
) PARTITION BY RANGE (sale_date);

-- Step 2: Create Range Partitions (2023 & 2024)
CREATE TABLE sales_2023 PARTITION OF sales
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01')
PARTITION BY HASH (region);

CREATE TABLE sales_2024 PARTITION OF sales
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
PARTITION BY HASH (region);

-- Step 3: Create Hash Partitions for sales_2023
CREATE TABLE sales_2023_part1 PARTITION OF sales_2023
FOR VALUES WITH (MODULUS 2, REMAINDER 0);

CREATE TABLE sales_2023_part2 PARTITION OF sales_2023
FOR VALUES WITH (MODULUS 2, REMAINDER 1);

-- Step 4: Create Hash Partitions for sales_2024
CREATE TABLE sales_2024_part1 PARTITION OF sales_2024
FOR VALUES WITH (MODULUS 2, REMAINDER 0);

CREATE TABLE sales_2024_part2 PARTITION OF sales_2024
FOR VALUES WITH (MODULUS 2, REMAINDER 1);

-- Step 5: Insert Data **AFTER** Creating Partitions
INSERT INTO sales (sale_date, region, amount, category)
SELECT
('2023-01-01'::DATE + (random() * 730)::int) AS sale_date, -- Random date in 2023-2024 range
CASE WHEN random() > 0.5 THEN 'North' ELSE 'South' END AS region, -- Random region
(random() * 1000)::NUMERIC(10,2) AS amount, -- Random amount (0 to 1000)
CASE WHEN random() > 0.5 THEN 'Electronics' ELSE 'Furniture' END AS category -- Random category
FROM generate_series(1, 500);

COPY (SELECT * FROM SALES) TO STDOUT; ~ 1.909ms

COPY SALES TO STDOUT; ~ 3.80ms

This change is recommended for better performance in PostgreSQL partitioned tables.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry Dolgov 2025-03-17 08:11:58 Re: pg_stat_statements and "IN" conditions
Previous Message Daniil Davydov 2025-03-17 08:00:45 Re: Forbid to DROP temp tables of other sessions