From: | newtglobal postgresql_contributors <postgresql_contributors(at)newtglobalcorp(dot)com> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Cc: | Jian He <hejian(dot)mark(at)gmail(dot)com> |
Subject: | Re: speedup COPY TO for partitioned table. |
Date: | 2025-03-17 08:02:09 |
Message-ID: | 174219852967.294107.6195385625494034792.pgcf@coridan.postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Jian,
Tested this patch with COPY sales TO STDOUT; ~ 1.909ms, improving performance over the older COPY (SELECT * FROM sales) TO STDOUT; ~ 3.80ms method. This eliminates query planning overhead and significantly speeds up data export from partitioned tables.
Our test setup involved creating a partitioned table(sales), inserted 500 records, and comparing execution times.
-- Step 1: Create Partitioned Parent Table
CREATE TABLE sales (
id SERIAL NOT NULL,
sale_date DATE NOT NULL,
region TEXT NOT NULL,
amount NUMERIC(10,2) NOT NULL,
category TEXT NOT NULL,
PRIMARY KEY (id, sale_date,region)
) PARTITION BY RANGE (sale_date);
-- Step 2: Create Range Partitions (2023 & 2024)
CREATE TABLE sales_2023 PARTITION OF sales
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01')
PARTITION BY HASH (region);
CREATE TABLE sales_2024 PARTITION OF sales
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
PARTITION BY HASH (region);
-- Step 3: Create Hash Partitions for sales_2023
CREATE TABLE sales_2023_part1 PARTITION OF sales_2023
FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE sales_2023_part2 PARTITION OF sales_2023
FOR VALUES WITH (MODULUS 2, REMAINDER 1);
-- Step 4: Create Hash Partitions for sales_2024
CREATE TABLE sales_2024_part1 PARTITION OF sales_2024
FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE sales_2024_part2 PARTITION OF sales_2024
FOR VALUES WITH (MODULUS 2, REMAINDER 1);
-- Step 5: Insert Data **AFTER** Creating Partitions
INSERT INTO sales (sale_date, region, amount, category)
SELECT
('2023-01-01'::DATE + (random() * 730)::int) AS sale_date, -- Random date in 2023-2024 range
CASE WHEN random() > 0.5 THEN 'North' ELSE 'South' END AS region, -- Random region
(random() * 1000)::NUMERIC(10,2) AS amount, -- Random amount (0 to 1000)
CASE WHEN random() > 0.5 THEN 'Electronics' ELSE 'Furniture' END AS category -- Random category
FROM generate_series(1, 500);
COPY (SELECT * FROM SALES) TO STDOUT; ~ 1.909ms
COPY SALES TO STDOUT; ~ 3.80ms
This change is recommended for better performance in PostgreSQL partitioned tables.
From | Date | Subject | |
---|---|---|---|
Next Message | Dmitry Dolgov | 2025-03-17 08:11:58 | Re: pg_stat_statements and "IN" conditions |
Previous Message | Daniil Davydov | 2025-03-17 08:00:45 | Re: Forbid to DROP temp tables of other sessions |