reindex option for tuning load large data

From:	"James Pang (chaolpan)" <chaolpan(at)cisco(dot)com>
To:	"pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject:	reindex option for tuning load large data
Date:	2022-06-17 05:34:26
Message-ID:	PH0PR11MB51911D09840DC7BD6069678ED6AF9@PH0PR11MB5191.namprd11.prod.outlook.com
Lists:	pgsql-performance

Hi,

We plan to migrate a large database from Oracle to Postgres (version 13.6, OS Red Hat Enterprise Linux 8), and we are checking options to make the data load into Postgres fast. The data volume is several TB, with thousands of indexes and many large partitioned tables. We want the data load to run fast and to avoid missing any indexes when reindexing. There are 2 options for rebuilding the indexes. Could you give some suggestions on which of the 2 options is better?

1. Create the tables and indexes (in an empty database), update pg_index to set indisready=false and indisvalid=false, then load the data using COPY from CSV, then reindex table ...

REINDEX on Postgres 13.6 does not support parallelism, right? So we would need to start multiple sessions to reindex multiple tables/indexes in parallel.
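To illustrate the multi-session idea in option 1, here is a minimal sketch of how the rebuild work could be divided so that each session gets a roughly equal share. It assumes a list of (index name, size in bytes) pairs has already been collected, for example from pg_relation_size() on the source or a test load. The function name, index names, sizes and session count are all hypothetical, and index names are assumed to be already schema-qualified and quoted where needed.

```python
import heapq


def split_across_sessions(indexes, sessions):
    """Assign (index_name, size_bytes) pairs to sessions, largest first,
    always giving the next index to the session with the least work so far.
    Returns one list of REINDEX statements per session."""
    if sessions < 1:
        raise ValueError("sessions must be >= 1")

    # Heap entries: (total_bytes_assigned, session_number, statements).
    # session_number is unique, so the statement lists are never compared.
    heap = [(0, i, []) for i in range(sessions)]
    heapq.heapify(heap)

    for name, size in sorted(indexes, key=lambda p: p[1], reverse=True):
        total, session_no, stmts = heapq.heappop(heap)
        stmts.append(f"REINDEX INDEX {name};")
        heapq.heappush(heap, (total + size, session_no, stmts))

    # Return the plans ordered by session number.
    return [stmts for _, _, stmts in sorted(heap, key=lambda t: t[1])]


if __name__ == "__main__":
    # Hypothetical index names and sizes, for illustration only.
    demo = [("public.idx_a", 300), ("public.idx_b", 200),
            ("public.idx_c", 100), ("public.idx_d", 50)]
    for n, stmts in enumerate(split_across_sessions(demo, 2)):
        print(f"-- session {n}")
        print("\n".join(stmts))
```

Each resulting list could be written to its own file and run in a separate psql session. Balancing by size is only a rough proxy for build time, so this is a starting point rather than a tuned scheduler.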

2. Use pg_dump to dump the metadata only, then copy out the "CREATE INDEX ..." SQL statements.
   Drop the indexes before the data load.
   After the data load, increase max_parallel_maintenance_workers and maintenance_work_mem.
   Run the CREATE INDEX ... SQL to leverage the parallel CREATE INDEX feature.
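The steps in option 2 can be sketched roughly as follows, assuming the schema-only pg_dump output has been read into a string. The helper names and the default values for workers and memory are hypothetical and only for illustration. Note that pg_dump emits primary key and unique constraints as ALTER TABLE ... ADD CONSTRAINT, so their indexes are not captured by this pattern and would need separate handling. The regular expression also assumes no semicolon appears inside an index definition.

```python
import re

# Matches CREATE INDEX / CREATE UNIQUE INDEX statements that begin at the
# start of a line and end at the first semicolon.
CREATE_INDEX_RE = re.compile(
    r"^CREATE\s+(?:UNIQUE\s+)?INDEX\b.*?;",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def extract_create_index(schema_sql):
    """Return the CREATE INDEX statements found in a schema-only dump."""
    return [m.group(0) for m in CREATE_INDEX_RE.finditer(schema_sql)]


def build_script(statements, workers=4, work_mem="2GB"):
    """Prefix the statements with session-level settings for index builds.
    The default values are placeholders, not recommendations."""
    header = [
        f"SET max_parallel_maintenance_workers = {workers};",
        f"SET maintenance_work_mem = '{work_mem}';",
    ]
    return "\n".join(header + statements) + "\n"


if __name__ == "__main__":
    # Hypothetical dump fragment, for illustration only.
    dump = (
        "CREATE TABLE public.t (id integer);\n"
        "CREATE INDEX t_id_idx ON public.t USING btree (id);\n"
    )
    print(build_script(extract_create_index(dump)))
```

The generated script can be run with psql after the load finishes. The number of parallel workers actually used for each build is still capped by max_parallel_workers and max_worker_processes on the server.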

Thanks,

James

Responses

Re: reindex option for tuning load large data at 2022-06-18 03:48:35 from Vitalii Tymchyshyn
Re: reindex option for tuning load large data at 2022-06-18 20:01:00 from Jeff Janes
