RE: reindex option for tuning load large data

From: "James Pang (chaolpan)" <chaolpan(at)cisco(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: RE: reindex option for tuning load large data
Date: 2022-06-19 07:16:26
Message-ID: PH0PR11MB51917D31917C184EBE26B008D6B19@PH0PR11MB5191.namprd11.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

We extracted data from Oracle to csv first, already convert schema objects from Oracle to Postgresql too. Then use COPY from csv to Postgres.
The point is about the 2 options to how to make the data load fast, pg_dump only used to dump metadata in Postgres to rebuild index and recreate constraints.
The questions is instead of drop index and create index, we check update pg_index set indisready=false and reindex again after load.

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Sent: Sunday, June 19, 2022 4:01 AM
To: James Pang (chaolpan) <chaolpan(at)cisco(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: reindex option for tuning load large data

On Fri, Jun 17, 2022 at 1:34 AM James Pang (chaolpan) <chaolpan(at)cisco(dot)com<mailto:chaolpan(at)cisco(dot)com>> wrote:
Hi ,

We plan to migrate large database from Oracle to Postgres(version 13.6, OS Redhat8 Enterprise), we are checking options to make data load in Postgres fast. Data volume is about several TB, thousands of indexes, many large table with partitions. We want to make data load running fast and avoid miss any indexes when reindexing. There are 2 options about reindex. Could you give some suggestions about the 2 options, which option is better.

1. Create tables and indexes( empty database) , update pg_index set indisready=false and inisvalid=false, then load data use COPY from csv , then reindex table …

Where did this idea come from? This is likely to destroy your database.

2). Use pg_dump to dump meta data only , then copy “CREATE INDEX … sql “
Drop indexes before data load
After data load, increase max_parallel_maintenance_workers, maintenance_work_mem

Run CREATE INDEX … sql to leverage parallel create index feature.

pg_dump doesn't run against Oracle, so where is the thing you are running pg_dump against coming from?

If you already have a fleshed out schema in PostgreSQL, you should dump the sections separately (with --section=pre-data and --section=post-data) to get the commands to build the objects which should be run before and after the data is loaded.

Cheers,

Jeff

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message I. V. 2022-06-19 08:46:53 Re: reindex option for tuning load large data
Previous Message Jeff Janes 2022-06-18 20:01:00 Re: reindex option for tuning load large data