Changing csv structure and corresponding etl

From: Shekar Tippur <ctippur(at)gmail(dot)com>
To: pgsql-sql(at)lists(dot)postgresql(dot)org
Subject: Changing csv structure and corresponding etl
Date: 2018-12-13 16:51:59
Message-ID: CED2688C-EC81-4679-8FD8-C64E249099FD@gmail.com

Hello,

I am using Redshift to store data from CSV backups that arrive at regular intervals, and I use PySpark (with the psycopg library) to perform the ETL. The problem is that the CSV structure changes between loads, and the ETL job fails.
What I found is that the columns get mixed up.
For example, the original column list was A, B, C, D; in the next iteration, the columns can be A, B, C, X, Y, D.
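(For illustration, a minimal PySpark sketch of binding values by header name rather than by position, so the inserted X and Y cannot shift data into the wrong columns; the S3 path and column list below are hypothetical, not the original job:)

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-etl").getOrCreate()

# header=True binds values to column *names*, so extra columns
# appearing mid-file cannot shift data into the wrong slot.
df = spark.read.csv("s3://my-bucket/backups/*.csv",  # hypothetical path
                    header=True, inferSchema=True)

# Keep only the columns the target table already knows about.
known = ["A", "B", "C", "D"]  # hypothetical target column list
df_known = df.select([c for c in known if c in df.columns])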
I read in some other posts that it is not possible to alter a table to add a column at a particular position in Postgres.
The table itself currently has millions of rows, so rebuilding it whenever the structure changes (i.e. creating a union of the existing table and the new data, dropping the original table, and renaming the union to the original) may not be a good option.
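(Since a new column only needs to exist somewhere in the table, not at a particular position, one alternative to a full rebuild might be to append any new CSV columns with ALTER TABLE ... ADD COLUMN and then load with an explicit column list. A rough sketch, assuming psycopg2 and hypothetical connection details, table name, and file path; for millions of rows, Redshift's COPY from S3 would be preferable to row-by-row INSERTs:)

import csv
import psycopg2

conn = psycopg2.connect(host="my-cluster.example.com",  # hypothetical
                        dbname="mydb", user="etl", password="...")
TABLE = "backups"              # hypothetical target table
CSV_PATH = "/data/backup.csv"  # hypothetical staged file

with open(CSV_PATH, newline="") as f:
    header = [h.lower() for h in next(csv.reader(f))]  # e.g. a, b, c, x, y, d

with conn, conn.cursor() as cur:
    # Columns the table already has.
    cur.execute("SELECT column_name FROM information_schema.columns "
                "WHERE table_name = %s", (TABLE,))
    existing = {row[0] for row in cur.fetchall()}

    # Append new columns at the end; position is irrelevant once rows
    # are loaded by name. Only safe with trusted header values, since
    # identifiers cannot be passed as query parameters.
    for col in header:
        if col not in existing:
            cur.execute(f"ALTER TABLE {TABLE} ADD COLUMN {col} VARCHAR")

    # Load with an explicit column list so ordering changes
    # cannot mix up the data.
    collist = ", ".join(header)
    params = ", ".join(["%s"] * len(header))
    sql = f"INSERT INTO {TABLE} ({collist}) VALUES ({params})"
    with open(CSV_PATH, newline="") as f:
        rows = csv.reader(f)
        next(rows)  # skip the header row
        for row in rows:
            cur.execute(sql, row)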
Any pointers on how to proceed?

Thanks,
Shekar
Sent from my iPhone
