Changing csv structure and corresponding etl

From: Shekar Tippur <ctippur(at)gmail(dot)com>
To: pgsql-sql(at)lists(dot)postgresql(dot)org
Subject: Changing csv structure and corresponding etl
Date: 2018-12-13 16:51:59
Message-ID: CED2688C-EC81-4679-8FD8-C64E249099FD@gmail.com

Hello,

I am using Redshift to store data from CSV backups that arrive at regular intervals, and I use PySpark (with the psycopg library) to perform the ETL. The problem is that the CSV structure changes between loads, and the ETL job fails.
What I found is that the columns get mixed up.
For example, the original column list was A, B, C, D; in the next iteration, the columns can be A, B, C, X, Y, D.
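(For illustration, a minimal PySpark sketch of binding values by header name rather than by position, so the inserted X and Y cannot shift data into the wrong columns; the S3 path and column list below are hypothetical, not the original job:)

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-etl").getOrCreate()

# header=True binds values to column *names*, so extra columns
# appearing mid-file cannot shift data into the wrong slot.
df = spark.read.csv("s3://my-bucket/backups/*.csv",  # hypothetical path
                    header=True, inferSchema=True)

# Keep only the columns the target table already knows about.
known = ["A", "B", "C", "D"]  # hypothetical target column list
df_known = df.select([c for c in known if c in df.columns])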
I read in some other posts that it is not possible to alter a table to add a column at a particular position in Postgres.
The table itself currently has millions of rows, so rebuilding it whenever the structure changes (i.e. creating a union of the existing table and the new data, dropping the original table, and renaming the union to the original) may not be a good option.
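(Since a new column only needs to exist somewhere in the table, not at a particular position, one alternative to a full rebuild might be to append any new CSV columns with ALTER TABLE ... ADD COLUMN and then load with an explicit column list. A rough sketch, assuming psycopg2 and hypothetical connection details, table name, and file path; for millions of rows, Redshift's COPY from S3 would be preferable to row-by-row INSERTs:)

import csv
import psycopg2

conn = psycopg2.connect(host="my-cluster.example.com",  # hypothetical
                        dbname="mydb", user="etl", password="...")
TABLE = "backups"              # hypothetical target table
CSV_PATH = "/data/backup.csv"  # hypothetical staged file

with open(CSV_PATH, newline="") as f:
    header = [h.lower() for h in next(csv.reader(f))]  # e.g. a, b, c, x, y, d

with conn, conn.cursor() as cur:
    # Columns the table already has.
    cur.execute("SELECT column_name FROM information_schema.columns "
                "WHERE table_name = %s", (TABLE,))
    existing = {row[0] for row in cur.fetchall()}

    # Append new columns at the end; position is irrelevant once rows
    # are loaded by name. Only safe with trusted header values, since
    # identifiers cannot be passed as query parameters.
    for col in header:
        if col not in existing:
            cur.execute(f"ALTER TABLE {TABLE} ADD COLUMN {col} VARCHAR")

    # Load with an explicit column list so ordering changes
    # cannot mix up the data.
    collist = ", ".join(header)
    params = ", ".join(["%s"] * len(header))
    sql = f"INSERT INTO {TABLE} ({collist}) VALUES ({params})"
    with open(CSV_PATH, newline="") as f:
        rows = csv.reader(f)
        next(rows)  # skip the header row
        for row in rows:
            cur.execute(sql, row)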
Any pointers on how to proceed?

Thanks,
Shekar
Sent from my iPhone
