From: Shekar Tippur <ctippur(at)gmail(dot)com>
To: pgsql-sql(at)lists(dot)postgresql(dot)org
Subject: Changing csv structure and corresponding etl
Date: 2018-12-13 16:51:59
Message-ID: CED2688C-EC81-4679-8FD8-C64E249099FD@gmail.com
Lists: pgsql-sql
Hello,
I am using Redshift to store data from CSV backups that arrive at regular intervals. I use PySpark (with the psycopg library) to perform the ETL. The issue is that the CSV structure changes between runs and the ETL job fails.
The problem I found is that the columns get mixed up.
For example, the original column list was A,B,C,D. In the next iteration, the columns can be A,B,C,X,Y,D.
I read in some other posts that it is not possible in Postgres to alter a table to add a column at a particular position.
The table itself currently has millions of rows, so merging tables whenever the structure changes may not be a good option, i.e., creating a union of the existing table and the new data, dropping the original table, and renaming the union to the original name.
Any pointers in how to proceed?
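One way around the position problem is to load by column name rather than by position: any new columns can be appended at the end with ALTER TABLE, since position is irrelevant once COPY lists the CSV's columns explicitly. Below is a minimal sketch of that idea; the table name `backups` and the `text` column type are assumptions for illustration, and it emits plain-Postgres COPY FROM STDIN syntax (Redshift's COPY normally loads from S3 instead, so the generated SQL would need adapting there):

```python
import csv
import io

def plan_load(existing_cols, csv_text, table="backups"):
    """Compare the CSV's header to the table's current columns and build
    the SQL needed to load it: one ALTER TABLE ... ADD COLUMN per new
    column (appended at the end -- position does not matter when loading
    by name), then a COPY that names the CSV's columns explicitly.
    'backups' and the 'text' type are placeholder assumptions."""
    header = next(csv.reader(io.StringIO(csv_text)))
    new_cols = [c for c in header if c not in existing_cols]
    alters = [f'ALTER TABLE {table} ADD COLUMN "{c}" text;' for c in new_cols]
    col_list = ", ".join(f'"{c}"' for c in header)
    copy_sql = (f"COPY {table} ({col_list}) FROM STDIN "
                f"WITH (FORMAT csv, HEADER true);")
    return alters, copy_sql

# The example from above: table has A,B,C,D; the new file has A,B,C,X,Y,D.
alters, copy_sql = plan_load(["a", "b", "c", "d"], "a,b,c,x,y,d\n1,2,3,4,5,6\n")
for stmt in alters:
    print(stmt)   # ALTER statements for the new columns x and y
print(copy_sql)   # COPY listing the CSV's columns in the CSV's own order
```

The ALTER and COPY statements could then be run through the existing psycopg connection (e.g. `cursor.execute(...)` and, for plain Postgres, `copy_expert` with the file object). Old rows simply hold NULL in the appended columns.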
Thanks,
Shekar
Sent from my iPhone