From: | Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru> |
---|---|
To: | jian he <jian(dot)universality(at)gmail(dot)com> |
Cc: | Damir Belyalov <dam(dot)bel07(at)gmail(dot)com>, zhihuifan1213(at)163(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Gustafsson <daniel(at)yesql(dot)se>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, anisimow(dot)d(at)gmail(dot)com, HukuToc(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> |
Subject: | Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) |
Date: | 2023-12-11 14:05:33 |
Message-ID: | f03bc3de-9769-4106-80cf-158c5f184a1e@yandex.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi! Thank you for your work. Your patch looks better!
On 10.12.2023 13:32, jian he wrote:
> On Fri, Dec 8, 2023 at 3:09 PM Alena Rybakina<lena(dot)ribackina(at)yandex(dot)ru> wrote:
>> Thank you for your work. Unfortunately, your code contained errors during the make installation:
>>
>> 'SAVEPOINT' after 'SAVE_ERROR' in unreserved_keyword list is misplaced
>> 'SAVEPOINT' after 'SAVE_ERROR' in bare_label_keyword list is misplaced
>> make[2]: *** [../../../src/Makefile.global:783: gram.c] Error 1
>> make[1]: *** [Makefile:131: parser/gram.h] Error 2
>> make[1]: *** Waiting for unfinished jobs....
>> make: *** [src/Makefile.global:383: submake-generated-headers] Error 2
>>
>> I have ubuntu 22.04 operation system.
>>
>> On 06.12.2023 13:47, jian he wrote:
>>
>> On Tue, Dec 5, 2023 at 6:07 PM Alena Rybakina<lena(dot)ribackina(at)yandex(dot)ru> wrote:
>>
>> Hi!
>>
>> Thank you for your contribution to this thread.
>>
>>
>> I reviewed it and have a few questions.
>>
>> 1. I have seen that you delete a table before creating it, to which you want to add errors due to a failed "copy from" operation. I think this is wrong because this table can save useful data for the user.
>> At a minimum, we should warn the user about this, but I think we can just add some number at the end of the name, such as name_table1, name_table_2.
>>
>> Sorry. I don't understand this part.
>> Currently, if the error table name already exists, then the copy will
>> fail, an error will be reported.
>> I try to first create a table, if no error then the error table will be dropped.
>>
>> To be honest, first of all, I misunderstood this part of the code. Now I see that it works the way you mentioned.
>>
>> However, I didn't see if you dealt with cases where we already had a table with the same name as the table error.
>> I mean, when is he trying to create for the first time, or will we never be able to face such a problem?
>>
>> Can you demo the expected behavior?
>>
>> Unfortunately, I was unable to launch it due to a build issue.
>>
> Hopefully attached will work.
Yes, thank you! It works fine, and I see that the regression tests have
been passed. 🙂
However, when I ran 'copy from with save_error' operation with simple
csv files (copy_test.csv, copy_test1.csv) for tables test, test1 (how I
created it, I described below):
postgres=# create table test (x int primary key, y int not null);
postgres=# create table test1 (x int, z int, CONSTRAINT fk_x
FOREIGN KEY(x)
REFERENCES test(x));
I did not find a table with saved errors after operation, although I
received a log about it:
postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV
save_error
NOTICE: 2 rows were skipped because of error. skipped row saved to
table public.test_error
ERROR: duplicate key value violates unique constraint "test_pkey"
DETAIL: Key (x)=(2) already exists.
CONTEXT: COPY test, line 3
postgres=# select * from public.test_error;
ERROR: relation "public.test_error" does not exist
LINE 1: select * from public.test_error;
postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ','
CSV save_error
NOTICE: 2 rows were skipped because of error. skipped row saved to
table public.test1_error
ERROR: insert or update on table "test1" violates foreign key
constraint "fk_x"
DETAIL: Key (x)=(2) is not present in table "test".
postgres=# select * from public.test1_error;
ERROR: relation "public.test1_error" does not exist
LINE 1: select * from public.test1_error;
Two lines were written correctly in the csv files, therefore they should
have been added to the tables, but they were not added to the tables
test and test1.
If I leave only the correct rows, everything works fine and the rows are
added to the tables.
in copy_test.csv:
2,0
1,1
in copy_test1.csv:
2,0
2,1
1,1
postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV
COPY 2
postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ','
CSV save_error
NOTICE: No error happened.Error holding table public.test1_error will
be droped
COPY 3
Maybe I'm launching it the wrong way. If so, let me know about it.
I also notice interesting behavior if the table was previously created
by the user. When I was creating an error_table before the 'copy from'
operation,
I received a message saying that it is impossible to create a table with
the same name (it is shown below) during the 'copy from' operation.
I think you should add information about this in the documentation,
since this seems to be normal behavior to me.
postgres=# CREATE TABLE test_error (LINENO BIGINT, LINE TEXT,
FIELD TEXT, SOURCE TEXT, ERR_MESSAGE TEXT,
ERR_DETAIL TEXT, ERRORCODE TEXT);
CREATE TABLE
postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV
save_error
ERROR: Error save table public.test_error already exists. Cannot use it
for COPY FROM error saving
>
>> 2. I noticed that you are forming a table name using the type of errors that prevent rows from being added during 'copy from' operation.
>> I think it would be better to use the name of the source file that was used while 'copy from' was running.
>> In addition, there may be several such files, it is also worth considering.
>>
>> Another column added.
>> now it looks like:
>>
>> SELECT * FROM save_error_csv_error;
>> filename | lineno | line
>> | field | source | err_message |
>> err_detail | errorcode
>> ----------+--------+----------------------------------------------------+-------+--------+---------------------------------------------+------------+-----------
>> STDIN | 1 | 2002 232 40 50 60 70
>> 80 | NULL | NULL | extra data after last expected column |
>> NULL | 22P04
>> STDIN | 1 | 2000 230 23
>> | d | NULL | missing data for column "d" | NULL
>> | 22P04
>> STDIN | 1 | z,,""
>> | a | z | invalid input syntax for type integer: "z" | NULL
>> | 22P02
>> STDIN | 2 | \0,,
>> | a | \0 | invalid input syntax for type integer: "\0" | NULL
>> | 22P02
>>
>> Yes, I see the "filename" column, and this will solve the problem, but "STDIN" is unclear to me.
> please see comment in struct CopyFromStateData:
> char *filename; /* filename, or NULL for STDIN */
>
Yes, I can see that.
I haven't figured out how to fix it yet either.
>> */
>>
>> Maybe we can rewrite it like this:
>>
>> /* Check, the err_nsp.error_rel table has already existed
>> * and if it is, check its column name and data types.
>>
> refactored.
Fine)
--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company
Attachment | Content-Type | Size |
---|---|---|
copy_test1.csv | text/csv | 26 bytes |
copy_test.csv | text/csv | 23 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2023-12-11 14:24:25 | Re: [CAUTION!! freemail] Re: Partial aggregates pushdown |
Previous Message | Amit Kapila | 2023-12-11 13:42:39 | Re: Synchronizing slots from primary to standby |