Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

From: Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Damir Belyalov <dam(dot)bel07(at)gmail(dot)com>, zhihuifan1213(at)163(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Gustafsson <daniel(at)yesql(dot)se>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, anisimow(dot)d(at)gmail(dot)com, HukuToc(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date: 2023-12-08 07:09:07
Message-ID: 23a89ba0-b5d5-419c-b798-cd2c59490148@yandex.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for your work. Unfortunately, your code contained errors
during the make installation:

'SAVEPOINT' after 'SAVE_ERROR' in unreserved_keyword list is misplaced
'SAVEPOINT' after 'SAVE_ERROR' in bare_label_keyword list is misplaced
make[2]: *** [../../../src/Makefile.global:783: gram.c] Error 1
make[1]: *** [Makefile:131: parser/gram.h] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [src/Makefile.global:383: submake-generated-headers] Error 2

I have ubuntu 22.04 operation system.

On 06.12.2023 13:47, jian he wrote:
> On Tue, Dec 5, 2023 at 6:07 PM Alena Rybakina<lena(dot)ribackina(at)yandex(dot)ru> wrote:
>> Hi!
>>
>> Thank you for your contribution to this thread.
>>
>>
>> I reviewed it and have a few questions.
>>
>> 1. I have seen that you delete a table before creating it, to which you want to add errors due to a failed "copy from" operation. I think this is wrong because this table can save useful data for the user.
>> At a minimum, we should warn the user about this, but I think we can just add some number at the end of the name, such as name_table1, name_table_2.
> Sorry. I don't understand this part.
> Currently, if the error table name already exists, then the copy will
> fail, an error will be reported.
> I try to first create a table, if no error then the error table will be dropped.
To be honest, first of all, I misunderstood this part of the code. Now I
see that it works the way you mentioned.

However, I didn't see if you dealt with cases where we already had a
table with the same name as the table error.
I mean, when is he trying to create for the first time, or will we never
be able to face such a problem?
> Can you demo the expected behavior?
Unfortunately, I was unable to launch it due to a build issue.
>
>> 2. I noticed that you are forming a table name using the type of errors that prevent rows from being added during 'copy from' operation.
>> I think it would be better to use the name of the source file that was used while 'copy from' was running.
>> In addition, there may be several such files, it is also worth considering.
>>
> Another column added.
> now it looks like:
>
> SELECT * FROM save_error_csv_error;
> filename | lineno | line
> | field | source | err_message |
> err_detail | errorcode
> ----------+--------+----------------------------------------------------+-------+--------+---------------------------------------------+------------+-----------
> STDIN | 1 | 2002 232 40 50 60 70
> 80 | NULL | NULL | extra data after last expected column |
> NULL | 22P04
> STDIN | 1 | 2000 230 23
> | d | NULL | missing data for column "d" | NULL
> | 22P04
> STDIN | 1 | z,,""
> | a | z | invalid input syntax for type integer: "z" | NULL
> | 22P02
> STDIN | 2 | \0,,
> | a | \0 | invalid input syntax for type integer: "\0" | NULL
> | 22P02
>
Yes, I see the "filename" column, and this will solve the problem, but
"STDIN" is unclear to me.
>> 3. I found spelling:
>>
>> /* no err_nsp.error_rel table then crete one. for holding error. */
>>
> fixed.
>
>> 4. Maybe rewrite this comment
>>
>> these info need, no error will drop err_nsp.error_rel table
>> to:
>> this information is necessary, no error will lead to the deletion of the err_sp.error_rel table.
>>
> fixed.
Thank you.
>> 5. Is this part of the comment needed? I think it duplicates the information below when we form the query.
>>
>> * . column list(order by attnum, begin from ctid) =
>> * {ctid, lineno,line,field,source,err_message,err_detail,errorcode}
>> * . data types (from attnum = -1) ={tid, int8,text,text,text,text,text,text}
>>
>> I'm not sure if we need to order the rows by number. It might be easier to work with these lines in the order they appear.
>>
> Simplified the comment. "order by attnum" is to make sure that if
> there is a table already existing, and the column name is like X and
> the data type like Y, then we consider this table is good for holding
> potential error info.
>
> COPY FROM, main entry point is NextCopyFrom.
> Now for non-binary mode, if you specified save_error then it will not
> fail at NextCopyFrom.
> all these three errors will be tolerated: extra data after last
> expected column, missing data for column, data type conversion.
It looks clearer and better, thanks!

Comments in the format of questions are unusual for me, I perceive them
to think about it, for example, as here (contrib/bloom/blinsert.c:312):

/*

 * Didn't find place to insert in notFullPage array.  Allocate new page.
 * (XXX is it good to do this while holding ex-lock on the metapage??)
 */

Maybe we can rewrite it like this:

/* Check, the err_nsp.error_rel table has already existed
* and if it is, check its column name and data types.

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Pyhalov 2023-12-08 07:13:36 Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'
Previous Message Michael Paquier 2023-12-08 07:02:55 Re: Make COPY format extendable: Extract COPY TO format implementations