From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
---|---|
To: | Josef Šimánek <josef(dot)simanek(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
Subject: | Re: [PATCH] Initial progress reporting for COPY command |
Date: | 2020-06-23 12:52:01 |
Message-ID: | CALj2ACVN18+z-RS1yKSE8ewD2dFMKpiLMN9HjvSQ093jJxBYBQ@mail.gmail.com |
Lists: | pgsql-hackers |
> On Mon, 15 Jun 2020 at 7:34, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>>
>> > I'm using ftell to get current position in file to populate file_bytes_processed without error handling (ftell can return -1L and also populate errno on problems).
>> >
>> > 1. Is that a good way to get progress of file processing?
>>
>> IMO, it's better to handle the error cases. One possible case where
>> ftell can return -1 and set errno is when the total bytes processed is
>> more than LONG_MAX.
>>
>> Will your patch handle file_bytes_processed reporting for COPY FROM
>> STDIN cases? For this case, ftell can't be used.
>>
>> Instead of using ftell and worrying about the errors, a simple
>> approach could be to have a uint64 variable in CopyStateData to track
>> the number of bytes read whenever CopyGetData is called. This approach
>> can also handle the case of COPY FROM STDIN.
>
>
> Thanks for suggestion. I used this approach and latest patch supports both STDIN and STDOUT now.
>
Thanks.
It would be good to see the performance of the COPY command (probably
with a few GB of data) with and without the patch, for both csv/text
and binary files.
For COPY FROM, CopyGetData is called once per RAW_BUF_SIZE (64KB)
chunk, and so is the CopyUpdateBytesProgress function. For binary
format files, however, CopyGetData is called for each field/column of
every row/tuple.
Can we make CopyUpdateBytesProgress() a macro or an inline function
(probably by using pg_attribute_always_inline) to reduce function call
overhead, since it only contains two statements?
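To illustrate the suggestion, here is a rough standalone sketch, not the code from the patch. The reduced CopyStateData struct, its bytes_processed field, and the report_progress() stand-in are assumptions made up for this example; in the backend, pg_attribute_always_inline comes from c.h, so the #define below only mimics it for a standalone build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's pg_attribute_always_inline (normally from c.h). */
#if defined(__GNUC__) || defined(__clang__)
#define pg_attribute_always_inline __attribute__((always_inline)) inline
#else
#define pg_attribute_always_inline inline
#endif

/* Hypothetical, reduced CopyStateData: only the byte counter is kept. */
typedef struct CopyStateData
{
    uint64_t    bytes_processed;    /* total bytes read so far */
} CopyStateData;

/* Stand-in for the real progress-reporting call; remembers the last value. */
static uint64_t last_reported = 0;

static void
report_progress(uint64_t value)
{
    last_reported = value;
}

/*
 * The two statements: bump the counter, then publish it. Being static and
 * always-inline, the body is expanded at each call site, which matters when
 * it runs once per field rather than once per 64KB chunk.
 */
static pg_attribute_always_inline void
CopyUpdateBytesProgress(CopyStateData *cstate, uint64_t newbytes)
{
    assert(cstate != NULL);
    cstate->bytes_processed += newbytes;
    report_progress(cstate->bytes_processed);
}
```

A caller would invoke CopyUpdateBytesProgress(cstate, bytesread) right after each read in CopyGetData. Whether inlining makes a measurable difference is exactly what the benchmark above should show.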
I tried to apply the patch on top of commit
7ce461560159948ba0c802c767e42c5f5ae08b4a, and git apply reports a
whitespace warning:
bharath:postgres$ git apply /mnt/hgfs/Downloads/copy-progress-v2.diff
/mnt/hgfs/Downloads/copy-progress-v2.diff:277: trailing whitespace.
* for counting tuples inserted by an INSERT command. Update
warning: 1 line adds whitespace errors.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com