From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Josef Šimánek <josef(dot)simanek(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
Subject: | Re: [PATCH] Initial progress reporting for COPY command |
Date: | 2020-06-23 11:15:19 |
Message-ID: | 20200623111519.dtkknc5xoxicszq7@development |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Jun 23, 2020 at 03:40:08PM +0530, vignesh C wrote:
>On Mon, Jun 22, 2020 at 4:28 PM Josef Šimánek <josef(dot)simanek(at)gmail(dot)com> wrote:
>>
>> Thanks for the hint regarding "CopyReadLineText". I'll take a look.
>>
>> For now I have tested those cases:
>>
>> CREATE TABLE test(id int);
>> INSERT INTO test SELECT 1 FROM generate_series(1, 1000000);
>> COPY (SELECT * FROM test) TO '/tmp/ids';
>> COPY test FROM '/tmp/ids';
>>
>> psql -h /tmp yr -c 'COPY (SELECT 1 from generate_series(1,100000000)) TO STDOUT;' > /tmp/ryba.txt
>> echo /tmp/ryba.txt | psql -h /tmp yr -c 'COPY test FROM STDIN'
>>
>> It is easy to check lines count and bytes count are in sync (since 1 line is 2 bytes here - "1" and newline character).
>> I'll try to check more complex COPY commands to ensure everything is in sync.
>>
>> If you have any ideas for testing queries, feel free to suggest.
>
>For copy from statement you could attach the session, put a breakpoint
>at CopyReadLineText, execution will hit this breakpoint for every
>record it is doing COPY FROM and parallely check if
>pg_stat_progress_copy is getting updated correctly. I noticed it was
>showing the file read size instead of the actual processed bytes.
>
>>> +pg_stat_progress_copy| SELECT s.pid,
>>> + s.datid,
>>> + d.datname,
>>> + s.relid,
>>> + CASE s.param1
>>> + WHEN 0 THEN 'TO'::text
>>> + WHEN 1 THEN 'FROM'::text
>>> + ELSE NULL::text
>>> + END AS direction,
>>> + ((s.param2)::integer)::boolean AS file,
>>> + ((s.param3)::integer)::boolean AS program,
>>> + s.param4 AS lines_processed,
>>> + s.param5 AS file_bytes_processed
>>>
>>> You could include pg_size_pretty for s.param5 like
>>> pg_size_pretty(S.param5) AS bytes_processed, it will be easier for
>>> users to understand bytes_processed when the data size increases.
>>
>>
>> I was looking at the rest of reporting views and for me those seem to be just basic ones providing just raw data to be used later in custom nice friendly human-readable views built on the client side.
>> For example "pg_stat_progress_basebackup" also reports "backup_streamed" in raw form.
>>
>> Anyway if you would like to make this view more user-friendly, I can add that. Just ping me.
>
>I felt we could add pg_size_pretty to make the view more user friendly.
>
Please no. That'd make processing of the data (say, computing progress
as processed/total) impossible. It's easy to add pg_size_pretty if you
want it, it's impossible to undo it. I don't see a single pg_size_pretty
call in system_views.sql.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Surafel Temesgen | 2020-06-23 11:59:45 | Re: Decomposing xml into table |
Previous Message | Ranier Vilela | 2020-06-23 11:05:03 | Re: Parallel Seq Scan vs kernel read ahead |