From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Josef Šimánek <josef(dot)simanek(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Simple progress reporting for COPY command
Date: 2021-01-08 13:30:22
Message-ID: CAEze2Wj62YGOK_d67LvfGoL=ZobfmUhPn+WRGfEhMtGHBaM1Xg@mail.gmail.com
Lists: pgsql-hackers
On Thu, 7 Jan 2021 at 23:00, Josef Šimánek <josef(dot)simanek(at)gmail(dot)com> wrote:
>
> On Thu, 7 Jan 2021 at 22:37, Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >
> > I'm not particularly attached to the "lines" naming; it just seemed OK
> > to me. So if there's consensus to rename this somehow, I'm OK with it.
>
> The problem I see here is that it depends on the direction of the COPY.
> If you're copying from a CSV file to a table, those are actually lines
> (since 1 line = 1 tuple). But copying from the DB to a file is copying
> tuples (though 1 tuple = 1 file line). "Lines" works better for me personally.
>
> Once I fix the problem with triggers (and any other cases I find), I
> think we can consider them lines. It will represent the number of
> lines processed from the file on COPY FROM and the number of lines
> written to the file on COPY TO (at least in CSV format). I'm not sure
> how the BINARY format works; I'll check.
Here is a counterexample showing that 1 tuple need not be 1 line in csv/binary:
/*
* create a table with one tuple containing 1 text field, which consists of
* 10 newline characters.
 * If you want windows-style lines, replace E'\x0A' (\n) with E'\x0D\x0A' (\r\n).
*/
# CREATE TABLE ttab (val) AS
SELECT * FROM (values (
repeat(convert_from(E'\x0A'::bytea, 'UTF8'), 10)::text
)) as v;
# -- indeed, one unix-style line, according to $ wc -l copy.txt
# COPY ttab TO 'copy.txt' (format text);
COPY 1
# TRUNCATE ttab; COPY ttab FROM 'copy.txt' (format text);
COPY 1
# -- 11 lines, according to $ wc -l copy.csv
# COPY ttab TO 'copy.csv' (format csv);
COPY 1
# TRUNCATE ttab; COPY ttab FROM 'copy.csv' (format csv);
COPY 1
# -- 13 lines, according to $ wc -l copy.bin
# COPY ttab TO 'copy.bin' (format binary);
COPY 1
# TRUNCATE ttab; COPY ttab FROM 'copy.bin' (format binary);
COPY 1
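The wc -l counts in the session follow directly from how each format encodes the one field. As a rough, simplified sketch (Python rather than PostgreSQL's C; the helpers encode_text, encode_csv, encode_binary and wc_l are hypothetical, and only the escaping relevant to this row is modeled), the single row can be encoded by hand in each format and its newline bytes counted:

```python
import struct

# Hypothetical helpers, simplified from the documented COPY formats:
# each encodes one row with a single text column.
VALUE = "\n" * 10  # the ttab field: ten LF characters


def encode_text(value: str) -> bytes:
    # text format: backslash-escape backslash, LF and CR; end the row with LF
    escaped = value.replace("\\", "\\\\").replace("\n", "\\n").replace("\r", "\\r")
    return (escaped + "\n").encode("utf-8")


def encode_csv(value: str) -> bytes:
    # CSV format: quote a field containing a delimiter, quote, LF or CR;
    # embedded LFs stay raw inside the quotes
    if any(c in value for c in ',"\n\r'):
        value = '"' + value.replace('"', '""') + '"'
    return (value + "\n").encode("utf-8")


def encode_binary(value: str) -> bytes:
    data = value.encode("utf-8")
    out = b"PGCOPY\n\xff\r\n\x00"        # 11-byte signature, contains 2 LFs
    out += struct.pack("!ii", 0, 0)      # flags field, header extension length
    out += struct.pack("!h", 1)          # per-tuple field count
    out += struct.pack("!i", len(data))  # field length 10 == 0x0A: one more LF byte
    out += data                          # the ten raw LF bytes
    out += struct.pack("!h", -1)         # file trailer
    return out


def wc_l(blob: bytes) -> int:
    # like wc -l: count LF bytes, regardless of record boundaries
    return blob.count(b"\n")


# expected: 1 (text), 11 (csv), 13 (binary), matching the session
for name, encode in (("text", encode_text), ("csv", encode_csv), ("binary", encode_binary)):
    print(f"{name}: 1 tuple, wc -l = {wc_l(encode(VALUE))}")
```

In the binary case two LFs come from the fixed file signature and one from the 4-byte field length (10 = 0x0A); none of them separates records, so a line count says little about progress in that format.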
All of the above COPY statements would report only 'lines_processed = 1'
in the progress reporting, while the csv and binary line counts are
definitely inconsistent with what the progress reporting shows, because
progress reporting counts tuples (table rows), not the number of lines
in the external file.
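The same tuples-versus-lines split is easy to reproduce outside the server. As a hedged illustration using Python's standard csv module (not PostgreSQL's COPY parser), reader.line_num counts physical lines consumed from the source while the loop counts records, so for input shaped like copy.csv the two diverge. CSV_DATA and count_progress are hypothetical names:

```python
import csv
import io

# Hypothetical stand-in for copy.csv from the session above:
# one record whose only field is ten LF characters.
CSV_DATA = '"' + "\n" * 10 + '"\n'


def count_progress(data: str) -> tuple[int, int]:
    """Return (tuples, physical lines) consumed while reading CSV input."""
    reader = csv.reader(io.StringIO(data, newline=""))
    tuples = sum(1 for _ in reader)
    # line_num counts lines pulled from the source iterator, not records
    return tuples, reader.line_num


tuples, lines = count_progress(CSV_DATA)
print(f"tuples={tuples} lines={lines}")  # tuples=1 lines=11
```

A counter that reports records would show 1 here, while anything counting physical lines would show 11.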