Re: Add on_error and log_verbosity options to file_fdw

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add on_error and log_verbosity options to file_fdw
Date: 2024-07-22 22:07:46
Message-ID: CAD21AoBij_2cy5T7ihk9ArRQzDcXVHoK13vzy34zAUAMLgrKkQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Thu, Jul 18, 2024 at 6:38 PM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> wrote:
>
> On 2024-07-05 00:27, torikoshia wrote:
> > Hi,
> >
> > With the current file_fdw, if even one line of data conversion fails,
> > the contents of the file cannot be referenced at all:
> >
> > =# \! cat data/test.data
> > 1,a
> > 2,b
> > a,c
> > =# create foreign table f_fdw_test_1 (i int, t text) server f_fdw
> > options (filename 'test.data', format 'csv');
> > CREATE FOREIGN TABLE
> >
> > =# table f_fdw_test_1;
> > ERROR: invalid input syntax for type integer: "a"
> > CONTEXT: COPY f_fdw_test, line 3, column i: "a"
> >
> > Since we'll support ON_ERROR option which tolerates data conversion
> > errors in COPY FROM and LOG_VERBOSITY option at v17[1], how about
> > supporting them on file_fdw?

+1

> >
> > This idea comes from Fujii-san[2], and I think it'd be useful when
> > reading a bit dirty data.
> >
> > Attached PoC patch works like below:
> >
> > =# create foreign table f_fdw_test_2 (i int, t text) server f_fdw
> > options (filename 'test.data', format 'csv', on_error 'ignore');
> > CREATE FOREIGN TABLE
> >
> > =# table f_fdw_test_2;
> > NOTICE: 1 row was skipped due to data type incompatibility
> > i | t
> > ---+---
> > 1 | a
> > 2 | b
> > (2 rows)

I'm slightly concerned that users might not want to see the NOTICE
message on every scan. Unlike COPY FROM, scans of a file via file_fdw
could be frequent. An alternative place to report the number of
malformed rows would be the EXPLAIN command, as follows:

                           QUERY PLAN
----------------------------------------------------------------
 Foreign Scan on public.test  (cost=0.00..1.10 rows=1 width=12)
   Output: a, b, c
   Foreign File: test.csv
   Foreign File Size: 12 b
   Skipped Rows: 10
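
For illustration only, output of this shape might be requested with a
command like the one below. The "Skipped Rows" line is a proposal and is
not emitted by existing releases. Using ANALYZE here is an assumption:
the number of skipped rows can only be known once the file has actually
been read, so the mock-up above (which shows no actual timings) may not
correspond to this exact invocation.

```sql
-- Hypothetical invocation; "public.test" is the table name from the
-- mock-up above. VERBOSE produces the "Output:" line, and ANALYZE
-- (an assumption) executes the scan so a skipped-row count could exist.
EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM public.test;
```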

> >
> >
> > =# create foreign table f_fdw_test_3 (i int, t text) server f_fdw
> > options (filename 'test.data', format 'csv', on_error 'ignore',
> > log_verbosity 'verbose');
> > CREATE FOREIGN TABLE
> >
> > =# table f_fdw_test_3 ;
> > NOTICE: skipping row due to data type incompatibility at line 3 for
> > column i: "a"
> > NOTICE: 1 row was skipped due to data type incompatibility
> > i | t
> > ---+---
> > 1 | a
> > 2 | b
> > (2 rows)

IIUC we have to execute ALTER FOREIGN TABLE to change the
log_verbosity value, which requires being the owner of the table.
That doesn't seem user-friendly.
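
As a sketch of what that would look like, assuming the PoC patch is
applied so that file_fdw accepts log_verbosity, and reusing the table
names from the examples quoted above:

```sql
-- Must be run by the owner of the foreign table.
-- SET changes an option that was already given at CREATE time
-- (f_fdw_test_3 was created with log_verbosity 'verbose').
ALTER FOREIGN TABLE f_fdw_test_3 OPTIONS (SET log_verbosity 'default');

-- ADD is needed when the option was not specified before
-- (f_fdw_test_2 was created with on_error only).
ALTER FOREIGN TABLE f_fdw_test_2 OPTIONS (ADD log_verbosity 'verbose');
```

A user who can only SELECT from the table has no way to switch the
verbosity for their own session this way.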

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
