Re: proposal: possibility to read dumped table's name from file

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Andrew Dunstan <andrew(at)dunslane(dot)net>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: proposal: possibility to read dumped table's name from file
Date: 2022-11-11 08:11:25
Message-ID: 20221111081125.bxtndzqlt5tpc5qw@jrouhaud
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Sat, Nov 05, 2022 at 08:54:57PM +0100, Pavel Stehule wrote:
>
> pá 4. 11. 2022 v 14:28 odesílatel Julien Rouhaud <rjuju123(at)gmail(dot)com>
> napsal:
>
> > > But one thing I noticed is that "optarg" looks wrong here:
> > >
> > > simple_string_list_append(&opts->triggerNames, optarg);
> >
> > Ah indeed, good catch! Maybe there should be an explicit test for every
> > (include|exclude) / objtype combination? It would be a bit verbose (and
> > possibly hard to maintain).
> >
>
> yes - pg_restore is not well covered by tests, fixed
>
> I found another issue. The pg_restore requires a full signature of the
> function and it is pretty sensitive on white spaces (pg_restore).

Argh, indeed. It's a good thing to have expanded the regression tests :)

> I made a
> mistake when I partially parsed patterns like SQL identifiers. It can work
> for simple cases, but when I parse the function's signature it stops
> working. So I rewrote the parsing pattern part. Now, I just read an input
> string and I try to reduce spaces. Still multiline identifiers are
> supported. Against the previous method of pattern parsing, I needed to
> change just one regress test - now I am not able to detect garbage after
> pattern :-/.

I'm not sure it's really problematic. It looks POLA-violation compatible with
regular pg_dump options, for instance:

$ echo "include table t1()" | pg_dump --filter - | ag CREATE
CREATE TABLE public.t1 (

$ pg_dump -t "t1()" | ag CREATE
CREATE TABLE public.t1 (

$ echo "include table t1()blabla" | pg_dump --filter - | ag CREATE
pg_dump: error: no matching tables were found

$ pg_dump -t "t1()blabla" | ag CREATE
pg_dump: error: no matching tables were found

I don't think the file parsing code should try to be smart about checking the
found patterns.

> * function's filtering doesn't support schema - when the name of function
> is specified with schema, then the function is not found

Ah I didn't know that. Indeed it only expect a non-qualified identifier, and
would restore any function that matches the name (and arguments), possibly
multiple ones if there are variants in different schema. That's unrelated to
this patch though.

> * the function has to be specified with an argument type list - the
> separator has to be exactly ", " string. Without space or with one space
> more, the filtering doesn't work (new implementation of pattern parsing
> reduces white spaces sensitivity). This is not a bug, but it is not well
> documented.

Agreed.

> attached updated patch

It looks overall good to me! I just have a few minor nitpicking complaints:

- you removed the pg_strip_clrf() calls and declared everything as "const char
*", so there's no need to explicitly cast the filter_get_keyword() arguments
anymore

Note also that the code now relies on the fact that there are some non-zero
bytes after a pattern to know that no errors happened. It's not a problem as
you should find an EOF marker anyway if CLRF were stripped.

+ * Following routines are called from pg_dump, pg_dumpall and pg_restore.
+ * Unfortunately, implementation of exit_nicely in pg_dump and pg_restore
+ * is different from implementation of this routine in pg_dumpall. So instead
+ * of directly calling exit_nicely we have to return some error flag (in this
+ * case NULL), and exit_nicelly will be executed from caller's routine.

Slight improvement:
[...]
Unfortunately, the implementation of exit_nicely in pg_dump and pg_restore is
different from the one in pg_dumpall, so instead of...

+ * read_pattern - reads an pattern from input. The pattern can be mix of
+ * single line or multi line subpatterns. Single line subpattern starts first
+ * non white space char, and ending last non space char on line or by char
+ * '#'. The white spaces inside are removed (around char ".()"), or reformated
+ * around char ',' or reduced (the multiple spaces are replaced by one).
+ * Multiline subpattern starts by double quote and ending by this char too.
+ * The escape rules are same like for SQL quoted literal.
+ *
+ * Routine signalizes error by returning NULL. Otherwise returns pointer
+ * to next char after last processed char in input string.

typo: reads "a" pattern from input...

I don't fully understand the part about subpatterns, but is that necessary to
describe it? Simply saying that any valid and possibly-quoted identifier can
be parsed should make it clear that identifiers containing \n characters should
work too. Maybe also just mention that whitespaces are removed and special
care is taken to output routines in exactly the same way calling code will
expect it (that is comma-and-single-space type delimiter).

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message houzj.fnst@fujitsu.com 2022-11-11 08:42:25 RE: Perform streaming logical transactions by background workers and parallel apply
Previous Message Amit Kapila 2022-11-11 07:23:15 Re: Typo about subxip in comments