Re: Problems with Error Messages wrt Domains, Checks

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: john frazer <johnfrazer783(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problems with Error Messages wrt Domains, Checks
Date: 2018-03-19 14:44:56
Message-ID: CAKFQuwaaaf11qeRsD8cGKQPecwHdbFLMjdM6BUyONP3WLPny=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 17, 2018 at 12:54 PM, john frazer <johnfrazer783(at)gmail(dot)com>
wrote:

>
> ---------- Forwarded message ----------
> From: john frazer <johnfrazer783(at)gmail(dot)com>
> Date: Sat, Mar 17, 2018 at 6:28 PM
> Subject: Re: Problems with Error Messages wrt Domains, Checks
> To: "David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
>
> > As such, it could be anywhere in my many, many kLOCs big DB
> definition. I cannot even search the RegEx with a RegEx because all I know
> is some parenthesis is missing, somewhere:
> >
> > ​Well, the error does point to the first statement in the chain of
> issues - working backward a couple of steps is possible.
>
> In this particular case that is possible, and I did manage to do it. The
> point is that in the
> general case the faulty regular expression could be anywhere, and there's
> no clue given at all.
>

​Frankly, I'm not seeing "invalid constant regular expressions" as being a
large scale problem - but I'll agree that having the error include the
actual literal being parsed as a RegEx should be done. If the targeted
reporting (e.g., stack trace) gets fixed as a side-effect of the more
annoying type input errors - which usually involves dynamic data - that
would be swell.

> > FAILURE: before the insert statement, everything runs dandy. We
> could have built an entire data warehouse application on top of a table
> definition that can never be syntactically processed but which will only
> fail when someone accidentally tries to insert a line.
> >
> > ​Since this is going to fail every single time you add a record I'm
> lacking sympathy here. "Accidentally tries to insert a line" - if the
> table wasn't meant to be used why does it exist in the first place?​ And
> if it is intended to be used then functional testing should quickly point
> out something like this.
>
> But there clearly can be tables that are used only now and then and might
> get checked
> for absence of rows. But regardless, I think the point stands that ideally
> you shouldn't
> be able to succesfully declare nonsensical objects and only be told so
> some kinds of
> usage patterns by runtime errors (with defective contexts), and in most
> cases, pgSQL
> does keep that promise.
>

I'm not disagreeing but I'm also not part of the solution. In terms of
importance I'd say its not that high given that I've never really felt the
lack personally. An invalid object, even though it doesn't fail at
creation, usually fails immediately after its first use which happens soon
enough after creation as to make pin-pointing its location generally
trivial.

>
> > ​I suppose the best practice when dealing with a lack of information in
> the error handle code flows is to limit the amount of context that is in
> scope by unit testing. And while they are absolutely short-comings
> overcoming them has cost in terms of both developer effort and, more
> importantly, runtime performance.
>
> I'm afraid no amount of unit testing of the DDL code can do this for me.
> Yes,
> in the first reported cases (the invalid RegExp), I can make sure I use
> each
> expression at least once so unsyntactic ones will make themselves shown.
> But
> in the other two cases, well, the production environment in which this
> came up
> has an insert statement that takes data from a largish source into the
> target table
> (20k rows of altogether >2m rows), and I *can't* unit test that data.
>

​I'd be inclined to not constrain the table itself at all and instead
perform soft validation post-load.​ You can process and remove offending
records and then add the constraints as a sanity check/documentation.

> FWIW the workaround that IÄve found is this:
>
> create table X.table_with_constraints (
> my_column text,
> constraint "my_column must start with 'x'" check ( Q.starts_with_x(
> my_column ) ),
> constraint "my_column must have 3 chrs" check ( Q.has_3_characters(
> my_column ) ) );
>
> In other words, I dispense with domains and use (small, boolean) functions
> (defined as `select` one-liners)
> because only then do I get told what piece of data comes doen the wrong
> way and where.
> It's a shame because this is essentially what I expect to do in a language
> like
> JavaScript.
>
>
​Yes, hopefully we can decide and implement value reporting for v12 (and
consider adding it in for v11) which would at least avoid the need for the
functions and thus rely on just the named constraints.

Though since its just you and I on this thread there is no one who can
write a patch speaking up...

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Claudio Freire 2018-03-19 14:47:11 Re: [HACKERS] GUC for cleanup indexes threshold.
Previous Message Tomas Vondra 2018-03-19 13:52:34 Re: [PROPOSAL] Shared Ispell dictionaries