From: David Fetter <david(at)fetter(dot)org>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, jotpe <jotpe(at)posteo(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: percentile value check can be slow
Date: 2017-11-19 17:49:38
Message-ID: 20171119174938.GL4411@fetter.org
Lists: pgsql-hackers
On Sun, Nov 19, 2017 at 01:23:42PM +0100, Tomas Vondra wrote:
> Hi,
>
> On 11/19/2017 03:10 AM, David Fetter wrote:
> > On Sat, Nov 18, 2017 at 11:05:47PM +0100, Tomas Vondra wrote:
> >> Hi,
> >>
> >> ...
> >>
> >> Is 'recognizer' an established definition I should know? Is it the same
> >> as 'validator' or is it something new/different?
> >
> > I borrowed it from http://langsec.org/
> >
> > I'm not entirely sure what you mean by a validator, but a recognizer
> > is something that gives a quick and sure read as to whether the input
> > is well-formed. In general, it's along the lines of a tokenizer, a
> > parser, and something that does very light post-parse analysis for
> > correctness of form.
> >
> > For the case that started the thread, a recognizer would check
> > something along the lines of
> >
> > CHECK('[0,1]' @> ALL(input_array))
>
> OK, thanks. From what I understand, recognizer is more about recognizing
> if a string is valid within a given formal language (essentially, if
> it's a well-formed program). That may not be the right term for checks
> on parameter values.
There are two hard problems in computer science: naming things, cache
coherency, and off-by-one.
> OTOH we already have "validators" on a number of places - functions
> checking various parameters, e.g. reloptions for FDWs, etc.
>
> But I guess the naming can be solved later ...
Indeed.
> >>> Way Bigger Lift, As Far As I Can Tell, But More Fun For Users:
> >>> Allow optional CHECK constraints in CREATE AGGREGATE for direct
> >>> arguments.
> >>>
> >>
> >> How will any of the approaches deal with something like
> >>
> >> select percentile_cont((select array_agg(v) from p))
> >> within group (order by a) from t;
> >>
> >> In this case the values are unknown after parse analysis, so I
> >> guess it does not really address that.
> >
> > It doesn't. Does it make sense to do a one-shot execution for cases
> > like that? It might well be worth it to do the aggregate once in
> > advance as a throw-away if the query execution time is already going
> > to take awhile. Of course, you can break that one by making p a JOIN
> > to yet another thing...
> >
> >> FWIW while I won't stand in the way of improving this, I wonder if this
> >> is really worth the additional complexity. If you get errors like this
> >> with a static list of values, you will fix the list and you're done. If
> >> the list is dynamic (computed in the query itself), you'll still get the
> >> error much later during query execution.
> >>
> >> So if you're getting many failures like this for the "delayed error
> >> reporting" to be an issue, perhaps there's something wrong in your
> >> stack and you should address that instead?
> >
> > I'd like to think that getting something to fail quickly and cheaply
> > when it can will give our end users a better experience. Here,
> > "cheaply" refers to their computing resources and time.
>
> The trouble is, this increases execution time for everyone, including
> people who carefully construct the parameter values. That seems rather
> undesirable.
I may be wrong but I'm pretty sure that a check for well-formed direct
parameters will not impose a significant cost on aggregates.
It occurs to me that this particular aggregate could take an array of
a domain defined along the lines of:
CREATE DOMAIN float4_0_1_closed AS float4
    NOT NULL
    CHECK (VALUE >= 0.0 AND VALUE <= 1.0);
Then the check would happen much earlier without adding a bunch of
potentially expensive machinery.
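To sketch what I mean (the domain name is illustrative, and I haven't
tried redeclaring the aggregate's signature): if the direct argument
were typed as an array of such a domain, the per-element CHECK would
fire at cast time, before the aggregate ever scans its input:

```sql
-- Assumed domain, as above:
CREATE DOMAIN float4_0_1_closed AS float4
    NOT NULL
    CHECK (VALUE >= 0.0 AND VALUE <= 1.0);

-- Element checks run during the cast, so an out-of-range fraction
-- raises a check_violation error immediately, not mid-execution:
SELECT ARRAY[0.25, 0.5, 1.5]::float4_0_1_closed[];
```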
> > Clearly, not having this happen in this case bothered Johannes
> > enough to wade in here.
>
> No. He was surprised the error is reported after a significant amount
> of time, but he does not seem to claim failing faster would be
> valuable to him. That is your assumption, and I have my doubts about
> it.
My mistake. I shouldn't have guessed when there was a better
alternative.
Johannes, could you help us understand your thinking in reporting
this?
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate