From: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: distinct estimate of a hard-coded VALUES list |
Date: | 2016-08-22 17:42:14 |
Message-ID: | 20160822174214.GA133273@alvherre.pgsql |
Lists: | pgsql-hackers |
Robert Haas wrote:
> On Sat, Aug 20, 2016 at 4:58 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> >> On Thu, Aug 18, 2016 at 2:25 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>> It does know it, what it doesn't know is how many duplicates there are.
> >
> >> Does it know whether the count comes from a parsed query-string list/array,
> >> rather than being an estimate from something else? If it came from a join,
> >> I can see why it would be dangerous to assume they are mostly distinct.
> >> But if someone throws 6000 things into a query string and only 200 distinct
> >> values among them, they have no one to blame but themselves when it makes
> >> bad choices off of that.
> >
> > I am not exactly sold on this assumption that applications have
> > de-duplicated the contents of a VALUES or IN list. They haven't been
> > asked to do that in the past, so why do you think they are doing it?
>
> It's hard to know, but my intuition is that most people would
> deduplicate. I mean, nobody is going to want their query generator
> to send X IN (1, 1, <repeat a zillion more times>) to the server if it
> could have just sent X IN (1).
Also, if we patch it this way and somebody has a slow query because of a
lot of duplicate values, it's easy to solve the problem by
de-duplicating. But with the current code, people who have the
opposite problem have no way to work around it.
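
For illustration, the client-side workaround mentioned above might look
roughly like the sketch below. It is only a sketch: the helper name
build_in_clause and the %s placeholder style (as used by some Python
drivers) are assumptions for this example, not anything from the patch
or from PostgreSQL itself.

```python
def build_in_clause(column, values):
    """Build an IN (...) fragment with duplicate values removed.

    Hypothetical helper: returns (sql_fragment, params). The %s
    placeholder style is an assumption; adapt it to your driver.
    """
    # dict.fromkeys keeps the first occurrence of each value and
    # preserves insertion order (guaranteed since Python 3.7).
    unique = list(dict.fromkeys(values))
    if not unique:
        # An empty IN list is not valid SQL, so fall back to a
        # predicate that matches nothing.
        return "FALSE", []
    placeholders = ", ".join(["%s"] * len(unique))
    return f"{column} IN ({placeholders})", unique


fragment, params = build_in_clause("x", [1, 1, 2, 1])
```

With de-duplicated input, the length of the list the planner sees is the
number of distinct values, which is the assumption the proposed change
would rely on.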
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services