From: | Alexander Cheshev <alex(dot)cheshev(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Multidimensional Histograms |
Date: | 2024-01-07 23:49:11 |
Message-ID: | CAN_hQmv5ZK_TDsOkRU6Zy5Wn0GjqUkDNAQaNEor5Fs+T1raeVw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Tomas,
> See section 3.2 in this paper (the "view PDF" does not work for me, but
> the "source PDF" does download a postscript):
I believe that you are referring to a dynamic programming approach. It
is a 1-dimensional case! To find an optimal solution in the
M-dimensional case is an NP-hard problem.
Regards,
Alexander Cheshev
On Mon, 8 Jan 2024 at 00:29, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
>
>
> On 1/7/24 23:53, Alexander Cheshev wrote:
> > Hi Tomas,
> >
> >> The thing I was proposing is that it should be possible to build
> >> histograms with bins adapted to density in the given region. With
> >> smaller buckets in areas with high density. So that queries intersect
> >> with fewer buckets in low-density parts of the histogram.
> >
> > This is how Equi-Depth Histogram works. Postgres has maller buckets in
> > areas with high density:
> >
> > values[(i * (nvals - 1)) / (num_hist - 1)]
> >
> True, but the boundaries are somewhat random. Also, I was referring to
> the multi-dimensional case, it wasn't clear to me if the proposal is to
> do the same thing.
>
> >> I don't recall the details of the MHIST-2 scheme, but it's true
> >> calculating "perfect" V-optimal histogram would be quadratic O(N^2*B).
> >
> > In M-dimensional space "perfect" V-Optimal Histogram is an NP-hard
> > problem. In other words it is not possible to build it in polynomial
> > time. How did you come up with the estimate?!
> >
> See section 3.2 in this paper (the "view PDF" does not work for me, but
> the "source PDF" does download a postscript):
>
> https://citeseerx.ist.psu.edu/doc_view/pid/35e29cbc2bfe6662653bdae1fb89c091e2ece560
>
> >> But that's exactly why greedy/approximate algorithms exist. Yes, it's
> >> not going to produce the optimal V-optimal histogram, but so what?
> >
> > Greedy/approximate algorithm has time complexity O(M*N*B), where M
> > equals the number of dimensions. MHIST-2 is a greedy/approximate
> > algorithm.
> >
> >> And how does this compare to the approximate/greedy algorithms, both in
> >> terms of construction time and accuracy?
> >
> > Time complexity of Equi-Depth Histogram has no dependence on B.
> >
> Really? I'd expect that to build B buckets, the algorithm repeat some
> O(M*N) action B-times, roughly. I mean, it needs to pick a dimension by
> which to split, then do some calculation on the N tuples, etc.
>
> Maybe I'm imagining that wrong, though. It's been ages since I looked ad
> big-O complexity and/or the histograms. I'd have to play with those
> algorithms for a bit again.
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | jian he | 2024-01-08 00:00:00 | Re: Extract numeric filed in JSONB more effectively |
Previous Message | Tomas Vondra | 2024-01-07 23:29:01 | Re: Multidimensional Histograms |