From: | Tiago Antão <tra(at)fct(dot)unl(dot)pt> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Tiago Antão <tra(at)fct(dot)unl(dot)pt>, PostgreSQL Hackers list <pgsql-hackers(at)hub(dot)org> |
Subject: | Re: analyze.c |
Date: | 2000-08-23 17:22:40 |
Message-ID: | Pine.LNX.4.21.0008231742420.5111-100000@eros.si.fct.unl.pt |
Lists: | pgsql-hackers |
On Wed, 23 Aug 2000, Tom Lane wrote:
> > What's the big reason not to do that? I know that
> > there is some code in analyze.c (like comparing) that uses other parts of
> > pg, but that seems to be easily fixed.
>
> Are you proposing not to do any comparisons? It will be interesting to
> see how you can compute a histogram without any idea of equality or
> ordering. But if you want that, then you still need the function-call
> manager as well as the type-specific comparison routines for every
> datatype that you might be asked to operate on (don't forget
> user-defined types here).
I forgot user defined data types :-(, but regarding histograms I think
the code can be made external (at least for testing purposes):
1. I was not suggesting not to do any comparisons, but I think the only
comparison I need is equality. I don't need ordering, because I don't need to
calculate mins or maxes on the data itself (I just need mins and maxes on
frequencies) to make a histogram.
2. The mapping to text (PQgetvalue always returns char*, and pg_statistic
keeps a "text" anyway) guarantees that I have a way of testing equality
regardless of type.
But at least anything relating to order has to stay in pg.
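To illustrate points 1 and 2, here is a minimal sketch of equality-only frequency counting over the text form of a column. It is written in Python for brevity rather than against libpq; the `sample` list and the name `value_frequencies` are hypothetical stand-ins for the strings a client would read back, and NULL handling is left out.

```python
from collections import Counter

def value_frequencies(text_values):
    """Count occurrences of each text value.

    Only equality (via hashing) on the strings is used; the values
    themselves are never ordered or compared with < or >.
    """
    return Counter(text_values)

# Hypothetical column contents, already mapped to text.
sample = ["7", "3", "7", "abc", "3", "7"]

freqs = value_frequencies(sample)

# Min and max are taken over the frequencies, not over the data.
max_freq = max(freqs.values())
min_freq = min(freqs.values())
```

Note that this treats two values as equal only when their text forms are identical, which is the assumption made in point 2.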
> > I'm leaning toward the implementation of end-biased histograms. There is
> > an introductory reference in the IEEE Data Engineering Bulletin, september
> > 1995 (available on microsoft research site).
>
> Sounds interesting. Can you give us an exact URL?
http://www.research.microsoft.com/research/db/debull/default.htm
BTW, you can get access to SIGMOD CDs with lots of goodies for a very low
price (at least in 1999 it was a bargain), check out ACM membership for
sigmod.
I've been reading about the implementation of histograms and, AFAIK, in
practice "histogram" is just a cool name for no more than:
1. the ten most frequent values, with the frequency of each
2. the same for the ten least frequent values
3. the average frequency for the rest
I'm writing code to get this info (outside pg for now - for testing
purposes).
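The three items above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code mentioned in the message: the function name `end_biased_summary`, the parameter `k`, and the example counts are all hypothetical, and the input is assumed to be a mapping from value (as text) to frequency.

```python
def end_biased_summary(freqs, k=10):
    """Summarise a value -> count mapping as (top, bottom, rest_average).

    top          : the k most frequent values with their counts
    bottom       : the k least frequent of the remaining values
    rest_average : mean count of everything in between, or None if empty
    """
    # Rank by frequency only; the values themselves are never compared.
    ranked = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:k]
    rest = ranked[k:]
    bottom = rest[-k:] if k > 0 else []
    middle = rest[:len(rest) - len(bottom)]
    if middle:
        rest_average = sum(count for _, count in middle) / len(middle)
    else:
        rest_average = None
    return top, bottom, rest_average

# Hypothetical frequencies, with k=2 to keep the example small.
example = {"a": 50, "b": 30, "c": 5, "d": 4, "e": 3, "f": 1}
top, bottom, rest_average = end_biased_summary(example, k=2)
```

With fewer than 2*k distinct values, the bottom list simply shrinks and the average for the rest comes back as None.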
Best Regards,
Tiago
PS - again: I'm just starting, so some of my comments may be completely dumb.