From: | Nathan Boley <npboley(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Collect frequency statistics for arrays |
Date: | 2012-03-01 22:58:58 |
Message-ID: | CAHetpQSc-fyp6PyvAXGFgWLEDYswsG6ydAJ9bUdrrd-WBepgsw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
[ sorry Tom, reply all this time... ]
> What do you mean by "storing sequences as arrays"?
So, a simple example is, for transcripts ( sequences of DNA that are
turned into proteins ), we store each of the connected components as
an array of the form:
exon_type in [1,6]
splice_type = [1,3]
and then the array elements are
[ exon_type, splice_type, exon_type ]
~ 99% of the elements are of the form [ [1,3], 1, [1,3] ],
so I almost always get a hash or merge join ( correctly ) but for the
rare junction types ( which are usually more interesting as well ) I
correctly get nest loops with an index scan.
> Can you demonstrate
> that the existing stats are relevant at all to the query you're worried
> about?
Well, if we didn't have mcv's and just relied on ndistinct to estimate
the '=' selectivities, either my low selectivity quals would use the
index, or my high selectivity quals would use a table scan, either of
which would be wrong.
I guess I could wipe out the stats and get some real numbers tonight,
but I can't see how the planner would be able to distinguish *without*
mcv's...
Best,
Nathan
From | Date | Subject | |
---|---|---|---|
Next Message | Marti Raudsepp | 2012-03-02 00:12:09 | Re: Caching for stable expressions with constant arguments v6 |
Previous Message | Tom Lane | 2012-03-01 22:44:58 | Re: Collect frequency statistics for arrays |