Re: Inheritance & Indexes

From: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To: Alan Williams <alan_williams(at)affymetrix(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Inheritance & Indexes
Date: 2003-06-24 19:20:10
Message-ID: 20030624120902.M9143-100000@megazone23.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, 24 Jun 2003, Alan Williams wrote:

>
> On Tue, 24 Jun 2003, Stephan Szabo wrote:
> > > hs.exon.2=> explain select * from ga_psr_transcript_1 t,
> > ga_psr_exon_1e where e.parent = t.id;
> > > QUERY PLAN
> > >
> > ------------------------------------------------------------------------
> > --------------------------------------------
> > > Merge Join (cost=0.00..9087.71 rows=176908 width=98)
> > > Merge Cond: ("outer".id = "inner".parent)
> > > -> Index Scan using ga_psr_transcript_1_pkey on
> > ga_psr_transcript_1 t (cost=0.00..1066.17 rows=43398 width=47)
> > > -> Index Scan using ga_psr_exon_1_parent on ga_psr_exon_1 e
> > (cost=0.00..5259.52 rows=176908 width=51)
> > > (4 rows)
> > >
> > > If I do a join on the parent table, the optimizer refuses to use the
> > > indicies:
> > >
> > > hs.exon.2=> explain select * from ga_psr_transcript t, ga_psr_exon e
> > where e.parent = t.id;
> >
> > In this case, you can't use a single index scan to get the rows in order
> > so the part that makes the above a nice plan doesn't really apply. If
> > you're getting all the rows and sorting them, index scans are probably a
> > waste of time unless you have alot of dead space. If we supported
> > multi-table indexes, that'd potentially let you get a plan like
> > the above.
>
> Because of the foreign key constraint, the database engine could do
> the above query on each of the child tables and concatenate the
> results. This is because there is a notion in our schema of "paired"

I don't think it can do exon_1 -> transcript_1 union exon_2 ->
transcript_2 etc from the above unless there's also a guarantee of
uniqueness since if the same id showed up in transcript_1 and transcript_2
you'd have to join them both to a parent in exon_1. The individual id
primary keys are not sufficient to show that though so you'd have to join
exon_1 -> transcript_1 union exon_1 -> transcript_2 union exon_2 ->
transcript_1... to guarantee the same results I think.

I don't think that we're ever likely to figure out the optimization for
those cases in any case. Multi-table indexes will probably be coming
eventually which will allow a scan over that rather than the append step.

> > ------------------------------------------------------------------------
> > -----------------------
> > > Merge Join (cost=1239155.37..70188119.40 rows=5514877218 width=334)
> > > Merge Cond: ("outer".id = "inner".parent)
> > > -> Sort (cost=243481.37..244816.14 rows=533908 width=165)
> > > Sort Key: t.id
> > > -> Append (cost=0.00..10980.08 rows=533908 width=165)
> > [lots of seqscans snipped]
> > > -> Sort (cost=995674.00..1000838.64 rows=2065853 width=169)
> > > Sort Key: e.parent
> > > -> Append (cost=0.00..43563.52 rows=2065853 width=169)
> > [more seqscans snipped]
> >
> > > Same thing even if I'm querying for a specific tuple:
> > >
> > > hs.exon.2=> explain select * from ga_psr_transcript t, ga_psr_exon e
> > > where e.parent = t.id and t.id = 123;
> >
> > ISTM it's willing to use an index scan on at least some of t's
> > subtables.
> > Does explicitly saying e.parent=123 help?
>
> Yes, adding e.parent=123 results in the desired result of index scans
> into both tables. However, without including this the optimizer still
> predicts 31 results from the index scans on ga_psr_transcript* and yet
> insists on using a seq scan into each ga_psr_exon* table. It expects
> to get 2065853 rows back from the ga_psr_exon* tables when in reality
> it is more like 310 rows.

Yeah, it's guessing the number of rows rather poorly. Without the
implied search condition, the index scan wouldn't help barring a small
estimated number of rows in t making nested loop look good (I assume the
t estimate is way off too, does analyzing the various tables or possibly
raising the analyze buckets for the id column and analyzing get that
estimate to something reasonable?). I *think* 7.4 may be smarter about
implying these conditions as well.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Mike Mascari 2003-06-24 19:32:22 Re: Failure to install 7.3.3
Previous Message scott.marlowe 2003-06-24 19:08:08 Re: A creepy story about dates. How to prevent it?