Re: Query planner question

From: Soni M <diptatapa(at)gmail(dot)com>
To: Alban Hertroys <haramrae(at)gmail(dot)com>
Cc: David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Query planner question
Date: 2014-08-25 11:17:28
Message-ID: CAAMgDXktLYem+sye3ny6fO=SDOn-7T2McecmSwUrSb0NHjb0vQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

here's the explain analyze result : http://explain.depesz.com/s/Mvv and
http://explain.depesz.com/s/xxF9

it seems that i need to dig more on query planner parameter.

BTW, thanks all for the helps.

On Sat, Aug 23, 2014 at 4:33 PM, Alban Hertroys <haramrae(at)gmail(dot)com> wrote:

> On 23 Aug 2014, at 4:34, Soni M <diptatapa(at)gmail(dot)com> wrote:
> > On Fri, Aug 22, 2014 at 9:10 PM, Alban Hertroys <haramrae(at)gmail(dot)com>
> wrote:
> > On 22 August 2014 14:26, Soni M <diptatapa(at)gmail(dot)com> wrote:
> > > Currently we have only latest_transmission_id as FK, described here :
> > > TABLE "ticket" CONSTRAINT "fkcbe86b0c6ddac9e" FOREIGN KEY
> > > (latest_transmission_id) REFERENCES transmission_base(transmission_id)
> > >
> > > Change the query to include only FK still result the same:
> > > explain select t.ticket_id ,
> > > tb.transmission_id
> > > from ticket t,
> > > transmission_base tb
> > > where t.latest_transmission_id = tb.transmission_id
> > > and tb.parse_date > ('2014-07-31');
> > > QUERY PLAN
> > >
> ----------------------------------------------------------------------------------------------------------------------------------
> > > Hash Join (cost=113928.06..2583606.96 rows=200338 width=8)
> > > Hash Cond: (t.latest_transmission_id = tb.transmission_id)
> > > -> Seq Scan on ticket t (cost=0.00..1767767.26 rows=69990826
> width=8)
> > > -> Hash (cost=108923.38..108923.38 rows=400374 width=4)
> > > -> Index Scan using transmission_base_by_parse_date on
> > > transmission_base tb (cost=0.00..108923.38 rows=400374 width=4)
> > > Index Cond: (parse_date > '2014-07-31
> 00:00:00'::timestamp
> > > without time zone)
> > > (6 rows)
> >
> > Do you have an index on ticket (latest_transmission_id)?
> >
> > Yes, both t.latest_transmission_id and tb.transmission_id is indexed.
> >
> > Indexes:
> > "transmission_base_pkey" PRIMARY KEY, btree (transmission_id) CLUSTER
> > Indexes:
> > "ticket_by_latest_transmission" btree (latest_transmission_id)
>
> Okay, so we got those indexes. So much for the low-hanging fruit.
>
> From the above plan we learn that the database estimates[1] that 400k rows
> from transmission match your condition (parse_date > '2014-07-31’). The
> ticket table has a foreign key to that table, which suggests a 1:n
> relationship. It also has around 70M rows, or at least the database seems
> to think that about that amount will match those 400k transmissions.
>
> That means that if on average 175 (=70M/400k) ticket ID’s match a
> transmission ID, the database would be needing all those 70M rows anyway -
> and even if it only needs every 175th row, a sequential scan is not a
> particularly inefficient way to go about this.
> The alternative is a whole lot of index lookups, probably not in the same
> order as either the index or the rows on disk, meaning quite a bit of
> random disk I/O.
>
> I’m suspecting that the cost estimates for this query with seq-scans
> disabled aren’t very different, provided doing so comes up with a
> comparable plan.
>
> Things you might want to verify/try:
> * Are those estimated numbers of rows accurate? If not, is autovacuum (or
> scheduled vacuum) keeping up with the amount of data churn on these tables?
> Do you collect a sufficiently large sample for the statistics?
> * How much bloat is in these tables/indexes?
> * Did you change planner settings (such as disabling bitmap scans; I kind
> of expected one here) or did you change cost estimate parameters?
> * Does it help to put an index on transmission (parse_date,
> transmission_id)?
> * If none of that helps, we’re going to need the output of explain analyze
> - that will probably take long to create, so you might as well start with
> that and do the other stuff at the side.
>
> What kind of hardware are these disks on? Is it possible that disk I/O on
> this particular machine is relatively slow (relative to the seq/random cost
> factor for disk access as specified in your postgresql.conf)?
>
> Cheers,
>
> Alban Hertroys
>
> [1] You did not provide explain analyse output, so we only have estimates
> to work with.
> --
> If you can't see the forest for the trees,
> cut the trees and you'll find there is no forest.
>
>

--
Regards,

Soni Maula Harriz

In response to

Browse pgsql-general by date

  From Date Subject
Next Message hubert depesz lubaczewski 2014-08-25 11:17:40 Re: deadlock in single-row select-for-update + update scenario? How could it happen?
Previous Message Marc Mamin 2014-08-25 10:48:55 Way to identify the current session's temp tables within pg_class ?