Re: In progress INSERT wrecks plans on table

From: mark(dot)kirkwood(at)catalyst(dot)net(dot)nz
To: "Simon Riggs" <simon(at)2ndQuadrant(dot)com>
Cc: "Mark Kirkwood" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Gavin Flower" <gavinflower(at)archidevsys(dot)co(dot)nz>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: In progress INSERT wrecks plans on table
Date: 2013-05-07 00:23:58
Message-ID: feb60a4b1d9e3af7b646b0274064a1d9.squirrel@mail.catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

> Simon Riggs wrote:
>
> Patch works and improves things, but we're still swamped by the block
> accesses via the index.

Which *might* be enough to stop it making the server go unresponsive,
we'll look at the effect of this in the next few days, nice work!

>
> Which brings me back to Mark's original point, which is that we are
> x100 times slower in this case and it *is* because the choice of
> IndexScan is a bad one for this situation.
>
> After some thought on this, I do think we need to do something about
> it directly, rather than by tuning infrastructire (as I just
> attempted). The root cause here is that IndexScan plans are sensitive
> to mistakes in data distribution, much more so than other plan types.
>
> The two options, broadly, are to either
>
> 1. avoid IndexScans in the planner unless they have a *significantly*
> better cost. At the moment we use IndexScans if cost is lowest, even
> if that is only by a whisker.
>
> 2. make IndexScans adaptive so that they switch to other plan types
> mid-way through execution.
>
> (2) seems fairly hard generically, since we'd have to keep track of
> the tids returned from the IndexScan to allow us to switch to a
> different plan and avoid re-issuing rows that we've already returned.
> But maybe if we adapted the IndexScan plan type so that it adopted a
> more page oriented approach internally, it could act like a
> bitmapscan. Anyway, that would need some proof that it would work and
> sounds like a fair task.
>
> (1) sounds more easily possible and plausible. At the moment we have
> enable_indexscan = off. If we had something like
> plan_cost_weight_indexscan = N, we could selectively increase the cost
> of index scans so that they would be less likely to be selected. i.e.
> plan_cost_weight_indexscan = 2 would mean an indexscan would need to
> be half the cost of any other plan before it was selected. (parameter
> name selected so it could apply to all parameter types). The reason to
> apply this weighting would be to calculate "risk adjusted cost" not
> just estimated cost.
>

I'm thinking that a variant of (2) might be simpler to inplement:

(I think Matt C essentially beat me to this suggestion - he originally
discovered this issue). It is probably good enough for only *new* plans to
react to the increased/increasing number of in progress rows. So this
would require backends doing significant numbers of row changes to either
directly update pg_statistic or report their in progress numbers to the
stats collector. The key change here is the partial execution numbers
would need to be sent. Clearly one would need to avoid doing this too
often (!) - possibly only when number of changed rows >
autovacuum_analyze_scale_factor proportion of the relation concerned or
similar.

regards

Mark

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-05-07 00:53:16 Re: pg_dump --snapshot
Previous Message Stephen Frost 2013-05-07 00:18:26 Re: pg_dump --snapshot

Browse pgsql-performance by date

  From Date Subject
Next Message Igor Neyman 2013-05-07 02:03:56 Re: Deterioration in performance when query executed in multi threads
Previous Message Gavin Flower 2013-05-06 22:11:30 Re: Deterioration in performance when query executed in multi threads