Re: [BUGS] BUG #1552: massive performance hit

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Karim Nassar <karim(dot)nassar(at)NAU(dot)EDU>
Cc: Keith Browne <tuxedo(at)deepsky(dot)com>, Brian O'Reilly <fade(at)deepsky(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #1552: massive performance hit
Date: 2005-03-25 15:38:25
Message-ID: 1111765105.11750.749.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-performance

On Fri, 2005-03-25 at 03:50 -0700, Karim Nassar wrote:
> On Fri, 2005-03-25 at 10:18 +0000, Simon Riggs wrote:
> > > When running with PostgreSQL 7.4 on a dual-CPU Athlon MP2400+ machine
> > > with a gigabyte of RAM, running Debian Linux version 2.6.8.1, we were
> > > able to insert all this data in 5-7 minutes. It's taken a while to
> > > install Postgres 8.0.1 on the same machine, but now I have, and it's
> > > taking 40-45 minutes to run the same insert script.
>
> <snip>
>
> > OK. Not-a-bug.
> >
> > Your situation is covered in the manual with some sage advice
> > http://www.postgresql.org/docs/8.0/static/populate.html
> > It doesn't go into great lengths about all the reasons why those
> > recommendations are good ones - but they are clear.

> Simon, this begs the question: what changed from 7.4->8.0 to require he
> modify his script?

Good question. Clearly, some combination of stats-plus-index-selection
code changed but I suspect this is a case of more, not less accuracy,
affecting us here.

The FK code literally generates SQL statements, then prepares them.
AFAICS it should be possible to add more code to
src/backend/utils/adt/ritrigger.c to force the prepare of FK code to
avoid seq scans by executing "SET enable_seqscan = off;"
I'll have a play....

But, the wider point raised by this is whether Prepare should be more
conservative in the plan it generates. When we Execute a single query,
it is perfectly OK to go for the "best" plan, since it is being executed
only this once and we can tell, right now, which one the "best" is.

With a Prepared query, it is clearly going to be executed many times and
so we should consider that the optimal plan may change over time.

Index access has more overhead for small tables, but increases by (I
think) only logN as the number of rows in a table, N, increases.
Sequential scan access varies by N. Thus, as N increases from zero,
first of all Seq Scan is the best plan - but only marginally better than
Index access, then this changes at some value of N, then after that
index access is the best plan. As N increases, Seq Scan access clearly
diverges badly from Indexed access.

The conservative choice for unknown, or varying N would be index access,
rather than the best plan available when the query is prepared.

I propose a more general TODO item:

* Make Prepared queries always use indexed access, if it is available

Best Regards, Simon Riggs

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2005-03-25 16:09:30 Re: [BUGS] CC Date format code defaults to current centry
Previous Message Tom Lane 2005-03-25 14:41:09 Re: BUG #1552: massive performance hit between 7.4 and 8.0.1

Browse pgsql-performance by date

  From Date Subject
Next Message Simon Riggs 2005-03-25 16:01:18 Re: Delete query takes exorbitant amount of time
Previous Message Tom Lane 2005-03-25 15:17:17 Re: Delete query takes exorbitant amount of time