From: | Greg Stark <gsstark(at)mit(dot)edu> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: performance of IN (subquery) |
Date: | 2004-08-27 19:22:02 |
Message-ID: | 87r7psmlnp.fsf@stark.xeocode.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > I'm not about to run analyze in the middle of the data generation
> > (which wouldn't work anyways since it's in a transaction).
>
> Since 7.3 or 7.4, you *can* run ANALYZE in the middle of a transaction.
> The cached-plan business is a problem, I agree, but I think it's
> orthogonal to this particular discussion (and you can always use EXECUTE
> if you have to).
It's orthogonal. My point was that I have a bigger problem, but even if I
address it by switching away from plpgsql, or I guess by using EXECUTE, I
would still have a problem. I didn't realize you could run analyze in a
transaction, but even being able to I wouldn't really want to have to do that
repeatedly during the job.
This new approach would actually complete the fix, a perl or plpgsql EXECUTE
implementation would gradually shift statistics during the job.
Except that the first thing the job does is delete all the old records. This
is inside a transaction. So an estimate based on the heap size would be off by
a factor of two by the time the job is done.
> but separate ANALYZE could definitely make an estimate of the fraction of
> dead tuples.
With analyze in a transaction I'm not clear what the semantics should be
though. I suppose it should only count tuples visible to the transaction
analyze?
> Nope, you aren't. The above seems to me to be a recipe for degradation
> of performance over time, precisely because the plans wouldn't change in
> the face of changes in the situation.
A gradual degradation is ok. A gradual degradation means I can schedule a
nightly analyze and report on any changed plans and either automatically
accept them or manually approve them individually.
A sudden degradation is much more dangerous. Even if it's rare, a sudden
degradation means an outage in prime time.
As I said, it doesn't matter to me if every query is 10% slower than possible,
as long as no query takes 1000% as long as necessary even if it's a 1 in 1000
occurrence.
> I've resisted adding "planner hints" to the language for this reason, and
> I'm certainly not eager to offer any hard guarantees about plans not
> changing.
I just want to control _when_ they change. Eventually you'll come around. I
think it'll be a slow gradual change in thinking as the user-base changes
though. Not something I'll change with a single argument in one day.
--
greg
From | Date | Subject | |
---|---|---|---|
Next Message | Greg Stark | 2004-08-27 19:23:36 | Re: R: R: space taken by a row & compressed data |
Previous Message | Dino Vliet | 2004-08-27 19:20:45 | job for sql, pl/pgsql,gawk,perl or ?? |