Re: Optimization with dates

From: Jason Earl <jason(dot)earl(at)simplot(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jason Earl <jason(dot)earl(at)simplot(dot)com>, Jean-Christophe Boggio <cat(at)thefreecat(dot)org>, pgsql-sql(at)postgresql(dot)org
Subject: Re: Optimization with dates
Date: 2001-11-14 17:17:44
Message-ID: 87zo5pi85j.fsf@npa01zz001.simplot.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Jason Earl <jason(dot)earl(at)simplot(dot)com> writes:
> > I have a similar table (~16M rows) with an indexed timestamp, and have
> > had similar problems. I have found that even when I am using a
> > constant timestamp like in this query.
>
> > SELECT * FROM caseweights1 WHERE dt > '2001-11-01';
>
> > I start getting sequential scans with 7.1.3 long before they are
> > faster than index based queries.
>
> Just out of curiosity, do the indexed timestamps correlate closely
> to the physical order of the table? I'd expect that to happen if
> you are timestamping records by insertion time and there are few or
> no updates.

That's it precisely. Records in this table are inserted only, and
never updated. The records are timestamped when inserted.

> 7.1 and before assume that the index order is random with respect to
> the physical order, which is a good conservative assumption ... but
> it results in drastically overestimating the cost of an indexscan
> when strong correlation exists. 7.2 keeps some statistics about
> ordering correlation, and may perhaps do better with this sort of
> situation. (I have no doubt that its estimates will need further
> tweaking, but at least the effect is somewhat accounted for now.)
>
> regards, tom lane

I feel bad that I don't have 7.2b in testing yet. I have been waiting
for a Debian package :). I am quite excited about several new
features. I read HACKERS pretty religiously and so I was aware that
you had added some new statistics and that you have fiddled a bit with
the constant that decides when a sequential scan is a good idea. But
the fact of the matter is that I have already learned how to make
PostgreSQL do "the right thing" when it comes to using these indexes.
Besides, I generally only select from this table in 15 minute
intervals (~200 records) and PostgreSQL has no problems doing the
right thing by default.

The feature that I am most excited about is the new vacuum. Currently
I can only vacuum this table (and several similar tables) once every
two weeks while the factory is down for maintenance. This isn't a
very big deal as there are no updates or deletions from these tables
and the statistics in a ~16 million row table don't change much in two
weeks, but the new vacuum should do the correct thing for my tables
without any special handling on my part.

Thanks again,

Jason

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Sharmad Naik 2001-11-15 06:56:01 Re: error in postgresql
Previous Message Oleg Bartunov 2001-11-14 15:58:49 Re: Full text search