Re: [GENERAL] Problems with inequalities on numeric fields in 6.5

From: Martin Weinberg <weinberg(at)osprey(dot)phast(dot)umass(dot)edu>
To: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
Cc: Martin Weinberg <weinberg(at)osprey(dot)phast(dot)umass(dot)edu>, pgsql-general(at)postgreSQL(dot)org
Subject: Re: [GENERAL] Problems with inequalities on numeric fields in 6.5
Date: 1999-07-08 14:44:32
Message-ID: 199907081544.LAA01597@osprey.phast.umass.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Bruce Momjian wrote on Wed, 07 Jul 1999 12:29:13 EDT
>> Thanks, Bruce!
>>
>> Yes, I tried the latter query and it's the same:
>>
>> --------------------------------------------------
>>
>> lmc=> explain select count(*) from lmctot where j_m>3.4::float4 and j_m<3.5:
>:float4;
>> NOTICE: QUERY PLAN:
>>
>> Aggregate (cost=62349.97 rows=788100 width=4)
>> -> Index Scan using j on lmctot (cost=62349.97 rows=788100 width=4)
>>
>> EXPLAIN
>>
>> --------------------------------------------------
>> I've tried all permutations of the conversions in the ranges with
>> similar results (and vacuum analyzed several times as well as
>> dumped and reloaded and reloaded from scracth). We have
>> a larger database with 20M rows which has a similar behavior.
>>
>> There are 7092894 rows in database "lmc". So:
>>
>> (3.5-3.4)/(99.999-2.731) = 7292.1 != 788k
>>
>> A clue?
>
>I have just fixed a problem with index size estimates. Try adding
>#include <math.h> to the top of backend/optimizer/util/plancat.c. That
>may fix the estimated number of tuples returned. However, it don't
>think you are going to get better performance, since you are already
>using the index in the above case. The only big win I can think of is
>to use CLUSTER on that field. That should speed things up quite a bit.
>

Hi Bruce,

Ok. Sorry about the delay.

I added the math.h but that doesn't seem to change the
query plan output.

I then dropped all the indices, made a new one on three of the
variables and clustered:

create index m_col on lmctot using btree (j_m, h_m, k_m);
cluster m_col on lmctot;
vacuum analyze;

where the j_m, h_m, k_m are three float4 fields.

The cluster took about 18 hours on my 7.1 million records
(this is a dual 450Mhz Xeon Linux box). Not sure why
this was so slow.

Anyway, this *hugely* improved queries of form:

select count(*) from lmctot where j_m>3.4::float4 and j_m<3.5:

although the explain query plan output is identical. However
using h_m or k_m (not the first variable in the index) appears
to be doing a sequential scan. Is that right?

I then made indices on h_m and k_m, vacuum analyzed and tried
again, but got identical performance. If this is the way
it is, so be it, but I have the feeling that something is
not working properly.

Any ideas?

Again, with _heaps_ of thanks,

--Martin

===========================================================================

Martin Weinberg Phone: (413) 545-3821
Dept. of Physics and Astronomy FAX: (413) 545-2117/0648
530 Graduate Research Tower
University of Massachusetts
Amherst, MA 01003-4525

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jonathan davis 1999-07-08 15:16:10 just little BUG
Previous Message Bob Kruger 1999-07-08 14:21:12 User priveledges