Re: Strange (?) Index behavior?

From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Allen Landsidel <alandsidel(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Strange (?) Index behavior?
Date: 2004-11-11 21:48:27
Message-ID: 4193DE2B.5060407@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

>>>-----------------------------------------------------------------------------------------------------------------------------------------------
>>> Index Scan using sname_unique on "testtable" (cost=0.00..34453.74
>>>rows=8620 width=20) (actual time=77.004..537065.079 rows=74612
>>>loops=1)
>>> Index Cond: ((sname >= 'AA'::text) AND (sname < 'AB'::text))
>>> Filter: (sname ~~ 'AA%'::text)
>>> Total runtime: 537477.737 ms
>>>(4 rows)
>>>
>>>Time: 537480.571 ms
>>
>>Nothing you're going to do to the query is going to come up with a more
>>effective plan than this. It's using the index after all. It's never going to
>>be lightning fast because it has to process 75k rows.
>>
>>However 75k rows shouldn't be taking nearly 10 minutes. It should be taking
>>about 10 seconds.

I am confused about this statement. I have a table with 1.77 million
rows that I use gist indexes on (TSearch) and I can pull out of it in
less than 2 seconds.

Are you saying it should be taking 10 seconds because of the type of
plan? 10 seconds seems like an awfullong time for this.

Sincerely,

Joshua D. Drake

>
>
> That's my feeling as well, I thought the index was to blame because it
> will be quite large, possibly large enough to not fit in memory nor be
> quickly bursted up.
>
>
>>The 77ms before finding the first record is a bit suspicious. Have you
>>vacuumed this table regularly? Try a VACUUM FULL VERBOSE, and send the
>>results. You might try to REINDEX it as well, though I doubt that would help.
>
>
> This table is *brand spanking new* for lack of a better term. I have
> the data for it in a CSV. I load the CSV up which takes a bit, then
> create the indexes, do a vacuum analyze verbose, and then posted the
> results above. I don't think running vacuum a more times is going to
> change things, at least not without tweaking config settings that
> affect vacuum. Not a single row has been inserted or altered since the
> initial load.. it's just a test.
>
> I can't give vacuum stats right now because the thing is reloading
> (again) with different newfs settings -- something I figure I have the
> time to fiddle with now, and seldom do at other times. These numbers
> though don't change much between 8K on up to 64K 'cluster' sizes. I'm
> trying it now with 8K page sizes, with 8K "minimum fragment" sizes.
> Should speed things up a tiny bit but not enough to really affect this
> query.
>
> Do you still see a need to have the output from the vacuum?
>
>
>>Actually you might consider clustering the table on sname_unique. That would
>>accomplish the same thing as the VACUUM FULL command and also speed up the
>>index scan. And the optimizer knows (if you analyze afterwards) it so it
>>should be more likely to pick the index scan. But currently you have to rerun
>>cluster periodically.
>
>
> Clustering is really unworkable in this situation. It would work now,
> in this limited test case, but using it if this were to go into
> production is unrealistic. It would have to happen fairly often since
> this table is updated frequently, which will break the clustering
> quickly with MVCC.
>
> Running it often.. well.. it has 70M+ rows, and the entire table is
> copied, reordered, and rewritten.. so that's a lot of 'scratch space'
> needed. Finally, clustering locks the table..
>
> Something I'd already considered but quickly ruled out because of
> these reasons..
>
> More ideas are welcome though. ;)
>
> -Allen
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org

--
Command Prompt, Inc., home of PostgreSQL Replication, and plPHP.
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd(at)commandprompt(dot)com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL

Attachment Content-Type Size
jd.vcf text/x-vcard 640 bytes

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2004-11-12 18:42:02 Clarification on two bits on VACUUM FULL VERBOSE output
Previous Message Tom Lane 2004-11-11 21:41:51 Re: Strange (?) Index behavior?