From: | Erik Jones <erik(at)myemma(dot)com> |
---|---|
To: | John Beaver <john(dot)e(dot)beaver(at)gmail(dot)com> |
Cc: | Matthew <matthew(at)flymine(dot)org>, Pgsql-Performance <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: large tables and simple "= constant" queries using indexes |
Date: | 2008-04-10 15:02:48 |
Message-ID: | FC30283E-575E-4119-A7CA-0645CC71D41F@myemma.com |
Lists: | pgsql-performance |
On Apr 10, 2008, at 9:44 AM, John Beaver wrote:
> Thanks a lot, all of you - this is excellent advice. With the data
> clustered and statistics at a more reasonable value of 100, it now
> reproducibly takes even less time - 20-57 ms per query.
>
> After reading the section on "Statistics Used By the Planner" in the
> manual, I was a little concerned that, while the higher statistics
> target sped up the queries I tried immeasurably, the speedup was
> coming from the most_common_vals array, and that values which
> wouldn't fit in this array wouldn't be sped up. Though I couldn't
> offhand find an example where this occurred, the clustering approach
> intuitively seems like a much more complete and scalable solution, at
> least for a read-only table like this.
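For reference, the sequence described above would look roughly like the sketch below. The table and column names are inferred from the index name mentioned later in the message (gene_prediction_view_gene_ref_key), so treat them as assumptions:

```sql
-- Raise the per-column statistics target, then re-analyze so the
-- planner picks up the new most_common_vals array and histogram.
ALTER TABLE gene_prediction_view
    ALTER COLUMN gene_ref SET STATISTICS 100;
ANALYZE gene_prediction_view;

-- Physically reorder the table along the index so rows sharing a
-- gene_ref value land on adjacent heap pages (8.3+ syntax).
CLUSTER gene_prediction_view USING gene_prediction_view_gene_ref_key;
```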
>
> As to whether the entire index/table was getting into ram between my
> statistics calls, I don't think this was the case. Here's the
> behavior that I found:
> - With statistics at 10, the query took 25 (or so) seconds no matter
> how many times I tried different values. The query plan was the same
> as for the 200 and 800 statistics below.
> - Trying the same constant a second time gave an instantaneous
> result, I'm guessing because of query/result caching.
> - Immediately on increasing the statistics to 200, the query
> reproducibly took less time. I tried about 10 different values.
> - Immediately on increasing the statistics to 800, the query
> reproducibly took less than a second every time. I tried about 30
> different values.
> - Decreasing the statistics to 100 and running the cluster command
> brought it to 57 ms per query.
> - The Activity Monitor (OSX) lists the relevant postgres process as
> taking a little less than 500 megs.
> - I didn't try decreasing the statistics back to 10 before I ran the
> cluster command, so I can't show the search times going up because
> of that. But I tried killing the 500 meg process. The new process
> uses less than 5 megs of ram, and still reproducibly returns a
> result in less than 60 ms. Again, this is with a statistics value of
> 100 and the data clustered by gene_prediction_view_gene_ref_key.
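The behavior described in the list above can be checked directly; something like the following (12345 stands in for an actual gene_ref constant) shows both the plan a given constant gets and the statistics the planner has gathered:

```sql
-- Check which plan the constant actually gets at the current
-- statistics target.
EXPLAIN ANALYZE
SELECT count(*) FROM gene_prediction_view WHERE gene_ref = 12345;

-- Inspect the collected statistics for the column; values present in
-- most_common_vals get the most accurate selectivity estimates.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'gene_prediction_view'
  AND attname = 'gene_ref';
```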
>
> And I'll consider the idea of using triggers with an ancillary table
> for other purposes; seems like it could be a useful solution for
> something.
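The trigger-with-ancillary-table idea mentioned above could look something like this minimal sketch, here maintaining pre-aggregated per-gene_ref row counts (the purpose and all object names are hypothetical, since the thread doesn't pin them down):

```sql
-- Ancillary table holding one pre-aggregated count per gene_ref.
CREATE TABLE gene_ref_counts (
    gene_ref integer PRIMARY KEY,
    n        bigint  NOT NULL
);

-- Trigger function that keeps the counts in step with the base table.
CREATE OR REPLACE FUNCTION maintain_gene_ref_counts() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE gene_ref_counts SET n = n + 1 WHERE gene_ref = NEW.gene_ref;
        IF NOT FOUND THEN
            INSERT INTO gene_ref_counts VALUES (NEW.gene_ref, 1);
        END IF;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE gene_ref_counts SET n = n - 1 WHERE gene_ref = OLD.gene_ref;
    END IF;
    RETURN NULL;  -- AFTER triggers ignore the return value
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER gene_ref_counts_trg
    AFTER INSERT OR DELETE ON gene_prediction_view
    FOR EACH ROW EXECUTE PROCEDURE maintain_gene_ref_counts();
```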
FWIW, killing the backend process responsible for the query won't
necessarily clear the table's data from memory, as that data lives in
shared_buffers. If you really want to flush the data from memory, you
need to read in data from other tables with a total size greater than
your shared_buffers setting.
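A couple of sketches of what that looks like in practice; other_big_table is a placeholder, and the second query assumes the contrib module pg_buffercache is installed:

```sql
-- Crowd the cached pages out by sequentially scanning other tables
-- whose combined size exceeds shared_buffers.
SELECT count(*) FROM other_big_table;

-- With contrib/pg_buffercache, count how many shared buffers a given
-- relation currently occupies.
SELECT count(*)
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = c.relfilenode
WHERE c.relname = 'gene_prediction_view';
```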
Erik Jones
DBA | Emma®
erik(at)myemma(dot)com
800.595.4401 or 615.292.5888
615.292.0777 (fax)
Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com