From: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
To: Alban Hertroys <haramrae(at)gmail(dot)com>
Cc: Abelard Hoffman <abelardhoffman(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Will pg_repack improve this query performance?
Date: 2014-10-16 20:32:44
Message-ID: CAK3UJREvAASY4s4PDpkzAdoZkcnX=upDU0ifVzqrz4fA-FoyNg@mail.gmail.com
Lists: pgsql-general
On Wed, Oct 15, 2014 at 5:03 AM, Alban Hertroys <haramrae(at)gmail(dot)com> wrote:
> A CLUSTER would help putting rows with the same to_id together. Disk access would be less random that way, so it would help some.
>
> According to your query plan, accessing disks (assuming that’s what made the difference) was 154 (7700 ms / 50 ms) times slower than accessing memory. I don’t have the numbers for your disks or memory, but that doesn’t look like an incredibly unrealistic difference. That begs the question, how random was that disk access and how much can be gained from clustering that data?
Other than grouping tuples in a more favorable order to minimize I/O,
the big benefit of running a CLUSTER or pg_repack is that you
eliminate any accumulated bloat. (And if bloat is your real problem,
ideally you can adjust your autovacuum settings to avoid the problem
in the future.) You may want to check on the bloat of that table and
its indexes with something like this:
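As an illustration, one way to run such a bloat check is with the pgstattuple contrib module; this is only a sketch, and the table name "messages" and index name "messages_to_id_idx" are hypothetical stand-ins for the poster's actual objects:

```sql
-- Assumes the pgstattuple contrib extension is available;
-- "messages" and "messages_to_id_idx" are hypothetical names.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Table-level bloat: high dead_tuple_percent or free_percent
-- suggests accumulated bloat that CLUSTER/pg_repack would reclaim.
SELECT * FROM pgstattuple('messages');

-- Index-level bloat: a low avg_leaf_density points to a bloated
-- index that would benefit from REINDEX or pg_repack.
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('messages_to_id_idx');
```

Note that pgstattuple scans the whole relation, so on a large table it can take a while and generate significant I/O.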