trying to use CLUSTER

From: "Sahagian, David" <david(dot)sahagian(at)emc(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: trying to use CLUSTER
Date: 2013-02-12 22:38:58
Message-ID: F3CBFBA88397EA498B22A05FFA9EC49D0109F191F8@MX22A.corp.emc.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Version=9.1.7

INFO: clustering "my_cool_table" using sequential scan and sort
INFO: "my_cool_table": found 1 removable, 1699139 nonremovable row versions in 49762 pages
Detail: 1689396 dead row versions cannot be removed yet.
CPU 9.80s/4.98u sec elapsed 175.92 sec.

INFO: clustering "my_cool_table" using sequential scan and sort
INFO: "my_cool_table": found 7552 removable, 21732 nonremovable row versions in 50007 pages
Detail: 11482 dead row versions cannot be removed yet.
CPU 0.01s/0.23u sec elapsed 36.29 sec.

INFO: clustering "my_cool_table" using index scan on "pk_cool"
INFO: "my_cool_table": found 621462 removable, 36110 nonremovable row versions in 26135 pages
Detail: 25128 dead row versions cannot be removed yet.
CPU 0.02s/0.35u sec elapsed 0.79 sec.

So my_cool_table gets inserted into (but not updated) by regular processes doing their smallish CRUD transactions.

Concurrently, ONE process repeatedly "sweeps" a chunk of rows from the table every few seconds.
(ie, it does delete...returning, and then commits the sweep)
Note that if the table has not many rows, then all the rows will be swept together.

It is possible for something to go wrong resulting in:
the table still being filled, but no longer being swept.

When the sweeping finally gets re-started, it must now chomp down a very large table.
When it finally sweeps down to near zero rows remaining, my idea was to do a CLUSTER on the table.
My expectation is that a VERY SMALL percentage of the row versions would actually get written to the new table!

My hope is that a smaller heap is better, now that the rate of sweeping is back to the rate of filling,
with the assumption that it will stay this way 99% of the time.

Can somebody tell me why some "dead row versions cannot be removed yet" ?
I assume that means CLUSTER must write them to the new table ?

It seems very costly to do the CLUSTER, if the new table is not really going to be a tiny fraction of the old table.
Is there a way for me to discover the approx number of "non-removables" BEFORE I do the CLUSTER ?
? Some pg_table query ? maybe after an analyze ?

Also, does the use of [index scan on "pk_cool"] basically depend on the ratio of removable/nonremovable row versions ?

Thanks,
-dvs-

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2013-02-13 00:10:00 Re: Re: permission denied to create extension "ltree" Must be superuser to create this extension.
Previous Message John R Pierce 2013-02-12 22:23:23 Re: PG V9 on NFS