Quick Links

A Tale of 2 algorithms

From:	Colin Taylor <colin(dot)taylor(at)gmail(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	A Tale of 2 algorithms
Date:	2012-10-01 21:24:47
Message-ID:	CAN6Zwvv_h_C7rDwRAQL_Ufb3xrF261bV0XSeyFB3oZJDsUDLTw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Hi, previously I selected categorized data for update then updated counts
or inserted a new record if it was a new category of data.

select all categories
update batches of categories
or insert batches [intermingled as they hit batch size]

Problem was the select was saturating the network (pulling back far more
data than needed too)
So I switched to doing optimistic updates where I checked for 0 row updates
and made inserts out of them.

optimistic update batches
followed by insert batches

New problem massive table bloat. I'm losing gigabytes of disk an hour which
I can only recover by clustering.

Now's the bit where I lose some of my audience by saying I'm having this
bloat problem on 8.3.7 and 8.4.4 but not 9.0. I'd love to upgrade obviously
but that's out of my hands and I've been told not an option in the short
term.

My thoughts are: surely 0-row updates dont cause this or have impact on the
vacuum. I'm still doing the same updates after all why have things
degenerated so badly?
While it made sense to me that the dead tuples are now more in the middle
of the table than the end somehow and since autovacuum starts from the back
that might be the cause, but I've turned on full autovacuum logging and
there is seemingly very little vaccuming going on in either scenario (we
have a nightly scheduled cluster). In desperation I've also doubled the
freespace map settings in 8.3 to the seemingly very large max_fsm_pages =
25000000 and max_fsm_relations = 200000 without improvement.

Any suggestions? These are roughly 0.5 to 1TB databases with 8GB shared
buffers and work mem set appropriately and otherwise running fine.

cheers
Colin

Responses

Re: A Tale of 2 algorithms at 2012-10-02 04:54:38 from Craig Ringer

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Виктор Егоров	2012-10-01 23:09:44	Re: NestedLoops over BitmapScan question
Previous Message	pg noob	2012-10-01 16:49:53	deadlock_timeout affect on performance