Re: UPDATEDs slowing SELECTs in a fully cached database

From: lars <lhofhansl(at)yahoo(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-performance(at)postgresql(dot)org, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Subject: Re: UPDATEDs slowing SELECTs in a fully cached database
Date: 2011-07-12 00:09:19
Message-ID: 4E1B90AF.1020408@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 07/11/2011 04:02 PM, lars wrote:
> On 07/11/2011 02:43 PM, Merlin Moncure wrote:
>> On Mon, Jul 11, 2011 at 2:16 PM, Kevin Grittner
>> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>>> lars<lhofhansl(at)yahoo(dot)com> wrote:
>>>
>>>> Stopping the UPDATEs, waiting for any CHECKPOINTs to finish,
>>>> and then running the SELECTs indeed shows a similar slowdown.
>>>>
>>>> Interestingly I see very heavy WAL traffic while executing the
>>>> SELECTs. (So I was confused as to what caused the WAL traffic).
>>> Hint bit changes aren't logged, so if it was that you would be
>>> seeing writes to the heap, but not to the WAL. Clean-up of dead
>>> tuples is logged -- this is probably the result of pruning dead
>>> tuples. You could probably reduce the impact on your SELECT
>>> statements at least a little by making autovacuum more aggressive.
>> yeah. In fact, I'd like to disable autovacuum completely just to
>> confirm this. In particular I'd like to know if that removes wal
>> traffic when only selects are going on. Another way to check is to
>> throw some queries to pg_stat_activity during your select period and
>> see if any non-select activity (like autovacum vacuum). Basically I'm
>> suspicious there is more to this story.
>>
>> hint bit flusing causing i/o during SELECT is a typical complaint
>> (especially on systems with underperformant i/o), but I'm suspicious
>> if that's really the problem here. Since you are on a virtualized
>> platform, I can't help but wonder if you are running into some
>> bottleneck that you wouldn't see on native hardware.
>>
>> What's iowait during the slow period?
>>
>> merlin
> Thanks Kevin and Merlin this is extremely helpful...
>
> Ok, that makes much more sense (WALing hint bits did not make sense).
>
> I disabled auto-vacuum and did four tests:
> 1. Run a bunch of updates, stop that process, wait until checkpointing
> is finished, and run the selects (as before).
> 2. run VACUUM manually, then run the SELECTs
> 3. Have the UPDATEs and SELECTs touch a mutually exclusive, random
> sets of row (still in sets of 10000).
> So the SELECTs are guaranteed not to select rows that were updated.
> 4. Lastly, change the UPDATEs to update a non-indexed column. To rule
> out Index maintenance. Still distinct set of rows.
>
> In the first case I see the same slowdown (from ~14ms to ~400-500ms).
> pg_stat_activity shows no other load during that
> time. I also see write activity only on the WAL volume.
>
> In the 2nd case after VACUUM is finished the time is back to 14ms. As
> an aside: If I run the SELECTs while VACUUM is
> running the slowdown is about the same as in the first case until
> (apparently) VACUUM has cleaned up most of the table,
> at which point the SELECTs become faster again (~50ms).
>
> In the 3rd case I see exactly the same behavior, which is interesting.
> Both before VACUUM is run and after.
> There's no guarantee obviously that distinct rows do not share the
> same page of course especially since the index is
> updated as part of this (but see the 4th case).
>
> In case 4 I still see the same issue. Again both before and after VACUUM.
>
> In all cases I see from pg_stat_bgwriter that no backend writes
> buffers directly (but I think that only pertains to dirty buffers, and
> not the WAL).
>
> So I think I have a partial answer to my initial question.
>
> However, that brings me to some other questions:
> Why do SELECTs cause dead tuples to be pruned (as Kevin suggests)?
> That even happens when the updates did not touch the selected rows(?)
> And why does that slow down the SELECTs? (checkpointing activity on
> the EBS volume holding the database for example
> does not slow down SELECTs at all, only WAL activity does). Does the
> selecting backend do that work itself?
>
> Lastly, is this documented somewhere? (I apologize if it is and I
> missed it). If not I'd be happy to write a wiki entry for this.
>
>
Oh, and iowait hovers around 20% when SELECTs are slow:

avg-cpu: %user %nice %system %iowait %steal %idle
1.54 0.00 0.98 18.49 0.07 78.92

When SELECTs are fast it looks like this:
avg-cpu: %user %nice %system %iowait %steal %idle
8.72 0.00 0.26 0.00 0.00 91.01

Note that this is a 12 core VM. So one core at 100% would show as 8.33% CPU.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message lars 2011-07-12 04:42:45 Re: UPDATEDs slowing SELECTs in a fully cached database
Previous Message lars 2011-07-11 23:02:37 Re: UPDATEDs slowing SELECTs in a fully cached database