Re: How would you store read/unread topic status?

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Mathieu Nebra <mateo21(at)siteduzero(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: How would you store read/unread topic status?
Date: 2009-06-23 17:17:36
Message-ID: C6665C40.8811%scott@richrelevance.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 6/23/09 7:54 AM, "Mathieu Nebra" <mateo21(at)siteduzero(dot)com> wrote:

>> On 06/23/2009 01:12 PM, Mathieu Nebra wrote:
>>>>> I'm running a quite large website which has its own forums. They are
>>>>> currently heavily used and I'm getting performance issues. Most of
> them
>>>>> are due to repeated UPDATE queries on a "flags" table.
>>>>>
>>>>> This "flags" table has more or less the following fields:
>>>>>
>>>>> UserID - TopicID - LastReadAnswerID
>>>>>
>>>>> The flags table keeps track of every topic a member has visited and
>>>>> remembers the last answer which was posted at this moment. It
> allows the
>>>>> user to come back a few days after and immediately jump to the last
>>>>> answer he has not read.
>>>>> My problem is that everytime a user READS a topic, it UPDATES this
> flags
>>>>> table to remember he has read it. This leads to multiple updates
> at the
>>>>> same time on the same table, and an update can take a few seconds.
> This
>>>>> is not acceptable for my users.
>>> Have you analyzed why it takes that long? Determining that is the first
>>> step of improving the current situation...
>>>
>>> My first guess would be, that your disks cannot keep up with the number
>>> of syncronous writes/second. Do you know how many transactions with
>>> write access you have? Guessing from your description you do at least
>>> one write for every page hit on your forum.
>
> I don't know how many writes/s Pgsql can handle on my server, but I
> first suspected that it was good practice to avoid unnecessary writes.
>
> I do 1 write/page for every connected user on the forums.
> I do the same on another part of my website to increment the number of
> page views (this was not part of my initial question but it is very close).
>
>>>
>>> With the default settings every transaction needs to wait for io at the
>>> end - to ensure transactional semantics.
>>> Depending on your disk the number of possible writes/second is quite low
>>> - a normal SATA disk with 7200rpm can satisfy something around 130
>>> syncronous writes per second. Which is the upper limit on writing
>>> transactions per second.
>>> What disks do you have?
>
> We have 2 SAS RAID 0 15000rpm disks.
>
>>>
>>> On which OS are you? If you are on linux you could use iostat to get
>>> some relevant statistics like:
>>> iostat -x /path/to/device/the/database/resides/on 2 10
>>>
>>> That gives you 10 statistics over periods of 2 seconds.
>>>
>>>
>>> Depending on those results there are numerous solutions to that
> problem...
>
> Here it is:
>
> $ iostat -x /dev/sda 2 10
> Linux 2.6.18-6-amd64 (scratchy) 23.06.2009
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 18,02 0,00 12,87 13,13 0,00 55,98
>
> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz
> avgqu-sz await svctm %util
> sda 0,94 328,98 29,62 103,06 736,58 6091,14 51,46
> 0,04 0,25 0,04 0,51
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 39,65 0,00 48,38 2,00 0,00 9,98
>
> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz
> avgqu-sz await svctm %util
> sda 0,00 0,00 10,00 78,00 516,00 1928,00 27,77
> 6,44 73,20 2,75 24,20
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 40,15 0,00 48,13 2,24 0,00 9,48
>
> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz
> avgqu-sz await svctm %util
> sda 0,00 0,00 6,47 100,50 585,07 2288,56 26,87
> 13,00 121,56 3,00 32,04
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 45,14 0,00 45,64 6,73 0,00 2,49
>
> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz
> avgqu-sz await svctm %util
> sda 1,00 0,00 34,00 157,50 1232,00 3904,00 26,82
> 26,64 139,09 3,03 58,00
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 46,25 0,00 49,25 3,50 0,00 1,00
>
> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz
> avgqu-sz await svctm %util
> sda 0,00 0,00 27,00 173,00 884,00 4224,00 25,54
> 24,46 122,32 3,00 60,00
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 44,42 0,00 47,64 2,23 0,00 5,71
>
> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz
> avgqu-sz await svctm %util
> sda 0,00 0,00 15,42 140,30 700,50 3275,62 25,53
> 17,94 115,21 2,81 43,78
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 41,75 0,00 48,50 2,50 0,00 7,25
>
> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz
> avgqu-sz await svctm %util
> sda 0,50 0,00 21,11 116,08 888,44 2472,36 24,50
> 12,62 91,99 2,55 34,97
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 44,03 0,00 46,27 2,99 0,00 6,72
>
> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz
> avgqu-sz await svctm %util
> sda 9,00 0,00 10,00 119,00 484,00 2728,00 24,90
> 15,15 117,47 2,70 34,80
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 36,91 0,00 51,37 2,49 0,00 9,23
>
> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz
> avgqu-sz await svctm %util
> sda 0,99 0,00 14,78 136,45 390,15 2825,62 21,26
> 21,86 144,52 2,58 39,01
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 38,75 0,00 48,75 1,00 0,00 11,50
>
> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz
> avgqu-sz await svctm %util
> sda 0,00 0,00 7,54 67,34 377,89 1764,82 28,62
> 5,38 71,89 2,95 22,11
>

I see a lot of io wait time there. My guess is that your DB is flooded with
synchronous writes.

IF you want to optimize the hardware for this you have a couple options.
I'm assuming your RAID 0 is not hardware RAID.

1. Use 8.3+ and asynchronous commit (set synchronous_commit=false). This
is safe data wise, but if your DB crashes you might lose the last second of
transactions or so that the app thought were comitted. For a DB forum, this
is probably very acceptable. Performance should significantly gain as the
writes/sec will go down a lot.

2. put your data on one partition and your WAL log on another.

3. Get a battery backed hardware raid with write-back caching.

4. If you are using ext3 on linux, make sure you mount with data=writeback
on the file system that your wal logs are on. data=ordered will cause the
WHOLE file sytem to be flushed for each fsync, not just the tiny bit of WAL
log.

In short, if you combined 1,2, and 4, you'll probably have significantly
more capacity on the same server. So make sure your WAL log is in a
different file system from your OS and data, mount it optimally, and
consider turning synchronous_commit off.

If you're using RAID 0, I doubt the data is so precious that
synchronous_commit being true is important at all.

>
>
>>>
>>>>> Question: what is the general rule of thumb here? How would you store
>>>>> this information?
>>> The problem here is, that every read access writes to disk - that is not
>>> going to scale very well.
>
> That's what I thought.
>
>>> One possible solution is to use something like memcached to store the
>>> last read post in memory and periodically write it into the database.
>>>
>
> We're starting using memcached. But how would you "periodically" write
> that to database?
>
>>>
>>> Which pg version are you using?
>
> I should have mentionned that before sorry: PostgreSQL 8.2
>
> Thanks a lot!
>
>
>
> Andres Freund a écrit :

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Mike Ivanov 2009-06-23 19:39:55 Implications of having large number of users
Previous Message Scott Carey 2009-06-23 17:05:22 Re: How would you store read/unread topic status?