Re: How would you store read/unread topic status?

From: Mike <ipso(at)snappymail(dot)ca>
To: Mathieu Nebra <mateo21(at)siteduzero(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: How would you store read/unread topic status?
Date: 2009-06-23 16:25:15
Message-ID: 20090623092515.1f1d74b3@ipso.snappymail.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

So your update doesn't take long to run during off-peak times, so
basically your options are:

1. Optimize your postgresql.conf settings or upgrade to the latest
version of PostgreSQL.

2. Redesign your forum code so it can scale better.

3. Upgrade your servers hardware as it may be overloaded.

I would probably attack those in the order I described.

As far as redesigning your forum code, keep in mind that in PostgreSQL
an update is basically a select, delete, insert in a single statement.
First it needs to find the rows to update, it marks the rows for
deletion (which vacuum later does) and inserts a new row. So updates
can be quite expensive.

In SOME situations, it can be faster to do inserts only, and modify
your select query to get just the data you need, for example:

Rather then an update like this:

update <table> set LastReadAnswerID = <value> where UserID = <value>
AND TopicID = <value>

You could do this instead:

insert into <table> VALUES(<user_id>,<topic_id>,<last_read_answer_id>)

Then just modify your select statement slightly to get the last
inserted row:

select * from <table> where user_id = <value> AND topic_id = <value>
order by LastReadAnswerID DESC LIMIT 1

This makes your select statement slightly more expensive but your
insert statement pretty much as cheap as possible. Since its much
easier to cache select results you could easily wrap some caching
mechanism around your select query to reduce the load there too.

Then using a task scheduler like cron simply clear out old rows from the
table you insert into every minute, 5 minutes, hour, day, whatever makes
most sense to keep the select queries fast.

A memcached solution would probably be much better, but its also likely
much more involved to do.

On Tue, 23 Jun 2009 17:50:50 +0200
Mathieu Nebra <mateo21(at)siteduzero(dot)com> wrote:

> Robert Haas a écrit :
> >>>> Which pg version are you using?
> >> I should have mentionned that before sorry: PostgreSQL 8.2
> >
> > I think there is an awful lot of speculation on this thread about
> > what your problem is without anywhere near enough investigation. A
> > couple of seconds for an update is a really long time, unless your
> > server is absolutely slammed, in which case probably everything is
> > taking a long time. We need to get some more information on what
> > is happening here.
>
> You're right, I'll give you the information you need.
>
> > Approximately how many requests per second are you servicing?
> > Also,
>
> How can I extract this information from the database? I know how to
> use pg_stat_user_tables. My table has:
>
> seq_tup_read
> 133793491714
>
> idx_scan
> 12408612540
>
> idx_tup_fetch
> 41041660903
>
> n_tup_ins
> 14700038
>
> n_tup_upd
> 6698236
>
> n_tup_del
> 15990670
>
> > can you:
> >
> > 1. Run EXPLAIN ANALYZE on a representative UPDATE statement and post
> > the exact query and the output.
>
> "Index Scan using prj_frm_flg_pkey on prj_frm_flg (cost=0.00..8.58
> rows=1 width=18)"
> " Index Cond: ((flg_mid = 3) AND (flg_sid = 123764))"
>
> This time it only took 54ms, but maybe it's already a lot.
>
>
> >
> > 2. Run VACUUM VERBOSE on your database and send the last 10 lines or
> > so of the output.
>
> It's not very long, I can give you the whole log:
>
> INFO: vacuuming "public.prj_frm_flg"INFO: scanned index
> "prj_frm_flg_pkey" to remove 74091 row versions
> DETAIL: CPU 0.15s/0.47u sec elapsed 53.10 sec.INFO: scanned index
> "flg_fav" to remove 74091 row versions
> DETAIL: CPU 0.28s/0.31u sec elapsed 91.82 sec.INFO: scanned index
> "flg_notif" to remove 74091 row versions
> DETAIL: CPU 0.36s/0.37u sec elapsed 80.75 sec.INFO: scanned index
> "flg_post" to remove 74091 row versions
> DETAIL: CPU 0.31s/0.37u sec elapsed 115.86 sec.INFO: scanned index
> "flg_no_inter" to remove 74091 row versions
> DETAIL: CPU 0.34s/0.33u sec elapsed 68.96 sec.INFO: "prj_frm_flg":
> removed 74091 row versions in 5979 pages
> DETAIL: CPU 0.29s/0.34u sec elapsed 100.37 sec.INFO: index
> "prj_frm_flg_pkey" now contains 1315895 row versions in 7716 pages
> DETAIL: 63153 index row versions were removed.
> 672 index pages have been deleted, 639 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO: index "flg_fav" now
> contains 1315895 row versions in 18228 pages
> DETAIL: 73628 index row versions were removed.
> 21 index pages have been deleted, 16 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO: index "flg_notif" now
> contains 1315895 row versions in 18179 pages
> DETAIL: 73468 index row versions were removed.
> 22 index pages have been deleted, 13 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO: index "flg_post" now
> contains 1315895 row versions in 18194 pages
> DETAIL: 73628 index row versions were removed.
> 30 index pages have been deleted, 23 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO: index "flg_no_inter" now
> contains 1315895 row versions in 8596 pages
> DETAIL: 73628 index row versions were removed.
> 13 index pages have been deleted, 8 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO: "prj_frm_flg": found 74091
> removable, 1315895 nonremovable row versions in 10485 pages
> DETAIL: 326 dead row versions cannot be removed yet.
> There were 253639 unused item pointers.
> 10431 pages contain useful free space.
> 0 pages are entirely empty.
> CPU 1.91s/2.28u sec elapsed 542.75 sec.
>
> Total: 542877 ms.
>
> >
> > 3. Try your UPDATE statement at a low-traffic time of day and see
> > whether it's faster than it is at a high-traffic time of day, and by
> > how much. Or dump your database and reload it on a dev server and
> > see how fast it runs there.
>
> It took 4ms.
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Carey 2009-06-23 17:05:22 Re: How would you store read/unread topic status?
Previous Message Mike 2009-06-23 16:23:22 Re: How would you store read/unread topic status?