Re: General performance/load issue

From: "Tomas Vondra" <tv(at)fuzzy(dot)cz>
To: "Gaëtan Allart" <gaetan(at)nexylan(dot)com>
Cc: "Tomas Vondra" <tv(at)fuzzy(dot)cz>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: General performance/load issue
Date: 2011-11-26 14:12:05
Message-ID: ffa9ad34a8bd6ea2c7869e3e305d458a.squirrel@sq.gransy.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 26 Listopad 2011, 10:45, Gaëtan Allart wrote:
> A better view of iotop :
>
> TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
> 31875 be/4 postgres 0.00 B/s 15.23 M/s 0.00 % 0.00 % postgres:
> database database 46.105.104.205(50228) SELECT
> 30985 be/4 postgres 0.00 B/s 10.55 M/s 0.00 % 0.00 % postgres:
> database database 46.105.104.205(47672) SELECT
>
> As you can see, SELECTS are writing very much on the disk.
>
> At this moment, pg_stat_activity show many many queries running at the
> same time.
> It looks like that all queries are suddenly writing on disk and not a
> particular one, making me think of a buffer issue or something.

No, I don't think this is an issue with the size of shared buffers. That'd
influence the pg_stat_bgwriter - the buffers_backend would grow much
faster, and it's not the case.

So I'm guessing it's either hint bits or sorting. If I had to, I'd
probably guess about the hint bits - see for example this thread

http://archives.postgresql.org/pgsql-performance/2008-05/msg00182.php

it more or less behaves like what you described so far.

> This is a example of 10 MB/s writing query :
>
> SELECT COUNT(*) FROM (SELECT DISTINCT "table"."id" AS "id",
> "table"."flux_id" AS "flux_id", "table"."locale_id" AS "locale_id",
> "table"."url_article" AS "url_article", "table"."original_url" AS
> "original_url", "table"."name" AS "name", "table"."description" AS
> "description", "table"."content" AS "content", "table"."permis" AS
> "permis", "table"."reviewed" AS "reviewed", "table"."author_id" AS
> "author_id", "table"."poster_id" AS "poster_id", "table"."post_date" AS
> "post_date", "table"."edit_date" AS "edit_date", "table"."add_date" AS
> "add_date", "table"."comments_open" AS "comments_open", "table"."site_id"
> AS "site_id", "table"."is_local" AS "is_local", "table"."status" AS
> "status", "table"."visits" AS "visits", "table"."votes" AS "votes",
> "table"."score" AS "score", "arti

Post EXPLAIN ANALYZE output of this query (use explain.depesz.com to post
it).

> Checkpoints logs still show very long write times :
>
> LOG: checkpoint complete: wrote 92 buffers (0.0%); 0 transaction log
> file(s) added, 0 removed, 1 recycled; write=49.622 s, sync=6.510 s,
> total=63.625 s

No, that's fine - that's what spread checkpoints do. Once the
checkpoint_timeout expires, the system decides to perform a checkpoint,
i.e. it has to write all dirty (modified) shared buffers to the disk. But
it knows the next checkpoint will happen in checkpoint_timeout, so it has
about 5 minutes to write all the data.

So it says something like 'I need to write 540MB want to write that in 270
seconds (5 minutes * completion_target), so I'll write at 2MB/s'. So the
write phase is expected to take long. But it has to sync the data at the
end, and that's where the problems usually happen - so the important thing
is 'sync' and that improved significantly. 6 seconds is not great but it's
not a big issue I guess.

> Couldn't this be a hardware issue ?

I don't think so. The problem is that the queries write a lot of data, and
the SSD can't help in this case. The SSDs provide much better random
performance (compared to spinners), but the sequential performance is not
much better. So pushing 70MB/s to the SSD may be fully utilized.

Have you tested the basic performance (like bonnie++, dd etc) before going
into production? There are some common SSD-related mistakes (e.g. not
aligning the partitions properly) that may easily cause 30% performance
loss. But this won't fix the problem, although you could get a bit better
performance.

Tomas

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Robert Treat 2011-11-26 14:25:00 Re: General performance/load issue
Previous Message Gaëtan Allart 2011-11-26 09:45:38 Re: General performance/load issue