From: "Ciprian Dorin Craciun" <ciprian(dot)craciun(at)gmail(dot)com>
To: "Greg Smith" <gsmith(at)gregsmith(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: Using Postgres to store high volume streams of sensor readings
Date: 2008-11-21 17:58:29
Message-ID: 8e04b5820811210958o1f7f34e9k959a832adbd60095@mail.gmail.com
Lists: pgsql-general
On Fri, Nov 21, 2008 at 7:45 PM, Greg Smith <gsmith(at)gregsmith(dot)com> wrote:
> On Fri, 21 Nov 2008, Tom Lane wrote:
>
>> Not sure if it applies to your real use-case, but if you can try doing
>> the COPY from a local file instead of across the network link, it
>> might go faster.
>
> The fact that the inserts are reported as fast initially but slow as the
> table and index size grow means it's probably a disk bottleneck rather than
> anything related to the client itself. If the network overhead was the
> issue, I wouldn't expect it to start fast like that. Ditto for concerns
> about the random function being slow. Either of those might speed up the
> initial, fast period a bit, but they're not going to impact the later
> slowdown.
>
> Ciprian, the main interesting piece of data to collect here is a snapshot of
> a few samples lines from the output from "vmstat 1" during the initial, fast
> loading section versus the slower period. I think what's happening to you
> is that maintaining the index blocks on the disk is turning into
> increasingly random I/O as the size of the table grows, and your disks just
> can't keep up with that. What I'd expect is that initially the waiting for
> I/O "wa" figure will be low, but it will creep up constantly and at some
> point spike up hard after the working set of data operated on exceeds
> memory.
    OK. Silly question: how do I run "vmstat 1"?
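    (Answering my own question after a quick look around: vmstat ships with the procps package on most Linux systems, and the first argument is the sampling interval in seconds. A sketch of what I believe Greg is asking for; the field position in the awk line assumes a typical Linux vmstat layout and may differ by version:)

```shell
# Run in a second terminal while the load is going; prints one
# sample line per second. The "wa" column is % CPU time stalled on I/O.
vmstat 1

# Rough way to print only the "wa" column (field 16 on my vmstat;
# adjust if your layout differs):
vmstat 1 | awk 'NR > 2 { print $16 }'
```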
    The problem is indeed the indexes... So if we analyze the insert
pattern: client ids are randomly distributed, sensor ids likewise, and
the index is built on top of these two columns. Probabilistically
speaking, after 100 thousand inserts (10 thousand clients and 10
sensors), all the index pages would be dirty...
    Indeed I could prolong the flushing by throwing bigger and bigger
amounts of memory at it, but that doesn't help for 100 million records...
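    (A quick back-of-envelope check of that intuition: with N uniformly random inserts spread over P index pages, the expected number of distinct pages touched is P * (1 - (1 - 1/P)^N). The P and N below are illustrative numbers, not measurements from my setup:)

```shell
# Expected number of distinct (dirtied) index pages after N uniformly
# random inserts over P pages: P * (1 - (1 - 1/P)^N).
# P and N here are made-up round numbers for illustration.
awk 'BEGIN { P = 100000; N = 100000; printf "%d\n", P * (1 - (1 - 1/P)^N) }'
# With N = P this lands near P * (1 - 1/e), i.e. roughly 63% of the pages
# touched, and it only climbs from there as N grows.
```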
> The fact that PostgreSQL performs badly here compared to the more
> lightweight databases you've used isn't that much of a surprise. There's a
> fair amount of overhead for the write-ahead log and the MVCC implementation
> in the database, and your application is suffering from all that robustness
> overhead but not really gaining much of a benefit from it. The main things
> that help in this sort of situation are increases in shared_buffers and
> checkpoint_segments, so that more database information is stored in RAM for
> longer rather than being pushed to disk too quickly, but what Rafael
> suggested already got you most of the possible improvement here. You might
> get an extra bit of boost by adjusting the index FILLFACTOR upwards (default
> is 90, if you're never updating you could try 100). I doubt that will be
> anything but a minor incremental improvement though.
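    (For anyone following along, the knobs Greg mentions would look something like the fragment below. The index name is made up, the values are examples rather than recommendations, and checkpoint_segments is the setting name in the 8.x releases current as of this writing:)

```sql
-- postgresql.conf (illustrative values only):
--   shared_buffers      = 1GB    -- keep more index/table pages cached in RAM
--   checkpoint_segments = 64     -- spread checkpoint I/O over more WAL
-- Rebuild the index fully packed (hypothetical index name):
ALTER INDEX sensor_readings_client_sensor_idx SET (FILLFACTOR = 100);
REINDEX INDEX sensor_readings_client_sensor_idx;
```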
    About the fillfactor, on the contrary, I think I should set it
lower (since the index fills up very quickly)... I've set it to 10%
and it behaves a little better than with the default (90)... I'll run
a full 100 million record test to see where it breaks...
> --
> * Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
Thanks,
Ciprian Craciun.