Re: Using Postgres to store high volume streams of sensor readings

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Ciprian Dorin Craciun <ciprian(dot)craciun(at)gmail(dot)com>, Shane Ambler <pgsql(at)sheeky(dot)biz>, Diego Schulz <dschulz(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Using Postgres to store high volume streams of sensor readings
Date: 2008-11-22 23:02:39
Message-ID: 16193.1227394959@sss.pgh.pa.us
Lists: pgsql-general

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> The problem is, most likely, in updating the indexes. Heap inserts
> should always take more or less the same time, but index insertion
> requires walking down the index structure for each insert, and the
> path to walk gets longer the more data you have.
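
For instance (schema invented for illustration; the original test
tables aren't in front of me), you can watch the two costs diverge
while the load runs:

    -- Invented schema for illustration:
    CREATE TABLE readings (
        sensor_id integer,
        ts        timestamptz,
        value     double precision
    );
    CREATE INDEX readings_key_idx ON readings (sensor_id, ts);

    -- The heap grows linearly, but the index both grows and deepens,
    -- and every insert has to descend it from the root:
    SELECT pg_size_pretty(pg_relation_size('readings'))         AS heap_size,
           pg_size_pretty(pg_relation_size('readings_key_idx')) AS index_size;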

It's worse than that: his test case inserts randomly ordered keys, which
means that there's no locality of access during the index updates. Once
the indexes get bigger than RAM, update speed goes into the toilet,
because the working set of index pages that need to be touched is also
bigger than RAM. That effect is going to be present in *any*
standard-design database, not just Postgres.
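
As a concrete sketch of that access pattern (using the invented schema
above), the test case amounts to something like:

    -- Randomly ordered keys: each insert lands on an arbitrary leaf
    -- page, so once the index outgrows RAM, nearly every insert has
    -- to pull an index page in from disk.
    INSERT INTO readings (sensor_id, ts, value)
    SELECT (random() * 10000000)::int, now(), random()
      FROM generate_series(1, 1000000);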

It's possible that performance in a real-world situation would be
better, if the incoming data stream isn't so random; but it's
hard to tell from the facts given.
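
For example, if the readings arrived in rough timestamp order (again a
sketch with the invented schema), each sensor's inserts would land at
the right edge of its own range in the index, so the hot working set is
only about one leaf page per sensor and stays cached:

    INSERT INTO readings (sensor_id, ts, value)
    SELECT i % 1000,
           '2008-11-22'::timestamptz + (i || ' seconds')::interval,
           random()
      FROM generate_series(1, 1000000) AS i;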

One possibly useful trick is to partition the data by timestamp with
partition sizes chosen so that the indexes don't get out of hand.
But the partition management might be enough of a PITA to negate
any win.
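
With the inheritance-based partitioning we have now, that looks roughly
like this (building on the invented schema above):

    -- One child table per day keeps each day's index small enough to
    -- stay cached; constraint_exclusion lets the planner skip children
    -- that a timestamp-qualified query can't match.
    CREATE TABLE readings_2008_11_22 (
        CHECK (ts >= '2008-11-22' AND ts < '2008-11-23')
    ) INHERITS (readings);

    CREATE INDEX readings_2008_11_22_key_idx
        ON readings_2008_11_22 (sensor_id, ts);

    SET constraint_exclusion = on;

You still have to route each insert to the right child (via rule,
trigger, or application logic) and create new children as time
advances, which is exactly where the PITA factor comes in.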

regards, tom lane
