Re: Advice on best way to store a large amount of data in postgresql

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: spiral <spiral(at)spiral(dot)sh>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Advice on best way to store a large amount of data in postgresql
Date: 2023-01-09 17:56:47
Message-ID: 20230109175647.GB9837@telsasoft.com
Lists: pgsql-performance

On Sun, Jan 08, 2023 at 07:02:01AM -0500, spiral wrote:
> This table is used essentially as a key-value store; rows are accessed
> only with `mid` primary key. Additionally, inserted rows may only be
> deleted, but never updated.
>
> We only run the following queries:
> - INSERT INTO messages VALUES (...data...);
> - SELECT * FROM messages WHERE mid = $1;
> - DELETE FROM messages WHERE mid = $1;
> - DELETE FROM messages WHERE mid IN ($1...$n);
> - SELECT count(*) FROM messages;

Great - it's good to start with the queries to optimize.

Are you using the extended query protocol with "bind" parameters, or are they
escaped and substituted by the client library ?
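
For example, with server-side prepared statements the parameters are sent
separately from the query text.  A minimal sketch, assuming mid is a bigint
snowflake:

    PREPARE get_message (bigint) AS
        SELECT * FROM messages WHERE mid = $1;
    EXECUTE get_message(123456789012345678);

Most client libraries do the equivalent when you pass query parameters
rather than interpolating them into the SQL string yourself.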

> So, the problem: I don't know how to best store this data in
> postgres, or what system requirements would be needed.
> Originally, this table did not contain a substantial amount of data,
> and so I stored it in the same database as our CRUD user data. However,
> as the table became larger, cache was being allocated to (mostly
> unused) historical data from the `messages` table, and I decided to
> move the large table to its own postgres instance.
>
> At the same time, I partitioned the table, with TimescaleDB's automatic
> time-series partitioning, because our data is essentially time-series
> (`mid` values are Twitter-style snowflakes) and it was said that
> partitioning would improve performance.
> This ended up being a mistake... shared_buffers memory usage went way
> up, from the 20GB of the previous combined database to 28GB for just
> the messages database, and trying to lower shared_buffers at all made
> the database start throwing "out of shared memory" errors when running
> DELETE queries. A TimescaleDB update did improve this, but 28GB is way
> more memory than I can afford to allocate to this database - instead of
> "out of shared memory", it gets OOM killed by the system.

Can you avoid using DELETE and instead use DROP ? I mean, can you
arrange your partitioning such that the things to be dropped are all in
one partition, to handle in bulk ? That's one of the main reasons for
using partitioning.
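
For example, with timescale you could drop whole chunks in bulk rather than
DELETEing individual rows.  A rough sketch, assuming a timescale 2.x
hypertable and that a 90-day retention happens to match your deletion
pattern:

    SELECT drop_chunks('messages', older_than => INTERVAL '90 days');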

(Or, as a worse option, if you need to use DELETE, can you change the
query to DELETE one MID at a time, and loop over MIDs?)
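
(Rough sketch of what I mean -- run the single-key form in a loop from the
client, one mid per statement, committing as you go; the idea being that
each statement then only has to open and lock the chunks that can contain
that one mid, assuming the chunk constraints are on mid:

    DELETE FROM messages WHERE mid = $1;
)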

What version of postgres is it ?  Ah, I found that you reported the same thing
in at least one other place.  (It'd be useful to include that information here,
as well as a link to the prior discussion with the other product/vendor.)

https://github.com/timescale/timescaledb/issues/5075

In this other issue report, you said that you increased
max_locks_per_transaction.  I suppose you need to increase it further,
or reduce the number of chunks (for example with a larger
chunk_time_interval).  How many "partitions" do you have
(actually, timescale uses inheritance) ?
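
A sketch of what I'd check, assuming timescale 2.x and that the hypertable
is named messages:

    SELECT count(*) FROM timescaledb_information.chunks
    WHERE hypertable_name = 'messages';

and then in postgresql.conf (needs a restart), something like:

    max_locks_per_transaction = 1024        # example value only

The lock table has about max_locks_per_transaction * (max_connections +
max_prepared_transactions) slots, so a DELETE that has to visit every chunk
needs roughly one slot per chunk (plus its indexes) to be available.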

--
Justin
