From: "Michaeldba(at)sqlexec(dot)com" <Michaeldba(at)sqlexec(dot)com>
To: spiral <spiral(at)spiral(dot)sh>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Advice on best way to store a large amount of data in postgresql
Date: 2023-01-09 10:39:45
Message-ID: 89974BAB-4B86-4D19-82CD-639EAB13D255@sqlexec.com
Lists: pgsql-performance
That’s crazy, only having 8GB of memory when you have tables of over 100GB. One general rule of thumb is to have enough memory to hold the biggest index.
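You can check how close you are with something like this (a minimal
sketch; "messages_pkey" is the index name from the schema quoted below):

    -- sizes of the primary-key index and the table itself
    SELECT pg_size_pretty(pg_relation_size('messages_pkey')) AS pkey_size,
           pg_size_pretty(pg_table_size('messages'))         AS table_size;
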
Sent from my iPad
> On Jan 9, 2023, at 3:23 AM, spiral <spiral(at)spiral(dot)sh> wrote:
>
> Hello,
>
> We have a table containing ~1.75 billion rows, using 170GB storage.
> The table schema is the following:
>
> messages=# \d messages
>                 Table "public.messages"
>     Column    |  Type   | Collation | Nullable | Default
> --------------+---------+-----------+----------+---------
>  mid          | bigint  |           | not null |
>  channel      | bigint  |           | not null |
>  member       | integer |           |          |
>  sender       | bigint  |           | not null |
>  original_mid | bigint  |           |          |
>  guild        | bigint  |           |          |
> Indexes:
>     "messages_pkey" PRIMARY KEY, btree (mid)
>
>
> This table is used essentially as a key-value store: rows are accessed
> only by the `mid` primary key, and rows are only ever inserted and
> later deleted, never updated.
>
> We only run the following queries:
> - INSERT INTO messages VALUES (...data...);
> - SELECT * FROM messages WHERE mid = $1;
> - DELETE FROM messages WHERE mid = $1;
> - DELETE FROM messages WHERE mid IN ($1...$n);
> - SELECT count(*) FROM messages;
>
> The "IN" query can have up to 100 parameters, and it is possible that
> none of them match an existing row.
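> For illustration, that IN delete is equivalent to a single-parameter
> form using an array bind (a sketch, assuming the client driver can
> pass a bigint array):
>
>     -- one array parameter instead of $1...$n placeholders
>     DELETE FROM messages WHERE mid = ANY($1::bigint[]);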
>
> So, the problem: I don't know how to best store this data in
> postgres, or what system requirements would be needed.
> Originally, this table did not contain a substantial amount of data,
> and so I stored it in the same database as our CRUD user data. However,
> as the table became larger, cache was being allocated to (mostly
> unused) historical data from the `messages` table, and I decided to
> move the large table to its own postgres instance.
>
> At the same time, I partitioned the table with TimescaleDB's automatic
> time-series partitioning, because our data is essentially time-series
> (`mid` values are Twitter-style snowflakes, so they encode creation
> time) and partitioning was said to improve performance.
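>
> (For reference, snowflake IDs embed a millisecond timestamp in their
> high bits, which is what makes time partitioning on `mid` possible; a
> sketch, assuming a Discord-style epoch of 1420070400000 ms rather than
> Twitter's own epoch of 1288834974657:)
>
>     -- approximate creation time of a message, from its snowflake id
>     SELECT to_timestamp(((mid >> 22) + 1420070400000) / 1000.0)
>     FROM messages WHERE mid = $1;
>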
> This ended up being a mistake... shared_buffers memory usage went way
> up, from 20GB for the previous combined database to 28GB for just the
> messages database, and any attempt to lower shared_buffers made the
> database start throwing "out of shared memory" errors when running
> DELETE queries. A TimescaleDB update did improve this, but 28GB is far
> more memory than I can afford to allocate to this database - instead
> of "out of shared memory", it gets OOM killed by the system.
>
> What is the best course of action here?
> - Ideally, I would like to host this database on a machine with 4
> (Ryzen) cores, 8GB RAM, and tiered storage (our cloud provider doesn't
> support adding additional local storage to a VPS plan). Of course,
> this seems very unrealistic, so it's not a requirement, but the
> closer we can get to this, the better.
> - Is it a good idea to use table partitioning? I've heard advice that
> tables above a couple million rows should be partitioned, but I don't
> know how true that is. We have a table with ~6 million rows in our
> main database that has somewhat slow lookups, but also a table with
> ~13 million rows that has fast lookups, so I'm not sure.
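>
> For comparison, a minimal sketch of plain declarative range
> partitioning on `mid`, as an alternative to TimescaleDB (the bounds
> below are made-up values, chosen only to illustrate the syntax):
>
>     -- lookups on "WHERE mid = $1" prune to a single partition,
>     -- since mid is the partition key
>     CREATE TABLE messages (
>         mid          bigint  NOT NULL,
>         channel      bigint  NOT NULL,
>         member       integer,
>         sender       bigint  NOT NULL,
>         original_mid bigint,
>         guild        bigint,
>         PRIMARY KEY (mid)
>     ) PARTITION BY RANGE (mid);
>
>     -- illustrative bounds only; real ones would follow the id timeline
>     CREATE TABLE messages_p1 PARTITION OF messages
>         FOR VALUES FROM (900000000000000000) TO (1000000000000000000);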
>
> Thanks
> spiral
>
>