Re: [GENERAL] 4B row limit for CLOB tables

From: Matthew Kelly <mkelly(at)tripadvisor(dot)com>
To: José Luis Tallón <jltallon(at)adv-solutions(dot)net>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] 4B row limit for CLOB tables
Date: 2015-02-03 03:12:11
Message-ID: 5696A620-74A7-4C4D-937B-A4D5695CC9BE@tripadvisor.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

> Hmm.... 2^32 times aprox. 2kB (as per usual heuristics, ~4 rows per heap page) is 8796093022208 (~9e13) bytes
> ... which results in 8192 1GB segments :O

8192 1GB segments is just 8TB, its not _that_ large. At TripAdvisor we’ve been using a NoSQL solution to do session storage. We are looking to probably swap that out to be Postgres (every other db backing the site is Postgres). Essentially, what I’m building is a system with 1 logical table that maps session id to a 2KB+ grab bag of ever changing session attributes which is partially normalized, partially json. 315 million uniques a month multiplied by the retention policy means I need to hold 2-4 billion session objects (and somehow expire old ones). Additionally, most http calls can update the session, so between maintenance windows I expect to take around 20 billion 'upserts’. Obviously, I will have to shard and partition the table in practice, but this weekend I ran a test that demonstrated that a single table on a 9.4 server + logical replication + Dell 730xd can handle 4x that workload. Well, it can for 38 hours… until you wrap xid’s on the toast table. :P I’ll be the first to admit that isn’t the normal use case though. I’m happy to have found this thread, however, because I’m going to have to build around the global oid counter, explicitly the prevent the problem I explain below regarding clustering.

> Anybody actually reaching this limit out there?

Well its not the 4 billion row limit that concerns me, its the global shared counter in conjunction with pg_restore/clustering that is actually pretty concerning.

Just checked through all of TripAdvisor’s normal databases and the max tuples I see in single toast table is 17,000,000, so that is still a couple of orders of magnitude too small. (however, close enough that it’ll be a concern in a few years).

However, I do have active databases where the current oid is between 1 billion and 2 billion. They were last dump-restored for a hardware upgrade a couple years ago and were a bit more than half the size. I therefore can imagine that I have tables which are keyed by ~8,000,000 consecutive oids.

I would argue that when it wraps there will be a single insert that will probably block for 2-5 minutes while it tries to accomplish ~8,000,000 index scans inside of GetNewOidWithIndex. Even partitioning doesn’t protect you from this potential problem.

What even more weird is that this issue can be trigged by consuming too many oid’s in a different database in the same cluster (i.e. creating large amounts of temp tables)

> The problem with changing the id from 32 to 64 bits is that the storage *for everybody else* doubles, making the implementation slower for most.... though this might be actually not that important.

Well, you aren’t doubling the storage. Even if you have to store the key in 4 places, you are adding 16 bytes per TOAST tuple. If we work off the 2KB estimate for each TOAST tuple, then you are only increasing the storage by 0.7%. I’m sure there are more hidden costs but we are really only talking about a low single digit percent increase. In exchange, you get to drop one index scan per toast insert; an index scan looking in the only hot part of the index.

That being said I’d be perfectly happy merely giving each TOAST table its own sequence as that almost entire mitigates the risk of an unexpected lock up on reasonably sized tables/partitions, and provides a functional work around for those of us with larger than average installs.

- Matt K

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jim Nasby 2015-02-03 03:13:16 Re: oracle to postgres
Previous Message Jim Nasby 2015-02-03 03:10:24 Re: Question on session_replication_role

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-02-03 03:59:00 Missing markup in pg_receivexlog.sgml
Previous Message Jim Nasby 2015-02-03 02:54:02 Re: Release note bloat is getting out of hand