Re: typical active table count?

From: Jeremy Schneider <schneider(at)ardentperf(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: typical active table count?
Date: 2023-06-27 22:00:35
Message-ID: 03a01fb1-e0be-24f2-eadb-8e1a4b4981bf@ardentperf.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 6/27/23 12:08 PM, Ron wrote:
> On 6/27/23 13:47, Jeremy Schneider wrote:
>> On 6/27/23 9:32 AM, Ben Chobot wrote:
>>> We certainly have databases where far more than 100 tables are updated
>>> within a 10 second period. Is there a specific concern you have?
>>>
>> Thank Ben, not a concern but I'm trying to better understand how common
>> this might be. And I think sharing general statistics about how people
>> use PostgreSQL is a great help to the developers who build and maintain it.
>>
>> One really nice thing about PostgreSQL is that two quick copies of
>> pg_stat_all_tables and you can easily see this sort of info.
>>
>> If you have a database where more than 100 tables are updated within a
>> 10 second period - this seems really uncommon to me - I'm very curious
>> about the workload.
>
> 100 tables updates just means /possibly complicated schema/, not
> necessarily high volume.
>
> ...
>
> And honestly, 100 tables in 10 seconds is 10 tables/second.  If each
> gets one insert, that's a laughably slow transaction rate.  (Unless of
> course there's 85 indices per table, and foreign keys don't have
> supporting indices.)

I don't think the math actually works this way on highly concurrent
systems. In fact, this morning I connected with a coworker who works on
Amazon fulfillment center DBs and there was almost no difference in the
number of tables with insert/update/delete regardless of whether you
looked at a 10 second window or a 2 second window. I was also able to
chat with another coworker at Amazon who got numbers from a couple of
their PG databases, and connected w one person on slack at a different
company who passed along numbers, and got a few emails from Oracle folks.

The numbers reported back to me ranged from 29 to over a hundred.
Obviously there are also lots of small databases behind wordpress
websites with much less activity, but I found this to be an interesting
measure of some respectably busy systems.

The original context was a conversation related to logical replication
of DB changes.

But then I got interested in the general question and topic - and
someone on the Oracle side mentioned system tables which is a really
good point that hadn't occurred to me yet. The original conversation was
concerned with user tables and not system ones, but there would be a
fair amount of ongoing system table activity too.

Besides partitioning, another interesting dimension of the conversation
has been thinking about different categories of workloads. For example:
SaaS or multitenant applications with many copies of a similar schema,
ISVs, ERPs, or large enterprise databases with lots of development
history. All of these categories can easily ramp up the counts.

I'm still interested in more data - if anyone reading this can grab a
couple snapshots of pg_stat_all_tables and report back numbers for a 10
second window and a 2 second window, that would be amazing!

-Jeremy

--
http://about.me/jeremy_schneider

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Atul Kumar 2023-06-27 23:14:32 connect postgres using url
Previous Message Marc Millas 2023-06-27 20:58:21 Re: pb with join plan