From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
Cc: | Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, MingJu Wu <mingjuwu0505(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Re: Partial index creation always scans the entire table |
Date: | 2020-02-16 16:35:43 |
Message-ID: | 23936.1581870943@sss.pgh.pa.us |
Lists: | pgsql-performance |
Justin Pryzby <pryzby(at)telsasoft(dot)com> writes:
> I was reminded of reading this, but I think it's a pretty different case.
> https://heap.io/blog/engineering/running-10-million-postgresql-indexes-in-production
Yeah, the critical paragraph in that is
    This isn’t as scary as it sounds for a two main reasons. First, we
    shard all of our data by customer. Each table in our database holds
    only one customer’s data, so each table has a only a few thousand
    indexes at most. Second, these events are relatively rare. The most
    common defined events make up only a few percent of a customer’s raw
    events, and most are much more rare. This means that we perform
    relatively little I/O maintaining this schema, because most incoming
    events match no event definitions and therefore don’t need to be
    written to any of the indexes. Similarly, the indexes don’t take up
    much space on disk.
A set of partial indexes that cover a small part of the total data
can be sensible. If you're trying to cover most/all of the data,
you're doing it wrong --- basically, you're reinventing partitioning
using the wrong tools.
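
To make the contrast concrete, here is a rough sketch (not part of the
original message). The table, column, and index names are hypothetical,
and it assumes PostgreSQL 11 or later for the default partition and for
the index on the partitioned parent.

```sql
-- Hypothetical schema for illustration only; none of these names come from the thread.
CREATE TABLE events (
    id         bigint      NOT NULL,
    event_type text        NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    payload    jsonb
);

-- Reasonable use: a partial index over a rare subset of the rows.
-- Rows that do not satisfy the predicate are never written to this index.
CREATE INDEX events_signup_idx
    ON events (created_at)
    WHERE event_type = 'signup';

-- If the predicates would together cover most or all rows, declarative
-- partitioning is the tool intended for the job (assumes PostgreSQL 11+).
CREATE TABLE events_part (
    id         bigint      NOT NULL,
    event_type text        NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    payload    jsonb
) PARTITION BY LIST (event_type);

CREATE TABLE events_part_signup PARTITION OF events_part
    FOR VALUES IN ('signup');
CREATE TABLE events_part_other PARTITION OF events_part DEFAULT;

-- One definition on the parent; each partition gets its own matching index.
CREATE INDEX events_part_created_at_idx ON events_part (created_at);
```

In the partitioned layout, a query that filters on event_type can be
pruned to the matching partition, and each partition's index holds only
that partition's rows.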
regards, tom lane