Re: Partial index creation always scans the entire table

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, MingJu Wu <mingjuwu0505(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Partial index creation always scans the entire table
Date: 2020-02-16 16:35:43
Message-ID: 23936.1581870943@sss.pgh.pa.us
Lists: pgsql-performance

Justin Pryzby <pryzby(at)telsasoft(dot)com> writes:
> I was reminded of reading this, but I think it's a pretty different case.
> https://heap.io/blog/engineering/running-10-million-postgresql-indexes-in-production

Yeah, the critical paragraph in that is

    This isn’t as scary as it sounds for two main reasons. First, we
    shard all of our data by customer. Each table in our database holds
    only one customer’s data, so each table has only a few thousand
    indexes at most. Second, these events are relatively rare. The most
    common defined events make up only a few percent of a customer’s raw
    events, and most are much more rare. This means that we perform
    relatively little I/O maintaining this schema, because most incoming
    events match no event definitions and therefore don’t need to be
    written to any of the indexes. Similarly, the indexes don’t take up
    much space on disk.

A set of partial indexes that cover a small part of the total data
can be sensible. If you're trying to cover most/all of the data,
you're doing it wrong --- basically, you're reinventing partitioning
using the wrong tools.
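
To make the "small part of the total data" case concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, only because SQLite accepts the same CREATE INDEX ... WHERE syntax and needs no server. The table, column, and index names are hypothetical, and the 2% ratio is invented for illustration.

```python
import sqlite3

# Hypothetical schema: an events table where only about 2% of rows
# are of the rare kind that is worth indexing.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, ts INTEGER)"
)
rows = [("signup" if i % 50 == 0 else "pageview", i) for i in range(10_000)]
conn.executemany("INSERT INTO events (kind, ts) VALUES (?, ?)", rows)

# Partial index: only rows with kind = 'signup' get an index entry, so
# inserts of any other kind do not have to write to this index.
conn.execute(
    "CREATE INDEX events_signup_ts ON events (ts) WHERE kind = 'signup'"
)

total = conn.execute("SELECT count(*) FROM events").fetchone()[0]
covered = conn.execute(
    "SELECT count(*) FROM events WHERE kind = 'signup'"
).fetchone()[0]
fraction = covered / total

# Ask the planner whether a query that repeats the index predicate
# is planned to use the partial index (the last column is the plan text).
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM events WHERE kind = 'signup' AND ts >= 9000"
).fetchall()
uses_partial_index = any("events_signup_ts" in row[-1] for row in plan)

print(f"index covers {covered} of {total} rows ({fraction:.1%})")
print(f"planner picks the partial index: {uses_partial_index}")
```

If a set of such indexes had predicates that together matched nearly every row, each insert would have to update one of them anyway, and the small-index advantage would be gone. In PostgreSQL that is the situation where declarative partitioning (for example PARTITION BY LIST on the same column) is the better fit.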

regards, tom lane
