Quick Links

Re: Should we warn against using too many partitions?

From:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To:	Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Should we warn against using too many partitions?
Date:	2019-06-06 18:46:59
Message-ID:	CAKJS1f8WVtJEDaAyN2Wb50jVcy37mVk9bCCVrqw2fYA=F4tFjg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, 6 Jun 2019 at 17:29, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> >+
> >+ <sect2 id="ddl-partitioning-declarative-best-practices">
> >+ <title>Declarative Partitioning Best Practices</title>
> >+
> >+ <para>
> >+ The choice of how to partition a table should be considered carefully as
>
> Either say "How to partition consider should be .." or "The choice should MADE carefully" ?

I've changed "considered" to "made". I'm unable to make sense of the
first suggestion there :(

> >+ <para>
> >+ One of the most critical design decisions will be the column or columns
> >+ which you partition your data by. Often the best choice will be to
>
> by which ?

okay. I've moved the "by" from after "data" to before "which"

> >+ <para>
> >+ Choosing the number of partitions to divide the table into is also a
>
> the TARGET number of partitions BY WHICH to divide the table ?

Changed.

> >+ critical decision to make. Not having enough partitions may mean that
> >+ indexes remain too large and that data locality remains poor which could
> >+ result in poor cache hit ratios. However, dividing the table into too
> >+ many partitions can also cause issues. Too many partitions can mean
> >+ slower query planning times and higher memory consumption during both
> >+ query planning and execution. It's also important to consider what
> >+ changes may occur in the future when choosing how to partition your table.
> >+ For example, if you choose to have one partition per customer and you
> >+ currently have a small number of large customers, what will the
>
> have ONLY ?

I assume you mean after the "have" before "one partition per
customer"? I don't quite understand that since in the scenario we're
partitioning by customer, so it's not possible to have more than one
partition per customer, only the reverse is possible. It seems to me
injecting "only" there would just confuse things.

> >+ implications be if in several years you obtain a large number of small
> >+ customers. In this case, it may be better to choose to partition by
> >+ <literal>HASH</literal> and choose a reasonable amount of partitions
>
> reasonable NUMBER ?

changed.

> >+ <para>
> >+ It is also important to consider the overhead of partitioning during
> >+ query planning and execution. The query planner is generally able to
> >+ handle partition hierarchies up a few thousand partitions fairly well,
> >+ providing that the vast majority of them can be pruned during query
>
> provided ?
>
> I would say: "provided that typical queries prune all but a small number of
> partitions during planning time".

changed, only I used "during query planning" rather than "during planning time".

> >+ <command>DELETE</command> commands. Also, even if most queries are
> >+ able to prune a high number of partitions during query planning, it still
>
> LARGE number?

changed

> >+ may be undesirable to have a large number of partitions as each partition
>
> may still ?
>
> >+ also will obtain a relation cache entry in each session which uses the
>
> will require ? Or occupy ?

"require" seems better. Although, this may need to be reworded a bit
further per what Alvaro mentions.

> >+ <para>
> >+ With data warehouse type workloads it can make sense to use a larger
> >+ number of partitions than with an OLTP type workload. Generally, in data
> >+ warehouses, query planning time is less of a concern as the majority of
> >+ processing time is generally spent during query execution. With either of
>
> remove the 2nd "generally"

Oops. I should have caught that.

> >+ these two types of workload, it is important to make the right decisions
> >+ early as re-partitioning large quantities of data can be painstakingly
>
> early COMMA ?

removed

> PAINFULLY slow

yeah

> >+ When performance is critical, performing workload simulations to
> >+ assist in making the correct decisions can be beneficial.
>
> I would say:
> Simulations of the intended workload are beneficial for optimizing partitioning
> strategy.

I took that but added "often" before "beneficial"

I'll write the patches for PG10 and PG11 and send them all a bit later.

Thanks for the review.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Re: Should we warn against using too many partitions? at 2019-06-06 05:29:02 from Justin Pryzby

Responses

Re: Should we warn against using too many partitions? at 2019-06-06 18:54:55 from Justin Pryzby

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Justin Pryzby	2019-06-06 18:54:55	Re: Should we warn against using too many partitions?
Previous Message	Fabien COELHO	2019-06-06 18:35:56	Re: LLVM compile failing in seawasp