Re: Multitenancy optimization

From: Hadi Moshayedi <hadi(at)moshayedi(dot)net>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multitenancy optimization
Date: 2019-03-29 08:06:31
Message-ID: CAK=1=WpT2OuoD7nNF=m436WQOE-62XfSFw4GUnGUOb++KJFAyg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 28, 2019 at 5:40 AM Konstantin Knizhnik <
k(dot)knizhnik(at)postgrespro(dot)ru> wrote:

> Certainly it is possible to create multicolumn statistics to notify
> Postgres about columns correlation.
> But unfortunately it is not good and working solution.
>
> First of all we have to create multicolumn statistic for all possible
> combinations of table's attributes including "tenant_id".
> It is very inconvenient and inefficient.
>

On the inconvenient part: doesn't Postgres itself automatically compute
functional dependencies for the combinations of the listed columns? That is,
it seems to me that if we create statistics on (a, b, c), then we don't need
to create statistics on (a, b), (a, c), or (b, c), because the
pg_statistic_ext entry for (a, b, c) already includes enough information.
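
To make the point above concrete, here is a minimal Python sketch of which
dependencies a single statistics object on (a, b, c) would cover. It assumes
that the dependencies kind considers every subset of at least two of the
listed columns and treats each column of a subset in turn as the dependent
one. The function name dependency_candidates is hypothetical; this is not
PostgreSQL code.

```python
from itertools import combinations


def dependency_candidates(columns):
    """Return (determinant, dependent) pairs over every subset of 2+ columns.

    Assumption: each column of a subset is tried as the dependent column,
    with the remaining columns of that subset forming the determinant.
    """
    result = []
    for k in range(2, len(columns) + 1):
        for subset in combinations(columns, k):
            for dependent in subset:
                determinant = tuple(c for c in subset if c != dependent)
                result.append((determinant, dependent))
    return result


candidates = dependency_candidates(["a", "b", "c"])
for determinant, dependent in candidates:
    print(", ".join(determinant), "=>", dependent)
```

Under that assumption, the two-column dependencies such as a => b are already
among the candidates, which is why separate statistics on (a, b), (a, c), or
(b, c) would add nothing for this kind of statistic.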

On the inefficient part, I think there is some room for improvement here.
For example, if (product_id) -> seller_id correlation is 1.0, then
(product_id, product_name) -> seller_id correlation is definitely 1.0 and
we don't need to store it. So we can reduce the amount of information
stored in pg_statistic_ext -> stxdependencies, without losing any data
points.
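
The pruning idea can be sketched in Python as follows. This only illustrates
the reasoning: the dictionary layout, the function name prune_implied, and
the 0.4 value are made up for the example and do not reflect how
stxdependencies is actually stored.

```python
def prune_implied(dependencies):
    """Drop entries implied by an exact dependency on a smaller determinant.

    `dependencies` maps (frozenset_of_determinant_columns, dependent_column)
    to a degree in [0.0, 1.0]. If (D) -> b has degree 1.0, then any strict
    superset of D also determines b with degree 1.0, so that entry carries
    no extra information and can be skipped.
    """
    exact = [key for key, degree in dependencies.items() if degree == 1.0]
    kept = {}
    for (determinant, dependent), degree in dependencies.items():
        implied = any(
            smaller < determinant and dep == dependent  # strict subset check
            for smaller, dep in exact
        )
        if not implied:
            kept[(determinant, dependent)] = degree
    return kept


# Illustrative values only; 0.4 is invented for the example.
deps = {
    (frozenset({"product_id"}), "seller_id"): 1.0,
    (frozenset({"product_id", "product_name"}), "seller_id"): 1.0,
    (frozenset({"product_name"}), "seller_id"): 0.4,
}
kept = prune_implied(deps)
```

Here the (product_id, product_name) -> seller_id entry is dropped, and it can
be reconstructed on lookup from the (product_id) -> seller_id entry.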

More generally, if the (a) -> b correlation is X, then the (a, c) -> b
correlation is >= X. Maybe we can have a threshold to reduce the number of
entries in pg_statistic_ext -> stxdependencies.
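
The inequality can be checked on a toy table with a simplified notion of
dependency degree: the fraction of rows that fall into groups (by determinant
values) in which the dependent column takes a single value. This is an
assumption made for illustration; the computation Postgres performs on its
sample may differ in details. The function name and the rows are hypothetical.

```python
from collections import defaultdict


def dependency_degree(rows, determinant, dependent):
    """Fraction of rows in determinant groups that have one dependent value."""
    if not rows:
        return 0.0
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in determinant)
        groups[key].append(row[dependent])
    supporting = sum(
        len(values) for values in groups.values() if len(set(values)) == 1
    )
    return supporting / len(rows)


rows = [
    {"a": 1, "b": 10, "c": "x"},
    {"a": 1, "b": 10, "c": "y"},
    {"a": 2, "b": 20, "c": "x"},
    {"a": 2, "b": 21, "c": "y"},
]

x = dependency_degree(rows, ["a"], "b")        # 0.5: only the a=1 group is consistent
y = dependency_degree(rows, ["a", "c"], "b")   # 1.0: every (a, c) group has one row
assert y >= x
```

Adding c to the determinant only splits existing groups further, so a group
that was already consistent stays consistent, and the degree cannot decrease.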

-- Hadi
