Re: Collecting statistics about contents of JSONB columns

From: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>, Mahendra Thalor <mahendra(dot)thalor(at)enterprisedb(dot)com>, Oleg Bartunov <obartunov(at)postgrespro(dot)ru>
Subject: Re: Collecting statistics about contents of JSONB columns
Date: 2022-01-05 20:22:31
Message-ID: CANbhV-FgzKvB5Drz9R8yprC2=O156mYmncLdsRxE8CxyYzadJA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 31 Dec 2021 at 22:07, Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:

> The patch does something far more
> elegant - it simply uses stavalues to store an array of JSONB documents,
> each describing stats for one path extracted from the sampled documents.

Sounds good.

> I'm sure there's plenty open questions - for example I think we'll need
> some logic to decide which paths to keep, otherwise the statistics can
> get quite big, if we're dealing with large / variable documents. We're
> already doing something similar for MCV lists.
>
> One of Nikita's patches not included in this thread allow "selective"
> statistics, where you can define in advance a "filter" restricting which
> parts are considered interesting by ANALYZE. That's interesting, but I
> think we need some simple MCV-like heuristics first anyway.
>
> Another open question is how deep the stats should be. Imagine documents
> like this:
>
> {"a" : {"b" : {"c" : {"d" : ...}}}}
>
> The current patch build stats for all possible paths:
>
> "a"
> "a.b"
> "a.b.c"
> "a.b.c.d"
>
> and of course many of the paths will often have JSONB documents as
> values, not simple scalar values. I wonder if we should limit the depth
> somehow, and maybe build stats only for scalar values.

The user interface for designing filters sounds hard, so I'd hope we
can ignore that for now.

--
Simon Riggs http://www.EnterpriseDB.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2022-01-05 20:56:26 Re: biblio.sgml dead link
Previous Message Simon Riggs 2022-01-05 20:19:30 Re: Logical insert/update/delete WAL records for custom table AMs