Re: Define STATS_MIN_ROWS for minimum rows of stats in ANALYZE

From: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Define STATS_MIN_ROWS for minimum rows of stats in ANALYZE
Date: 2025-01-03 13:45:21
Message-ID: 24ed07ad-e857-47a8-9477-49fc19fb89c9@tantorlabs.com
Lists: pgsql-hackers


On 10.12.2024 16:32, Ilia Evdokimov wrote:
>
> On 09.12.2024 16:10, Ilia Evdokimov wrote:
>> Hi hackers,
>>
>> The repeated use of the number 300 in the ANALYZE-related code
>> creates redundancy and relies on scattered, sometimes unclear,
>> comments to explain its purpose. This can make the code harder to
>> understand, especially for new contributors who might not immediately
>> understand its significance. To address this, I propose introducing a
>> macro STATS_MIN_ROWS to represent this value and consolidating its
>> explanation in a single place, making the code more consistent and
>> readable.
>>
>> --
>> Best regards,
>> Ilia Evdokimov,
>> Tantor Labs LLC.
>
>
> Hi everyone,
>
> Currently, the value 300 is used as the basis for determining the
> number of rows sampled during ANALYZE, both for single-column and
> extended statistics. While this value has a well-established rationale
> for single-column statistics, its suitability for extended statistics
> remains uncertain, as no specific research has confirmed that this is
> an optimal choice for them. To better reflect this distinction, I
> propose introducing two macros: STATS_MIN_ROWS for single-column
> statistics and EXT_STATS_MIN_ROWS for extended statistics.
>
> This change separates the concerns of single-column and extended
> statistics sampling, making the code more explicit and easier to adapt
> if future research suggests a different approach for extended
> statistics. The values remain the same for now, but the introduction
> of distinct macros improves clarity and prepares the codebase for
> potential refinements.
>
> Does this seem like a reasonable approach to handling these differences?
>
> --
> Best regards,
> Ilia Evdokimov,
> Tantor Labs LLC.

Hi everyone,

In my opinion, it is more appropriate to define EXT_STATS_MIN_ROWS as
STATS_MIN_ROWS. I also reverted some of the code comments and rewrote
others. The patch is attached.
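
To make the idea concrete, here is a minimal sketch of the two macros,
not the contents of the attached patch. The helper function names are
hypothetical and exist only for illustration; the multiplication by the
statistics target mirrors how the existing code uses the literal 300.

```c
#include <assert.h>

/* Minimum number of sampled rows per unit of statistics target. */
#define STATS_MIN_ROWS 300

/* Extended statistics reuse the same value until research suggests otherwise. */
#define EXT_STATS_MIN_ROWS STATS_MIN_ROWS

/* Compile-time check that the two values have not diverged by accident. */
static_assert(EXT_STATS_MIN_ROWS == STATS_MIN_ROWS,
              "extended stats currently share the single-column minimum");

/* Hypothetical helper: rows to sample for a single-column statistic. */
static inline int
stats_min_rows(int stattarget)
{
    return STATS_MIN_ROWS * stattarget;
}

/* Hypothetical helper: rows to sample for an extended statistic. */
static inline int
ext_stats_min_rows(int stattarget)
{
    return EXT_STATS_MIN_ROWS * stattarget;
}
```

With a statistics target of 100, both helpers would ask for 30000 rows.
If extended statistics ever need a different basis, only the second
macro definition would have to change.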

Any thoughts?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachment: v3-0001-Define-macros-for-minimum-rows-of-stats.patch (text/x-patch, 7.7 KB)
