Re: Multi-Column List Partitioning

From: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
To: Zhihong Yu <zyu(at)yugabyte(dot)com>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multi-Column List Partitioning
Date: 2021-08-27 07:24:09
Message-ID: CAMm1aWb3_4ZH7cqCp5LGRas3p18qg+U1Jd6V_=KiEAYp57aO5A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> + * isnulls is an array of boolean-tuples with key->partnatts booleans values
> + * each. Currently only used for list partitioning, it stores whether a
>
> I think 'booleans' should be 'boolean'.
> The trailing word 'each' is unnecessary.

> bq. Supported new syantx to allow mentioning multiple key information.
>
> syantx -> syntax

> + isDuplicate = checkForDuplicates(result, values);
> + if (isDuplicate)
> + continue;
>
> It seems the variable isDuplicate is not needed. The if statement can directly check the return value from checkForDuplicates().

I agree that isDuplicate is not required.
Thanks for sharing the comments. I will take care of these comments in
the next patch.

> + //TODO: Handle for multi-column cases
> + for (j = 0; j < 1; j++)
>
> Is this part going to be updated in the next patch?

Yes. The code changes related to partition-wise join are in progress.
I will handle these in the next patch.

Thanks & Regards,
Nitin Jadhav

On Thu, Aug 26, 2021 at 2:40 AM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
>
>
>
> On Wed, Aug 25, 2021 at 5:41 AM Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com> wrote:
>>
>> > The new list bound binary search and related comparison support
>> > function look a bit too verbose to me. I was expecting
>> > partition_list_bsearch() to look very much like
>> > partition_range_datum_bsearch(), but that is not the case. The
>> > special case code that you wrote in partition_list_bsearch() seems
>> > unnecessary, at least in that function. I'm talking about the code
>> > fragment starting with this comment:
>> >
>> > I will look at other parts of the patch next week hopefully. For
>> > now, attached is a delta patch that applies on top of your v1, which
>> > does:
>> >
>> > * Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
>> > * Make qsort_partition_list_value_cmp simply call
>> > partition_lbound_datum_cmp() instead of having its own logic to
>> > compare input bounds
>> > * Move partition_lbound_datum_cmp() into partbounds.c as a static
>> > function (export seems unnecessary)
>> > * Add a comment for PartitionBoundInfo.isnulls and remove that for null_index
>>
>> Yes. You are right. The extra code added in partition_list_bsearch()
>> is not required and thanks for sharing the delta patch. It looks good
>> to me and I have incorporated the changes in the attached patch.
>>
>> > I guess you're perhaps trying to address the case where the caller
>> > does not specify the values for all of the partition key columns,
>> > which can happen when the partition pruning code needs to handle a set
>> > of clauses matching only some of the partition key columns. But
>> > that's a concern of the partition pruning code and so the special case
>> > should be handled there (if at all), not in the binary search function
>> > that is shared with other callers. Regarding that, I'm wondering if
>> > we should require clauses matching all of the partition key columns to
>> > be found for the pruning code to call the binary search, so do
>> > something like get_matching_hash_bounds() does:
>> >
>> > Do you think that trying to match list partitions even with fewer keys
>> > is worth the complexity of the implementation? That is, is the use
>> > case to search for only a subset of partition key columns common
>> > enough with list partitioning?
>> >
>> > If we do decide to implement the special case, remember that to do
>> > that efficiently, we'd need to require that the subset of matched key
>> > columns constitutes a prefix, because of the way the datums are
>> > sorted. That is, match all partitions when the query only contains a
>> > clause for b when the partition key is (a, b, c), but engage the
>> > special case of pruning if the query contains clauses for a, or for a
>> > and b.
>>
>> Thanks for the suggestion. Below is the implementation details for the
>> partition pruning for multi column list partitioning.
>>
>> In the existing code (For single column list partitioning)
>> 1. In gen_partprune_steps_internal(), we try to match the where
>> clauses provided by the user with the partition key data using
>> match_clause_to_partition_key(). Based on the match, this function can
>> return many values like PARTCLAUSE_MATCH_CLAUSE,
>> PARTCLAUSE_MATCH_NULLNESS, PARTCLAUSE_NOMATCH, etc.
>> 2. In case of PARTCLAUSE_MATCH_CLAUSE, we generate steps using
>> gen_prune_steps_from_opexps() (strategy-2) which generate and return a
>> list of PartitionPruneStepOp that are based on OpExpr and BooleanTest
>> clauses that have been matched to the partition key and it also takes
>> care handling prefix of the partition keys.
>> 3. In case of PARTCLAUSE_MATCH_NULLNESS, we generate steps using
>> gen_prune_step_op() (strategy-1) which generates single
>> PartitionPruneStepOp since the earlier list partitioning supports
>> single column and there can be only one NULL value. In
>> get_matching_list_bounds(), if the nullkeys is not empty, we fetch the
>> partition index which accepts null and we used to return from here.
>>
>> In case of multi column list partitioning, we have columns more than
>> one and hence there is a possibility of more than one NULL values in
>> the where clauses. The above mentioned steps are modified like below.
>>
>> 1. Modified the match_clause_to_partition_key() to generate an object
>> of PartClauseInfo structure and return PARTCLAUSE_MATCH_CLAUSE even in
>> case of clauses related to NULL. The information required to generate
>> PartClauseInfo is populated here like the constant expression
>> consisting of (Datum) 0, op_strategy, op_is_ne, etc.
>> 2. Since I am returning PARTCLAUSE_MATCH_CLAUSE, now we use strategy-2
>> (gen_prune_steps_from_opexps) to generate partition pruning steps.
>> This function takes care of generating a list of pruning steps if
>> there are multiple clauses and also takes care of handling prefixes.
>> 3. Modified perform_pruning_base_step() to generate the datum values
>> and isnulls data of the where clauses. In case if any of the key
>> contains NULL value then the corresponding datum value is 0.
>> 4. Modified get_matching_list_bounds() to generate the minimum offset
>> and/or maximum offset of the matched values based on the difference
>> operation strategies. Now since the NULL containing bound values are
>> part of 'boundinfo', changed the code accordingly to include the NULL
>> containing partitions or not in different scenarios like
>> InvalidStrategy, etc.
>>
>> I have done some cosmetic changes to
>> v1_multi_column_list_partitioning.patch. So all the above code changes
>> related to partition pruning are merged with the previous patch and
>> also included the delta patch shared by you. Hence sharing a single
>> patch.
>>
>> Kindly have a look and share your thoughts.
>>
>>
> Hi,
>
> bq. Supported new syantx to allow mentioning multiple key information.
>
> syantx -> syntax
>
> + isDuplicate = checkForDuplicates(result, values);
> + if (isDuplicate)
> + continue;
>
> It seems the variable isDuplicate is not needed. The if statement can directly check the return value from checkForDuplicates().
>
> + //TODO: Handle for multi-column cases
> + for (j = 0; j < 1; j++)
>
> Is this part going to be updated in the next patch?
>
> Cheers

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Laurenz Albe 2021-08-27 07:33:51 Re: Can we get rid of repeated queries from pg_dump?
Previous Message Daniel Gustafsson 2021-08-27 07:17:37 Re: Error code for checksum failure in origin.c