Re: Retrieving unused tuple attributes in ExecScan

From: "Ma, Marcus" <marcjma(at)amazon(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Retrieving unused tuple attributes in ExecScan
Date: 2022-06-27 19:29:34
Message-ID: 7178EBDD-342B-4C1A-B9BA-908FD17B6C48@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hey Andres,

So I'm actually using the columns during merge join, basically I'm building a bloom filter on the outer relation and filtering out data on the inner relation of the join. I'm building the filter on the join keys, so the columns are being used further up the execution tree. However, even on a command like:

Select * from t1 inner join t2 on t1.c1 = t2.c2;

The execScan function returns slots that have (0, 0, 0) even though t1.c1 and t2.c2 will be used later on. I know that the Sort node and the MergeJoin node are able to read the actual values of the join keys, but for some reason the values aren't showing up on the SeqScan level. However, as soon as I add a qualification, such as:

Select * from t1 inner join on t1.c1 = t2.c2 where t1.c1 % 2 = 0;

The qualification makes the t1.c1 value show up during execScan, but not the t2.c2 value.

Marcus

On 6/27/22, 3:10 PM, "Andres Freund" <andres(at)anarazel(dot)de> wrote:

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.

Hi,

On 2022-06-27 19:00:44 +0000, Ma, Marcus wrote:
> If I understand correctly, when a Sequential Scan takes place, the ExecScan function (located in executor/execScan.c) does not retrieve all attributes per tuple in the TupleTableSlot and only retrieves the necessary attribute. So for example, let’s imagine we have a table t1 with 3 number fields, c1, c2, and c3. So in the command:
>
> Select * from t1 where t1.c1 > 500;
>
> The returned TupleTableSlot will have its field of tts_values in the form (X, 0, 0), where X is the real value of t1.c1 but the fields of c2 and c3 are not actually retrieved because they aren’t used. Similarly, for the command:
>
> Select * from t1;
>
> The TupleTableSlot will always return the values of (0, 0, 0) because no
> comparisons are necessary. I am working on code where I’ll need access to
> attributes that aren’t listed in any qualification – what code should I
> change in execScan, or nodeSeqScan to be able to retrieve any attribute of a
> tuple? Basically, being able to make execScan return (X, Y, Z) instead of
> (0, 0, 0) even if the command doesn’t use any attribute comparisons.

You'll need to tell the planner that those columns are needed. It's not just
seqscans that otherwise will discard / not compute values.

Where exactly do you need those columns and why?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2022-06-27 19:36:09 Re: do only critical work during single-user vacuum?
Previous Message Andres Freund 2022-06-27 19:09:46 Re: Retrieving unused tuple attributes in ExecScan