Re: Extracting only the columns needed for a query

From:	Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To:	Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc:	Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Ashwin Agrawal <aagrawal(at)pivotal(dot)io>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: Extracting only the columns needed for a query
Date:	2020-03-13 19:10:48
Message-ID:	20200313191048.dfkelhzyjq7osqyl@localhost
Lists:	pgsql-hackers

> On Tue, Feb 18, 2020 at 03:26:16PM -0800, Melanie Plageman wrote:
>
> > > I believe it would be beneficial to add this potential API extension patch into
> > > the thread (as an example of an interface defining how scanCols could be used)
> > > and review them together.
> >
> > As for including some code that uses the scanCols, after discussing
> > off-list with a few folks, there are three options I would like to
> > pursue for doing this.
> >
> > One option I will pursue is using the scanCols to inform the columns
> > needed to be spilled for memory-bounded hashagg (mentioned by Jeff
> > here [1]).
> >
> >
> > The third is exercising it with a test only but providing an example
> > of how a table AM API user like Zedstore uses the columns during
> > planning.
> >
>
> Basically, scanCols are simply columns that need to be scanned. It is
> probably okay if it is only used by table access method API users, as
> Pengzhou's patch illustrates.

Thanks for the update. Sure, that would be fine. At the moment I have a
couple of intermediate comments.

In general, the implemented functionality looks good. I've checked how it
works on the existing tests, and almost everywhere the required columns were
not missing from scanCols (which is probably the most important part).
Sometimes expressions were checked multiple times, which could
potentially introduce some overhead, but I believe this effect is
negligible. Just to mention some counterintuitive examples, for this
kind of query

SELECT min(q1) FROM INT8_TBL;

the same column was checked 5 times in my tests, since it's also present
in baserestrictinfo, and build_minmax_path does one extra round of
planning and invokes make_one_rel. I've also noticed that for
partitioned tables every partition is evaluated separately. IIRC their
structure cannot differ, so does that make sense? Another interesting
example is a Values Scan (e.g. in an INSERT statement with multiple
records): can an abstract table AM user leverage information about the
columns in it?
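
To illustrate why the repeated checks are likely harmless apart from the extra work, here is a toy sketch in Python (not PostgreSQL code; the function name and the way the passes are modelled are assumptions made up for illustration). Collecting columns into a set is idempotent, so visiting q1 five times yields the same result as visiting it once:

```python
from collections import Counter


def collect_columns(sources):
    """Toy model: gather needed columns from several expression lists.

    Returns the set of needed columns and a per-column visit count.
    """
    needed = set()
    visits = Counter()
    for cols in sources:
        for col in cols:
            visits[col] += 1  # every visit costs a little work
            needed.add(col)   # but adding to a set is idempotent
    return needed, visits


# Hypothetical: the five checks of q1 are modelled as five identical passes.
# Which planner steps they correspond to is not modelled here.
sources = [["q1"] for _ in range(5)]
needed, visits = collect_columns(sources)
print(sorted(needed), visits["q1"])
```

The visit count is the only thing that grows with the extra passes; the resulting column set does not change.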

One case where I believe columns were missing is statements with
RETURNING:

INSERT INTO foo (col1)
VALUES ('col1'), ('col2'), ('col3')
RETURNING *;

It looks like in this situation the only expression in reltarget is
for col1, but the RETURNING result contains all columns.
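
A toy sketch of what I would expect instead, in Python rather than planner code. Everything here is an assumption for illustration: foo is taken to be a hypothetical three-column table (col1, col2, col3), and columns_to_fetch is a made-up helper, not an existing PostgreSQL function. The idea is simply that the needed columns are the union of what reltarget references and what RETURNING references:

```python
def columns_to_fetch(reltarget_cols, returning_cols):
    """Toy model: union of columns from the target list and from RETURNING."""
    return set(reltarget_cols) | set(returning_cols)


# Hypothetical table foo(col1, col2, col3) with 1-based attribute numbers.
foo_attnums = {"col1": 1, "col2": 2, "col3": 3}

reltarget_cols = {foo_attnums["col1"]}      # only col1 appears in the INSERT
returning_cols = set(foo_attnums.values())  # RETURNING * expands to every column

needed = columns_to_fetch(reltarget_cols, returning_cols)
print(sorted(needed))
```

With only reltarget taken into account, the set would contain just the attribute number of col1.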

And just out of curiosity, what do you think about table AM specific
columns, e.g. ctid, xmin/xmax, etc.? I mean, they're not included in
scanCols and should not be, since they're heap AM related. But is there a
chance that there would be some AM specific columns relevant to e.g.
the columnar storage that would also make sense to put into scanCols?

In response to

Re: Extracting only the columns needed for a query at 2020-02-18 23:26:16 from Melanie Plageman

Responses

Re: Extracting only the columns needed for a query at 2020-06-19 00:46:09 from Melanie Plageman
Re: Extracting only the columns needed for a query at 2020-06-23 21:37:09 from Melanie Plageman
