Re: ctidscan as an example of custom-scan (Re: [v9.5] Custom Plan API)

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ctidscan as an example of custom-scan (Re: [v9.5] Custom Plan API)
Date: 2015-01-06 13:51:12
Message-ID: 9A28C8860F777E439AA12E8AEA7694F80109C84E@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jim, Thanks for your reviewing the patch.

The attached patch is revised one according to your suggestion,
and also includes bug fix I could found.

* Definitions of TIDXXXXOperator was moved to pg_operator.h
as other operator doing.
* Support the case of "ctid (operator) Param" expression.
* Add checks if commutator of TID was not found.
* Qualifiers gets extracted on plan stage, not path stage.
* Adjust cost estimation logic to fit SeqScan manner.
* Add some new test cases for regression test.

> On 12/24/14, 7:54 PM, Kouhei Kaigai wrote:
> >>> On Mon, Dec 15, 2014 at 4:55 PM, Kouhei Kaigai
> >>> <kaigai(at)ak(dot)jp(dot)nec(dot)com>
> >> wrote:
> >>>> I'm not certain whether we should have this functionality in
> >>>> contrib from the perspective of workload that can help, but its
> >>>> major worth is for an example of custom-scan interface.
> >>> worker_spi is now in src/test/modules. We may add it there as well,
> no?
> >>>
> >> Hmm, it makes sense for me. Does it also make sense to add a
> >> test-case to the core regression test cases?
> >>
> > The attached patch adds ctidscan module at test/module instead of contrib.
> > Basic portion is not changed from the previous post, but file
> > locations and test cases in regression test are changed.
>
> First, I'm glad for this. It will be VERY valuable for anyone trying to
> clean up the end of a majorly bloated table.
>
> Here's a partial review...
>
> +++ b/src/test/modules/ctidscan/ctidscan.c
>
> +/* missing declaration in pg_proc.h */
> +#ifndef TIDGreaterOperator
> +#define TIDGreaterOperator 2800
> ...
> If we're calling that out here, should we have a corresponding comment in
> pg_proc.h, in case these ever get renumbered?
>
It was a quick hack when I moved this module out of the tree.
Yep, I should revert when I submit this patch again.

> +CTidQualFromExpr(Node *expr, int varno)
> ...
> + if (IsCTIDVar(arg1, varno))
> + other = arg2;
> + else if (IsCTIDVar(arg2, varno))
> + other = arg1;
> + else
> + return NULL;
> + if (exprType(other) != TIDOID)
> + return NULL; /* should not happen */
> + /* The other argument must be a pseudoconstant */
> + if (!is_pseudo_constant_clause(other))
> + return NULL;
>
> I think this needs some additional blank lines...
>
OK. And, I also noticed the coding style around this function is
different from other built-in plans, so I redefined the role of
this function just to check whether the supplied RestrictInfo is
OpExpr that involves TID inequality operator, or not.

Expression node shall be extracted when Plan node is created
from Path node.

> + if (IsCTIDVar(arg1, varno))
> + other = arg2;
> + else if (IsCTIDVar(arg2, varno))
> + other = arg1;
> + else
> + return NULL;
> +
> + if (exprType(other) != TIDOID)
> + return NULL; /* should not happen */
> +
> + /* The other argument must be a pseudoconstant */
> + if (!is_pseudo_constant_clause(other))
> + return NULL;
>
> + * CTidEstimateCosts
> + *
> + * It estimates cost to scan the target relation according to the given
> + * restriction clauses. Its logic to scan relations are almost same as
> + * SeqScan doing, because it uses regular heap_getnext(), except for
> + * the number of tuples to be scanned if restriction clauses work well.
>
> <grammar>That should read "same as what SeqScan is doing"... however, what
> actual function are you talking about? I couldn't find SeqScanEstimateCosts
> (or anything ending EstimateCosts).
>
It is cost_seqscan(). But I don't put a raw function name in the source
code comments in other portion, because nobody can guarantee it is up to
date in the future...

> BTW, there's other grammar issues but it'd be best to handle those all at
> once after all the code stuff is done.
>
Yep. Help by native English speaker is very helpful for us.

> + opno = get_commutator(op->opno);
> What happens if there's no commutator? Perhaps best to Assert(opno !=
> InvalidOid) at the end of that if block.
>
Usually, commutator operator of TID is defined on initdb time, however,
nobody can guarantee a mad superuser doesn't drop it.
So, I added elog(ERROR,...) here.

> Though, it seems things will fall apart anyway if ctid_quals isn't exactly
> what we're expecting; I don't know if that's OK or not.
>
No worry, it was already checked on planning stage.

> + /* disk costs --- assume each tuple on a different page */
> + run_cost += spc_random_page_cost * ntuples;
> Isn't that extremely pessimistic?
>
Probably. I follow the manner of SeqScan.

> I'm not familiar enough with the custom-scan stuff to really comment past
> this point, and I could certainly be missing some details about planning
> and execution.
>
> I do have some concerns about the regression test, but perhaps I'm just
> being paranoid:
>
> - The EXPLAIN tests don't cover >. They do cover <= and >= via BETWEEN.
> - Nothing tests the case of '...'::tid op ctid; only lvalue cases are tested.
>
Both are added.

> Also, it seems that we don't handle joining on a ctid qual... is that
> intentional?
>
Yep. This module does not intend to handle joining, because custom-/
foreign-join interface is still under the discussion.
https://commitfest.postgresql.org/action/patch_view?id=1653

Hanada-san makes an enhancement of postgres_fdw on this enhancement.
https://commitfest.postgresql.org/action/patch_view?id=1674

> I know that sounds silly, but you'd probably want to do that
> if you're trying to move tuples off the end of a bloated table. You could
> work around it by constructing a dynamic SQL string, but it'd be easier
> to do something like:
>
> UPDATE table1 SET ...
> WHERE ctid >= (SELECT '(' || relpages || ',0)' FROM pg_class WHERE oid
> = 'table1'::regclass) ;
>
This example noticed me that the previous version didn't support the case
of "ctid (operator) Param".
So, I enhanced to support above case, and update the regression test also.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

Attachment Content-Type Size
pgsql-v9.5-test-ctidscan.v3.patch application/octet-stream 65.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kouhei Kaigai 2015-01-06 14:17:21 Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
Previous Message Robert Haas 2015-01-06 13:26:22 Re: [RFC] Incremental backup v3: incremental PoC