From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | 9erthalion6(at)gmail(dot)com |
Cc: | jesper(dot)pedersen(at)redhat(dot)com, david(dot)rowley(at)2ndquadrant(dot)com, florisvannee(at)optiver(dot)com, thomas(dot)munro(at)gmail(dot)com, jtc331(at)gmail(dot)com, rafia(dot)pghackers(at)gmail(dot)com, jeff(dot)janes(at)gmail(dot)com, pg(at)bowt(dot)ie, tomas(dot)vondra(at)2ndquadrant(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, bhushan(dot)uparkar(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org, a(dot)korotkov(at)postgrespro(dot)ru |
Subject: | Re: Index Skip Scan |
Date: | 2019-07-25 11:29:50 |
Message-ID: | 20190725.202950.257050501.horikyota.ntt@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Sorry, there's a too-hard-to-read part.
At Thu, 25 Jul 2019 20:17:37 +0900 (Tokyo Standard Time), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in <20190725(dot)201737(dot)192223037(dot)horikyota(dot)ntt(at)gmail(dot)com>
> Hello.
>
> At Wed, 24 Jul 2019 22:49:32 +0200, Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote in <CA+q6zcXgwDMiowOGbr7gimTY3NV-LbcwP=rbma_L56pc+9p1Xw(at)mail(dot)gmail(dot)com>
> > > On Mon, Jul 22, 2019 at 7:10 PM Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com> wrote:
> > >
> > > On 7/22/19 1:44 AM, David Rowley wrote:
> > > > Here are the comments I noted down during the review:
> > > >
> > > > cost_index:
> > > >
> > > > I know you've not finished here, but I think it'll need to adjust
> > > > tuples_fetched somehow to account for estimate_num_groups() on the
> > > > Path's unique keys. Any Eclass with an ec_has_const = true does not
> > > > need to be part of the estimate there as there can only be at most one
> > > > value for these.
> > > >
> > > > For example, in a query such as:
> > > >
> > > > SELECT x,y FROM t WHERE x = 1 GROUP BY x,y;
> > > >
> > > > you only need to perform estimate_num_groups() on "y".
> > > >
> > > > I'm really not quite sure on what exactly will be required from
> > > > amcostestimate() here. The cost of the skip scan is not the same as
> > > > the normal scan. So other that API needs adjusted to allow the caller
> > > > to mention that we want skip scans estimated, or there needs to be
> > > > another callback.
> > > >
> > >
> > > I think this part will become more clear once the index skip scan patch
> > > is rebased, as we got the uniquekeys field on the Path, and the
> > > indexskipprefixy info on the IndexPath node.
> >
> > Here is what I came up with to address the problems, mentioned above in this
> > thread. It passes tests, but I haven't tested it yet more thoughtful (e.g. it
> > occurred to me, that `_bt_read_closest` probably wouldn't work, if a next key,
> > that passes an index condition is few pages away - I'll try to tackle it soon).
> > Just another small step forward, but I hope it's enough to rebase on top of it
> > planner changes.
> >
> > Also I've added few tags, mostly to mention reviewers contribution.
>
> I have some comments.
>
> + * The order of columns in the index should be the same, as for
> + * unique distincs pathkeys, otherwise we cannot use _bt_search
> + * in the skip implementation - this can lead to a missing
> + * records.
>
> It seems that it is enough that distinct pathkeys is contained in
> index pathkeys. If it's right, that is almost checked in existing
> code:
>
> > if (pathkeys_contained_in(needed_pathkeys, path->pathkeys))
>
> It is perfect when needed_pathkeys is distinct_pathkeys. So
> additional check is required if and only if it is not the case.
So I think the following change will work.
- + /* Consider index skip scan as well */
- + if (enable_indexskipscan &&
- + IsA(path, IndexPath) &&
- + ((IndexPath *) path)->indexinfo->amcanskip &&
- + (path->pathtype == T_IndexOnlyScan ||
- + path->pathtype == T_IndexScan) &&
- + root->distinct_pathkeys != NIL)
+ + if (enable_indexskipscan &&
+ + IsA(path, IndexPath) &&
+ + ((IndexPath *) path)->indexskipprefix >= 0 &&
+ + (needed_pathkeys == root->distinct_pathkeys ||
+ + pathkeys_contained_in(root->distinct_pathkeys,
+ + path->pathkeys)))
Additional comments on the condition above are:
> path->pathtype is always one of T_IndexOnlyScan or T_IndexScan so
> the check against them isn't needed. If you have concern on that,
> we can add that as Assert().
>
> I feel uncomfortable to look into indexinfo there. Couldnd't we
> use indexskipprefix == -1 to signal !amcanskip from
> create_index_path?
>
>
> + /*
> + * XXX: In case of index scan quals evaluation happens after
> + * ExecScanFetch, which means skip results could be fitered out
> + */
>
> Why can't we use skipscan path if having filter condition? If
> something bad happens, the reason must be written here instead of
> what we do.
>
>
> + if (path->pathtype == T_IndexScan &&
> + parse->jointree != NULL &&
> + parse->jointree->quals != NULL &&
> + ((List *)parse->jointree->quals)->length != 0)
>
> It's better to use list_length instead of peeping inside. It
> handles the NULL case as well. (The structure has recently
> changed but .length is not, though.)
>
>
> + * If advancing direction is different from index direction, we must
> + * skip right away, but _bt_skip requires a starting point.
>
> It doesn't seem needed to me. Could you elaborate on the reason
> for that?
>
>
> + * If advancing direction is different from index direction, we must
> + * skip right away, but _bt_skip requires a starting point.
> + */
> + if (direction * indexonlyscan->indexorderdir < 0 &&
> + !node->ioss_FirstTupleEmitted)
>
> I'm confused by this. "direction" there is the physical scan
> direction (fwd/bwd) of index scan, which is already compensated
> by indexorderdir. Thus the condition means we do that when
> logical ordering (ASC/DESC) is DESC. (Though I'm not sure what
> "index direction" exactly means...)
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | vignesh C | 2019-07-25 11:50:24 | Re: Initdb failure |
Previous Message | vignesh C | 2019-07-25 11:26:09 | Re: POC: Cleaning up orphaned files using undo logs |