Re: [POC] FETCH limited by bytes.

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: andres(at)anarazel(dot)de, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org, mkellycs(at)gmail(dot)com, ashutosh(dot)bapat(at)enterprisedb(dot)com
Subject: Re: [POC] FETCH limited by bytes.
Date: 2015-09-21 04:52:09
Message-ID: CADkLM=fwPC+r8p_LAe12OkgsUbyAQtq77SZ2vEv7LLMQ_mcccg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 4, 2015 at 2:45 AM, Kyotaro HORIGUCHI <
horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:

> Hello,
>
> > > > @@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node,
> > > int eflags)
> ...
> > > > + def = get_option(table->options, "fetch_size");
> >
> > > I don't think it's a good idea to make such checks at runtime - and
> > > either way it's somethign that should be reported back using an
> > > ereport(), not an elog.
> >
> > > Also, it seems somewhat wrong to determine this at execution
> > > time. Shouldn't this rather be done when creating the foreign scan
> node?
> > > And be a part of the scan state?
> >
> > I agree, that was my original plan, but I wasn't familiar enough with the
> > FDW architecture to know where the table struct and the scan struct were
> > both exposed in the same function.
> >
> > What I submitted incorporated some of Kyotaro's feedback (and working
> > patch) to my original incomplete patch.
>
> Sorry, it certainly shouldn't be a good place to do such thing. I
> easily selected the place in order to avoid adding new similar
> member in multiple existing structs (PgFdwRelationInfo and
> PgFdwScanState).
>
> Having a new member fetch_size is added in PgFdwRelationInfo and
> PgFdwScanState, I think postgresGetForeignRelSize is the best
> place to do that, from the point that it collects basic
> information needed to calculate scan costs.
>
> # Fetch sizes of foreign join would be the future issue..
>
> > typedef struct PgFdwRelationInfo
> > {
> ...
> + int fetch_size; /* fetch size for this remote table */
>
> ====================
> > postgreGetForeignRelSize()
> > {
> ...
> > fpinfo->table = GetForeignTable(foreigntableid);
> > fpinfo->server = GetForeignServer(fpinfo->table->serverid);
> >
> + def = get_option(table->options, "fetch_size");
> + ..
> + fpinfo->fetch_size = strtod(defGetString...
>
> Also it is doable in postgresGetForeignPlan and doing there
> removes redundant copy of fetch_size from PgFdwRelation to
> PgFdwScanState but theoretical basis would be weak.
>
> regards,
>
> > > On 2015-02-27 13:50:22 -0500, Corey Huinker wrote:
> > > > +static DefElem*
> > > > +get_option(List *options, char *optname)
> > > > +{
> > > > + ListCell *lc;
> > > > +
> > > > + foreach(lc, options)
> > > > + {
> > > > + DefElem *def = (DefElem *) lfirst(lc);
> > > > +
> > > > + if (strcmp(def->defname, optname) == 0)
> > > > + return def;
> > > > + }
> > > > + return NULL;
> > > > +}
> > >
> > >
> > > > /*
> > > > * Do nothing in EXPLAIN (no ANALYZE) case. node->fdw_state
> stays
> > > NULL.
> > > > @@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node,
> > > int eflags)
> > > > server = GetForeignServer(table->serverid);
> > > > user = GetUserMapping(userid, server->serverid);
> > > >
> > > > + /* Reading table options */
> > > > + fsstate->fetch_size = -1;
> > > > +
> > > > + def = get_option(table->options, "fetch_size");
> > > > + if (!def)
> > > > + def = get_option(server->options, "fetch_size");
> > > > +
> > > > + if (def)
> > > > + {
> > > > + fsstate->fetch_size = strtod(defGetString(def), NULL);
> > > > + if (fsstate->fetch_size < 0)
> > > > + elog(ERROR, "invalid fetch size for foreign
> table
> > > \"%s\"",
> > > > + get_rel_name(table->relid));
> > > > + }
> > > > + else
> > > > + fsstate->fetch_size = 100;
> > >
> > > I don't think it's a good idea to make such checks at runtime - and
> > > either way it's somethign that should be reported back using an
> > > ereport(), not an elog.
> >
> > > Also, it seems somewhat wrong to determine this at execution
> > > time. Shouldn't this rather be done when creating the foreign scan
> node?
> > > And be a part of the scan state?
> >
> > I agree, that was my original plan, but I wasn't familiar enough with the
> > FDW architecture to know where the table struct and the scan struct were
> > both exposed in the same function.
> >
> > What I submitted incorporated some of Kyotaro's feedback (and working
> > patch) to my original incomplete patch.
> >
> > > Have you thought about how this option should cooperate with join
> > > pushdown once implemented?
> > >
> >
> > I hadn't until now, but I think the only sensible thing would be to
> > disregard table-specific settings once a second foreign table is
> detected,
> > and instead consider only the server-level setting.
> >
> > I suppose one could argue that if ALL the tables in the join had the same
> > table-level setting, we should go with that, but I think that would be
> > complicated, expensive, and generally a good argument for changing the
> > server setting instead.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
>
Ok, with some guidance from RhodiumToad (thanks!) I was able to get the
proper RelOptInfo->Plan->Scan handoff.

What I *don't* know how to do is show that the proper fetch sizes are being
used on the remote server with the resources available in the regression
test. *Suggestions welcome.*

This patch works for my original added test-cases, and works for me
connecting to a redshift cluster that we have, the queries show up in the
console like this:
FETCH 101 FROM c1
FETCH 30 FROM c1
FETCH 50 FROM c1

The (redacted) source of that test is as follows:

begin;
create extension if not exists postgres_fdw;

create server redshift foreign data wrapper postgres_fdw
options (host 'REDACTED', port '5439', dbname 'REDACTED', fetch_size '101');

select * from pg_foreign_server;

create user mapping for public server redshift
options ( user 'REDACTED', password 'REDACTED');

select * from pg_user_mappings;

create foreign table test_table ( date date, tval text )
server redshift
options (table_name 'REDACTED');

select count(*) from ( select tval from test_table where date = 'REDACTED'
) x;

alter server redshift options ( set fetch_size '30' );

select count(*) from ( select tval from test_table where date = 'REDACTED'
) x;

alter foreign table test_table options ( fetch_size '50' );

select count(*) from ( select tval from test_table where date = 'REDACTED'
) x;

rollback;

Attached is the patch / diff against current master.

Attachment Content-Type Size
0002-Make-fetch_size-settable-per-foreign-server-and-fore.patch text/x-patch 13.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2015-09-21 05:33:14 Re: row_security GUC, BYPASSRLS
Previous Message Tom Lane 2015-09-21 02:21:16 Re: row_security GUC, BYPASSRLS