Re: [POC] FETCH limited by bytes.

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org, mkellycs(at)gmail(dot)com, ashutosh(dot)bapat(at)enterprisedb(dot)com
Subject: Re: [POC] FETCH limited by bytes.
Date: 2015-02-27 18:50:22
Message-ID: CADkLM=fOjU7raFUJKo8NhrXZg=m=deA7aat3UX=eZNKxo-oMCg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached is a diff containing the original (working) patch from my
(incomplete) patch, plus regression test changes and documentation changes.

While it's easy to regression-test the persistence of the fetch_size
options, I am confounded as to how we would show that the fetch_size
setting was respected. I've seen it with my own eyes viewing the query log
on redshift, but I see no way to automate that. Suggestions welcome.

On Fri, Feb 6, 2015 at 3:11 AM, Kyotaro HORIGUCHI <
horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:

> Hello,
>
> > Redshift has a table, stv_inflight, which serves about the same purpose
> as
> > pg_stat_activity. Redshift seems to perform better with very high fetch
> > sizes (10,000 is a good start), so the default foreign data wrapper
> didn't
> > perform so well.
>
> I agree with you.
>
> > I was able to confirm that the first query showed "FETCH 150 FROM c1" as
> > the query, which is normal highly unhelpful, but in this case it confirms
> > that tables created in redshift_server do by default use the fetch_size
> > option given during server creation.
> >
> > I was also able to confirm that the second query showed "FETCH 151 FROM
> c1"
> > as the query, which shows that per-table overrides also work.
> >
> > I'd be happy to stop here, but Kyotaro might feel differently.
>
> This is enough in its own way, of course.
>
> > With this
> > limited patch, one could estimate the number of rows that would fit into
> > the desired memory block based on the row width and set fetch_size
> > accordingly.
>
> The users including me will be happy with it when the users know
> how to determin the fetch size. Especially the remote tables are
> very few or the configuration will be enough stable.
>
> On widely distributed systems, it would be far difficult to tune
> fetch size manually for every foreign tables, so finally it would
> be left at the default and safe size, it's 100 or so.
>
> This is the similar discussion about max_wal_size on another
> thread. Calculating fetch size is far tougher for users than
> setting expected memory usage, I think.
>
> > But we could go further, and have a fetch_max_memory option only at the
> > table level, and the fdw could do that same memory / estimated_row_size
> > calculation and derive fetch_size that way *at table creation time*, not
> > query time.
>
> We cannot know the real length of the text type data in advance,
> other than that, even defined as char(n), the n is the
> theoretically(or in-design) maximum size for the field but in the
> most cases the mean length of the real data would be far small
> than that. For that reason, calculating the ratio at the table
> creation time seems to be difficult.
>
> However, I agree to the Tom's suggestion that the changes in
> FETCH statement is defenitly ugly, especially the "overhead"
> argument is prohibitive even for me:(
>
> > Thanks to Kyotaro and Bruce Momjian for their help.
>
> Not at all.
>
> regardes,
>
>
> At Wed, 4 Feb 2015 18:06:02 -0500, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
> wrote in <CADkLM=eTpKYX5VOfjLr0VvfXhEZbC2UeakN=
> P6MXMg7S86Cdqw(at)mail(dot)gmail(dot)com>
> > I applied this patch to REL9_4_STABLE, and I was able to connect to a
> > foreign database (redshift, actually).
> >
> > the basic outline of the test is below, names changed to protect my
> > employment.
> >
> > create extension if not exists postgres_fdw;
> >
> > create server redshift_server foreign data wrapper postgres_fdw
> > options ( host 'some.hostname.ext', port '5439', dbname 'redacted',
> > fetch_size '150' );
> >
> > create user mapping for public server redshift_server options ( user
> > 'redacted_user', password 'comeonyouarekiddingright' );
> >
> > create foreign table redshift_tab150 ( <colspecs> )
> > server redshift_server options (table_name 'redacted_table', schema_name
> > 'redacted_schema' );
> >
> > create foreign table redshift_tab151 ( <colspecs> )
> > server redshift_server options (table_name 'redacted_table', schema_name
> > 'redacted_schema', fetch_size '151' );
> >
> > -- i don't expect the fdw to push the aggregate, this is just a test to
> see
> > what query shows up in stv_inflight.
> > select count(*) from redshift_ccp150;
> >
> > -- i don't expect the fdw to push the aggregate, this is just a test to
> see
> > what query shows up in stv_inflight.
> > select count(*) from redshift_ccp151;
> >
> >
> > For those of you that aren't familiar with Redshift, it's a columnar
> > database that seems to be a fork of postgres 8.cough. You can connect to
> it
> > with modern libpq programs (psql, psycopg2, etc).
> > Redshift has a table, stv_inflight, which serves about the same purpose
> as
> > pg_stat_activity. Redshift seems to perform better with very high fetch
> > sizes (10,000 is a good start), so the default foreign data wrapper
> didn't
> > perform so well.
> >
> > I was able to confirm that the first query showed "FETCH 150 FROM c1" as
> > the query, which is normal highly unhelpful, but in this case it confirms
> > that tables created in redshift_server do by default use the fetch_size
> > option given during server creation.
> >
> > I was also able to confirm that the second query showed "FETCH 151 FROM
> c1"
> > as the query, which shows that per-table overrides also work.
> >
> > I'd be happy to stop here, but Kyotaro might feel differently. With this
> > limited patch, one could estimate the number of rows that would fit into
> > the desired memory block based on the row width and set fetch_size
> > accordingly.
> >
> > But we could go further, and have a fetch_max_memory option only at the
> > table level, and the fdw could do that same memory / estimated_row_size
> > calculation and derive fetch_size that way *at table creation time*, not
> > query time.
> >
> > Thanks to Kyotaro and Bruce Momjian for their help.
> >
> >
> >
> >
> >
> >
> > On Mon, Feb 2, 2015 at 2:27 AM, Kyotaro HORIGUCHI <
> > horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> >
> > > Hmm, somehow I removed some recipients, especially the
> > > list. Sorry for the duplicate.
> > >
> > > -----
> > > Sorry, I've been back. Thank you for the comment.
> > >
> > > > Do you have any insight into where I would pass the custom row
> fetches
> > > from
> > > > the table struct to the scan struct?
> > >
> > > Yeah it's one simple way to tune it, if the user knows the
> > > appropreate value.
> > >
> > > > Last year I was working on a patch to postgres_fdw where the
> fetch_size
> > > > could be set at the table level and the server level.
> > > >
> > > > I was able to get the settings parsed and they would show up in
> > > > pg_foreign_table
> > > > and pg_foreign_servers. Unfortunately, I'm not very familiar with how
> > > > foreign data wrappers work, so I wasn't able to figure out how to get
> > > these
> > > > custom values passed from the PgFdwRelationInfo struct into the
> > > > query's PgFdwScanState
> > > > struct.
> > >
> > > Directly answering, the states needed to be shared among several
> > > stages are holded within fdw_private. Your new variable
> > > fpinfo->fetch_size can be read in postgresGetForeignPlan. It
> > > newly creates another fdw_private. You can pass your values to
> > > ForeignPlan making it hold the value there. Finally, you will get
> > > it in postgresBeginForeginScan and can set it into
> > > PgFdwScanState.
> > >
> > > > I bring this up only because it might be a simpler solution, in that
> the
> > > > table designer could set the fetch size very high for narrow tables,
> and
> > > > lower or default for wider tables. It's also a very clean syntax,
> just
> > > > another option on the table and/or server creation.
> > > >
> > > > My incomplete patch is attached.
> > >
> > > However, the fetch_size is not needed by planner (so far), so we
> > > can simply read the options in postgresBeginForeignScan() and set
> > > into PgFdwScanState. This runs once per exection.
> > >
> > > Finally, the attached patch will work as you intended.
> > >
> > > What do you think about this?
> > >
> > > regards,
> > >
> > > --
> > > Kyotaro Horiguchi
> > > NTT Open Source Software Center
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
>

Attachment Content-Type Size
postgres_fdw_fetch_size_20150227.diff text/plain 8.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2015-02-27 18:50:42 Re: logical column ordering
Previous Message Robert Haas 2015-02-27 18:50:03 Re: plpgsql versus domains