From: | Amit Langote <amitlangote09(at)gmail(dot)com> |
---|---|
To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PATCH] Performance Improvement For Copy From Binary Files |
Date: | 2020-07-13 02:32:30 |
Message-ID: | CA+HiwqEuoAYrBGmDqFOnvOkpO2ZLBSE49VzoLKKdA+Bz-4vtcg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Jul 13, 2020 at 10:19 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > On Mon, Jul 13, 2020 at 4:06 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> >
> > This error showed up when cfbot tried it:
> >
> > COPY BINARY stud_emp FROM
> > '/home/travis/build/postgresql-cfbot/postgresql/src/test/regress/results/stud_emp.data';
> > +ERROR: could not read from COPY file: Bad address
>
> This is due to the recent commit
> cd22d3cdb9bd9963c694c01a8c0232bbae3ddcfb, in which we restricted the
> raw_buf and line_buf allocations for binary files. Since we are using
> raw_buf for this performance improvement feature, now, it's enough to
> restrict only line_buf for binary files. I made the changes
> accordingly in the v3 patch attached here.
>
> Regression tests(make check & make check-world) ran cleanly with the v3 patch.
Thank you Bharath. I was a bit surprised that you had also submitted
a patch to NOT allocate raw_buf for COPY FROM ... BINARY. :-)
> Please also find my responses for:
>
> Vignesh's comment:
>
> > On Sun, Jul 12, 2020 at 6:43 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > I had reviewed the changes. I felt one minor change required:
> > + * CopyReadFromRawBuf
> > + * Reads 'nbytes' bytes from cstate->copy_file via
> > cstate->raw_buf and
> > + * writes then to 'saveTo'
> > + *
> > + * Useful when reading binary data from the file.
> > Should "writes then to 'saveTo'" be "writes them to 'dest'"?
> >
>
> Thanks Vignesh for reviewing the patch. Modified 'saveTo' to 'dest' in v3 patch.
My bad.
> Amit's comment:
>
> >
> > > For the case where required nbytes may not fit into the buffer in
> > > CopyReadFromRawBuf, I'm sure this can happen only for field data,
> > > (field count , and field size are of fixed length and can fit in the
> > > buffer), instead of reading them in parts of buff size into the buffer
> > > (using CopyLoadRawBuf) and then DRAIN_COPY_RAW_BUF() to the
> > > destination, I think we can detect this condition using requested
> > > bytes and the buffer size and directly read from the file to the
> > > destination buffer and then reload the raw_buffer for further
> > > processing. I think this way, it will be good.
> >
> > Hmm, I'm afraid that this will make the code more complex for
> > apparently small benefit. Is this really that much of a problem
> > performance wise?
> >
>
> Yes it makes CopyReadFromRawBuf(), code a bit complex from a
> readability perspective. I'm convinced not to have the
> abovementioned(by me) change, due to 3 reasons,1) the
> readability/understandability 2) how many use cases can we have where
> requested field size greater than RAW_BUF_SIZE(64KB)? I think very few
> cases. I may be wrong here. 3) Performance wise it may not be much as
> we do one extra memcpy only in situations where field sizes are
> greater than 64KB(as we have already seen and observed by you as well
> in one of the response [1]) that memcpy cost for this case may be
> negligible.
Actually, an extra memcpy is incurred on every call of
CopyReadFromRawBuf(), but I haven't seen it to be very problematic.
By the way, considering the rebase over cd22d3cdb9b, it seemed to me
that we needed to update the comments in CopyStateData struct
definition a bit more. While doing that, I realized
CopyReadFromRawBuf as a name for the new function might be misleading
as long as we are only using it for binary data. Maybe
CopyReadBinaryData is more appropriate? See attached v4 with these
and a few other cosmetic changes.
--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
CopyFrom-binary-buffering_v4.patch | application/octet-stream | 8.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Bharath Rupireddy | 2020-07-13 03:16:57 | Re: [PATCH] Performance Improvement For Copy From Binary Files |
Previous Message | Kyotaro Horiguchi | 2020-07-13 01:57:36 | Re: archive status ".ready" files may be created too early |