From: | gkokolatos(at)pm(dot)me |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | Alexander Lakhin <exclusion(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Rachel Heaton <rachelmheaton(at)gmail(dot)com> |
Subject: | Re: Add LZ4 compression in pg_dump |
Date: | 2023-03-17 15:43:58 |
Message-ID: | XPMIIorJ5munrlLRZGquartiN5tuapdjiBhaPZ6iEmJccRiKWCIHSowSLPhaGj4gLpqvaS1pQn7PGkA_WNwvuUTVdl-lrU_TzHA81LqujJ4=@pm.me |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
------- Original Message -------
On Thursday, March 16th, 2023 at 10:20 PM, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
>
>
>
> On 3/16/23 18:04, gkokolatos(at)pm(dot)me wrote:
>
> > ------- Original Message -------
> > On Tuesday, March 14th, 2023 at 4:32 PM, Tomas Vondra tomas(dot)vondra(at)enterprisedb(dot)com wrote:
> >
> > > On 3/14/23 16:18, gkokolatos(at)pm(dot)me wrote:
> > >
> > > > ...> Would you mind me trying to come with a patch to address your points?
> > >
> > > That'd be great, thanks. Please keep it split into smaller patches - two
> > > might work, with one patch for "cosmetic" changes and the other tweaking
> > > the API error-handling stuff.
> >
> > Please find attached a set for it. I will admit that the splitting in the
> > series might not be ideal and what you requested. It is split on what
> > seemed as a logical units. Please advice how a better split can look like.
> >
> > 0001 is unifying types and return values on the API
> > 0002 is addressing the constant definitions
> > 0003 is your previous 0004 adding comments
>
>
> Thanks. I think the split seems reasonable - the goal was to not mix
> different changes, and from that POV it works.
>
> I'm not sure I understand the Gzip_read/Gzip_write changes in 0001. I
> mean, gzread/gzwrite returns int, so how does renaming the size_t
> variable solve the issue of negative values for errors? I mean, this
>
> - size_t ret;
> + size_t gzret;
>
> - ret = gzread(gzfp, ptr, size);
> + gzret = gzread(gzfp, ptr, size);
>
> means we still lost the information gzread() returned a negative value,
> no? We'll still probably trigger an error, but it's a bit weird.
You are obviously correct. My bad, I miss-read the return type of gzread().
Please find an amended version attached.
> Unless I'm missing something, if gzread() ever returns -1 or some other
> negative error value, we'll cast it to size_t, while condition will
> evaluate to "true" and we'll happily chew on some random chunk of data.
>
> So the confusion is (at least partially) a preexisting issue ...
>
> For gzwrite() it seems to be fine, because that only returns 0 on error.
> OTOH it's defined to take 'int size' but then we happily pass size_t
> values to it.
>
> As I wrote earlier, this apparently assumes we never need to deal with
> buffers larger than int, and I don't think we have the ambition to relax
> that (I'm not sure it's even needed / possible).
Agreed.
> I see the read/write functions are now defined as int, but we only ever
> return 0/1 from them, and then interpret that as bool. Why not to define
> it like that? I don't think we need to adhere to the custom that
> everything returns "int". This is an internal API. Or if we want to
> stick to int, I'd define meaningful "nice" constants for 0/1.
The return types are now booleans and the callers have been made aware.
> 0002 seems fine to me. I see you've ditched the idea of having two
> separate buffers, and replaced them with DEFAULT_IO_BUFFER_SIZE. Fine
> with me, although I wonder if this might have negative impact on
> performance or something (but I doubt that).
>
I doubt that too. Thank you.
> 0003 seems fine too.
Thank you.
> > As far as the error handling is concerned, you had said upthread:
> >
> > > I think the right approach is to handle all library errors and not just
> > > let them through. So Gzip_write() needs to check the return value, and
> > > either call pg_fatal() or translate it to an error defined by the API.
> >
> > While working on it, I thought it would be clearer and more consistent
> > for the pg_fatal() to be called by the caller of the individual functions.
> > Each individual function can keep track of the specifics of the error
> > internally. Then the caller upon detecting that there was an error by
> > checking the return value, can call pg_fatal() with a uniform error
> > message and then add the specifics by calling the get_error_func().
>
>
> I agree it's cleaner the way you did it.
>
> I was thinking that with each compression function handling error
> internally, the callers would not need to do that. But I haven't
> realized there's logic to detect ENOSPC and so on, and we'd need to
> duplicate that in every compression func.
>
If you agree, I can prepare a patch to improve on the error handling
aspect of the API as a separate thread, since here we are trying to
focus on correctness.
Cheers,
//Georgios
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
Attachment | Content-Type | Size |
---|---|---|
v3-0003-Improve-compress_lz4-documentation.patch | text/x-patch | 3.0 KB |
v3-0002-Clean-up-constants-in-pg_dump-s-compression-API.patch | text/x-patch | 6.2 KB |
v3-0001-Improve-type-handling-in-pg_dump-s-compress-file-.patch | text/x-patch | 24.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Paul Jungwirth | 2023-03-17 16:03:09 | Re: Exclusion constraints on partitioned tables |
Previous Message | Zheng Li | 2023-03-17 15:40:56 | Re: Support logical replication of DDLs |