Re: refactoring basebackup.c

From: Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>
Subject: Re: refactoring basebackup.c
Date: 2021-10-20 12:48:55
Message-ID: CAOgcT0OBD+GU4ZtY3wzzQng8ffjTP_WhXGnfUzrSN4vxDnzxmw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Robert,

I mean #define CHUNK_SIZE is still in the patch.
>

Oops, removed now.

> > > With just that change, you can set
> > > next_buf_len = LZ4F_HEADER_SIZE_MAX + mysink->output_buffer_bound --
> > > but that's also more than you need. You can instead do next_buf_len =
> > > Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound). Now, you're
> > > probably thinking that won't work, because bbsink_lz4_begin_archive()
> > > could fill up the buffer partway, and then the first call to
> > > bbsink_lz4_archive_contents() could overrun it. But that problem can
> > > be solved by reversing the order of operations in
> > > bbsink_lz4_archive_contents(): before you call LZ4F_compressUpdate(),
> > > test whether you need to empty the buffer first, and if so, do it.
> >
> > I am still not able to get - how can we survive with a mere
> > size of Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound).
> > LZ4F_HEADER_SIZE_MAX is defined as 19 in lz4 library. With this
> > proposal, it is almost guaranteed that the next buffer length will
> > be always set to 19, which will result in failure of a call to
> > LZ4F_compressUpdate() with the error LZ4F_ERROR_dstMaxSize_tooSmall,
> > even if we had called bbsink_archive_contents() before.
>
> Sorry, should have been Max(), not Min().
>

Ahh ok.
I looked into the code of LZ4F_compressBound() and the header size is
already included in the calculations, so when we compare
LZ4F_HEADER_SIZE_MAX and mysink->output_buffer_bound, the latter
will be always greater, and hence sufficient length for the output buffer. I
made this change in the attached patch, and also cleared the buffer by
calling bbsink_archive_contents() before calling LZ4_compressUpdate().
Similarly cleared the buffer before calling LZ4F_compressEnd().

> You mean the way gzip allows us to use our own alloc and free functions
> > by means of providing the function pointers for them. Unfortunately,
> > no, LZ4 does not have that kind of provision. Maybe that makes a
> > good proposal for LZ4 library ;-).
> > I cannot think of another solution to it right away.
>
> OK. Will give it some thought.

I have started a thread[1] on LZ4 community for this, but so far no
reply on that.

Regards,
Jeevan Ladhe

[1]
https://groups.google.com/g/lz4c/c/WnJkKwBWlcM/m/zszrla2mBQAJ?utm_medium=email&utm_source=footer

Attachment Content-Type Size
lz4_compress_v6.patch application/octet-stream 10.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ronan Dunklau 2021-10-20 12:58:26 Re: pg_receivewal starting position
Previous Message Andrew Dunstan 2021-10-20 12:40:04 Re: Postgres perl module namespace