From: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
---|---|
To: | Magnus Hagander <magnus(at)hagander(dot)net> |
Cc: | PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net> |
Subject: | Re: Support for pg_receivexlog --format=plain|tar |
Date: | 2017-01-10 02:01:00 |
Message-ID: | CAB7nPqThgajSwPkQVfd3Bgu8s21=cnwBWBXBjyZLupq-EBj-=w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Jan 7, 2017 at 8:19 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Sat, Jan 7, 2017 at 12:31 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
> wrote:
>> There is something I forgot. With this patch,
>> FindStreamingStart()@pg_receivexlog.c is actually broken. In short it
>> forgets to consider files that have been compressed at the last run of
>> pg_receivexlog and will try to stream changes from the beginning. I
>> can see that gzip -l provides this information... But I have yet to
>> find something in zlib that allows a cheap lookup as startup of
>> streaming should be fast. Looking at how gzip -l does it may be faster
>> than looking at the docs.
>
> Do we really care though? As in, does startup of streaming have to be *that*
> fast? Even gunziping 16Mb (worst case) doesn't exactly take a long time. If
> your pg_receivexlog is restarting so often that it becomes a problem, I
> think you already have another and much bigger problem on your hands.
Based on some analysis, it is enough to look at the last 4 bytes of
the compressed file to get the size output data with a single call to
lseek() and then read(). So as there is a simple way to do things and
that's far cheaper than decompressing perhaps hundred of segments I'd
rather do it this way. Attached is the implementation. This code is
using 2 booleans for 4 states of the file names: full non-compressed,
partial non-compressed, full compressed and partial compressed. This
keeps the last check of FindStreamingStart() more simple, but that's
quite fancy lately to have an enum for such things :D
> I found another problem with it -- it is completely broken in sync mode. You
> need to either forbid sync mode together with compression, or teach
> dir_sync() about it. The later would sound better, but I wonder how much
> that's going to kill compression due to the small blocks? Is it a reasonable
> use-case?
Hm. Looking at the docs I think that --compress defined with
--synchronous maps to the use of Z_SYNC_FLUSH with gzflush(). FWIW I
don't have a direct use case for it, but it is not a major effort to
add it, so done. There is no actual reason to forbid this combinations
of options either.
--
Michael
Attachment | Content-Type | Size |
---|---|---|
receivexlog-gzip-v4.patch | text/x-patch | 12.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Euler Taveira | 2017-01-10 03:04:25 | Re: move collation import to backend |
Previous Message | Joel Jacobson | 2017-01-10 01:54:40 | Re: RustgreSQL |