Re: libpq compression

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Robbie Harwood <rharwood(at)redhat(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Grigory Smolkin <g(dot)smolkin(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: libpq compression
Date: 2018-06-22 16:32:54
Message-ID: 852c6ee2-5971-dc56-923b-f8eeea6b8961@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 22.06.2018 18:59, Robbie Harwood wrote:
> Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> writes:
>
>> On 21.06.2018 20:14, Robbie Harwood wrote:
>>> Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> writes:
>>>> On 21.06.2018 17:56, Robbie Harwood wrote:
>>>>> Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> writes:
>>>>>> On 20.06.2018 23:34, Robbie Harwood wrote:
>>>>>>> Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> writes:
>>>>>>>
>>>>>>> Well, that's a design decision you've made. You could put
>>>>>>> lengths on chunks that are sent out - then you'd know exactly how
>>>>>>> much is needed. (For instance, 4 bytes of network-order length
>>>>>>> followed by a complete payload.) Then you'd absolutely know
>>>>>>> whether you have enough to decompress or not.
>>>>>> Do you really suggest to send extra header for each chunk of data?
>>>>>> Please notice that chunk can be as small as one message: dozen of
>>>>>> bytes because libpq is used for client-server communication with
>>>>>> request-reply pattern.
>>>>> I want you to think critically about your design. I *really* don't
>>>>> want to design it for you - I have enough stuff to be doing. But
>>>>> again, the design I gave you doesn't necessarily need that - you
>>>>> just need to properly buffer incomplete data.
>>>> Right now secure_read may return any number of available bytes. But
>>>> in case of using streaming compression, it can happen that available
>>>> number of bytes is not enough to perform decompression. This is why
>>>> we may need to try to fetch additional portion of data. This is how
>>>> zpq_stream is working now.
>>> No, you need to buffer and wait until you're called again. Which is
>>> to say: decompress() shouldn't call secure_read(). secure_read()
>>> should call decompress().
>> I this case I will have to implement this code twice: both for backend
>> and frontend, i.e. for secure_read/secure_write and
>> pqsecure_read/pqsecure_write.
> Likely, yes. You can see that this is how TLS does it (which you should
> be using as a model, architecture-wise).
>
>> Frankly speaking i was very upset by design of libpq communication
>> layer in Postgres: there are two different implementations of almost
>> the same stuff for cbackend and frontend.
> Changing the codebases so that more could be shared is not necessarily a
> bad idea; however, it is a separate change from compression.
>
>>>> I do not understand how it is possible to implement in different way
>>>> and what is wrong with current implementation.
>>> The most succinct thing I can say is: absolutely don't modify
>>> pq_recvbuf(). I gave you pseudocode for how to do that. All of your
>>> logic should be *below* the secure_read/secure_write functions.
>>>
>>> I cannot approve code that modifies pq_recvbuf() in the manner you
>>> currently do.
>> Well. I understand you arguments. But please also consider my
>> argument above (about avoiding code duplication).
>>
>> In any case, secure_read function is called only from pq_recvbuf() as
>> well as pqsecure_read is called only from pqReadData. So I do not
>> think principle difference in handling compression in secure_read or
>> pq_recvbuf functions and do not understand why it is "destroying the
>> existing model".
>>
>> Frankly speaking, I will really like to destroy existed model, moving
>> all system dependent stuff in Postgres to SAL and avoid this awful mix
>> of code sharing and duplication between backend and frontend. But it
>> is a another story and I do not want to discuss it here.
> I understand you want to avoid code duplication. I will absolutely
> agree that the current setup makes it difficult to share code between
> postmaster and libpq clients. But the way I see it, you have two
> choices:
>
> 1. Modify the code to make code sharing easier. Once this has been
> done, *then* build a compression patch on top, with the nice new
> architecture.
>
> 2. Leave the architecture as-is and add compression support.
> (Optionally, you can make code sharing easier at a later point.)
>
> Fundamentally, I think you value code sharing more than I do. So while
> I might advocate for (2), you might personally prefer (1).
>
> But right now you're not doing either of those.
>
>> If we are speaking about the "right design", then neither your
>> suggestion, neither my implementation are good and I do not see
>> principle differences between them.
>>
>> The right approach is using "decorator pattern": this is how streams
>> are designed in .Net/Java. You can construct pipe of "secure",
>> "compressed" and whatever else streams.
> I *strongly* disagree, but I don't think you're seriously suggesting
> this.
>
>>>>>>> My idea was the following: client want to use compression. But
>>>>>>> server may reject this attempt (for any reasons: it doesn't support
>>>>>>> it, has no proper compression library, do not want to spend CPU for
>>>>>>> decompression,...) Right now compression algorithm is
>>>>>>> hardcoded. But in future client and server may negotiate to choose
>>>>>>> proper compression protocol. This is why I prefer to perform
>>>>>>> negotiation between client and server to enable compression. Well,
>>>>>>> for negotiation you could put the name of the algorithm you want in
>>>>>>> the startup. It doesn't have to be a boolean for compression, and
>>>>>>> then you don't need an additional round-trip.
>>>>>> Sorry, I can only repeat arguments I already mentioned:
>>>>>> - in future it may be possible to specify compression algorithm
>>>>>> - even with boolean compression option server may have some reasons to
>>>>>> reject client's request to use compression
>>>>>>
>>>>>> Extra flexibility is always good thing if it doesn't cost too
>>>>>> much. And extra round of negotiation in case of enabling compression
>>>>>> seems to me not to be a high price for it.
>>>>> You already have this flexibility even without negotiation. I don't
>>>>> want you to lose your flexibility. Protocol looks like this:
>>>>>
>>>>> - Client sends connection option "compression" with list of
>>>>> algorithms it wants to use (comma-separated, or something).
>>>>>
>>>>> - First packet that the server can compress one of those algorithms
>>>>> (or none, if it doesn't want to turn on compression).
>>>>>
>>>>> No additional round-trips needed.
>>>> This is exactly how it works now... Client includes compression
>>>> option in connection string and server replies with special message
>>>> ('Z') if it accepts request to compress traffic between this client
>>>> and server.
>>> No, it's not. You don't need this message. If the server receives a
>>> compression request, it should just turn compression on (or not), and
>>> then have the client figure out whether it got compression back.
>> How it will managed to do it. It receives some reply and first of all
>> it should know whether it has to be decompressed or not.
> You can tell whether a message is compressed by looking at it. The way
> the protocol works, every message has a type associated with it: a
> single byte, like 'R', that says what kind of message it is.

Compressed message can contain any sequence of bytes, including 'R':)

>
> Thanks,
> --Robbie

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2018-06-22 16:35:13 Re: libpq compression
Previous Message Matheus de Oliveira 2018-06-22 16:25:05 Re: found xmin from before relfrozenxid on pg_catalog.pg_authid