Re: libpq: Process buffered SSL read bytes to support records >8kB on async API

From: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
To: Lars Kanis <lars(at)greiz-reinsdorf(dot)de>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: libpq: Process buffered SSL read bytes to support records >8kB on async API
Date: 2024-09-10 18:49:38
Message-ID: CAOYmi+=jZNU9mZ4Z83aR0RiqZ2pF35u5040gCxBgcFqoA4oNbQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, Sep 8, 2024 at 1:08 PM Lars Kanis <lars(at)greiz-reinsdorf(dot)de> wrote:
> I'm the maintainer of ruby-pg, the Ruby interface to the PostgreSQL
> database. This binding uses the asynchronous API of libpq by default to
> integrate with Ruby's IO wait and scheduling mechanisms.
>
> This works well with the vanilla PostgreSQL server, but it leads to
> starvation with other servers that speak PostgreSQL wire protocol 3.
> This is because the libpq async interface currently assumes a maximum
> SSL record size of 8kB.

Thanks for the report! I wanted evidence that this wasn't a
ruby-pg-specific problem, so I set up a test case with
Python/psycopg2.

I was able to reproduce a hang when all of the following were true:
- psycopg2's async mode was enabled
- the client performs a PQconsumeInput/PQisBusy loop, waiting on
socket read events when the connection is busy (I used
psycopg2.extras.wait_select() for this)
- the server splits a large message over many large TLS records
- the server packs the final ReadyForQuery message into the same
record as the split message's final fragment
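To make the failure mode concrete, here is a toy simulation (plain Python, no libpq or OpenSSL involved; every name in it is hypothetical) of why that combination hangs: the TLS layer decrypts a whole record into its own buffer, the client consumes only 8kB per read call, and the leftover plaintext is invisible to select() on the raw socket.

```python
# Toy model of the starvation; not real libpq/OpenSSL code.

class FakeTLS:
    """Models a TLS layer that decrypts whole records into an
    internal buffer, the way OpenSSL does."""
    def __init__(self, records):
        self.socket_records = list(records)  # records still "on the wire"
        self.buffered = b""                  # decrypted-but-unread plaintext

    def socket_readable(self):
        # What select()/poll() on the raw fd can see:
        return bool(self.socket_records)

    def pending(self):
        # Rough equivalent of SSL_pending(): decrypted bytes waiting.
        return len(self.buffered)

    def read(self, n):
        # Pull one whole record off the socket only when the buffer is empty.
        if not self.buffered and self.socket_records:
            self.buffered = self.socket_records.pop(0)
        out, self.buffered = self.buffered[:n], self.buffered[n:]
        return out

READ_CHUNK = 8192                 # libpq reads in 8kB chunks

tls = FakeTLS([b"x" * 12288])     # one 12kB record holding a partial message
received = tls.read(READ_CHUNK)   # one "PQconsumeInput": only 8kB consumed

# The message is incomplete, so the client goes back to waiting on the
# socket -- but the remaining 4kB sit in the TLS buffer, and the socket
# will never signal readable again. The loop starves.
assert len(received) == 8192
assert tls.pending() == 4096
assert not tls.socket_readable()
```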

Gory details of the packet sizes, if it's helpful:
- max TLS record size is 12k, because it made the math easier
- server sends DataRow of 32006 bytes, followed by DataRow of 806
bytes, followed by CommandComplete/ReadyForQuery
- so there are three TLS records on the wire containing
1) DataRow 1 fragment 1 (12k bytes)
2) DataRow 1 fragment 2 (12k bytes)
3) DataRow 1 fragment 3 (7430 bytes) + DataRow 2 (806 bytes)
+ CommandComplete + ReadyForQuery
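As a quick sanity check on those numbers (taking "12k" to mean 12 * 1024 bytes), the three fragments of the first DataRow add up to its 32006 bytes exactly:

```python
record_limit = 12 * 1024                      # 12kB max TLS record size
datarow1 = 32006                              # first DataRow, in bytes
fragments = [record_limit,                    # record 1
             record_limit,                    # record 2
             datarow1 - 2 * record_limit]     # remainder in record 3

assert fragments == [12288, 12288, 7430]
assert sum(fragments) == datarow1
```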

> To fix this issue the attached patch calls pqReadData() repeatedly in
> PQconsumeInput() until there is no buffered SSL data left to be read.
> Another solution could be to process buffered SSL read bytes in
> PQisBusy() instead of PQconsumeInput().

I agree that PQconsumeInput() needs to ensure that the transport
buffers are all drained. But I'm not sure this is a complete solution;
doesn't GSS have the same problem? And are there any other sites that
need to make the same guarantee before returning?
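For illustration only, a toy sketch (plain Python, hypothetical names; not the actual patch or real libpq code) of the drain behavior being proposed: keep reading as long as the TLS layer reports decrypted-but-unread bytes, so no plaintext is left stranded where select() cannot see it.

```python
# Hypothetical toy model of the proposed drain loop; not real libpq code.

class FakeTLS:
    def __init__(self, records):
        self.socket_records = list(records)  # records still "on the wire"
        self.buffered = b""                  # decrypted-but-unread plaintext

    def pending(self):
        # Rough equivalent of SSL_pending().
        return len(self.buffered)

    def read(self, n):
        if not self.buffered and self.socket_records:
            self.buffered = self.socket_records.pop(0)
        out, self.buffered = self.buffered[:n], self.buffered[n:]
        return out

READ_CHUNK = 8192

def consume_input(tls):
    """Patched behavior: keep reading until the transport reports
    no buffered plaintext left."""
    data = b""
    while True:
        data += tls.read(READ_CHUNK)
        if tls.pending() == 0:
            break
    return data

tls = FakeTLS([b"x" * 12288])   # one 12kB record, larger than one 8kB read
data = consume_input(tls)
assert len(data) == 12288       # the whole record is drained in one call
assert tls.pending() == 0       # nothing left hidden from select()
```

The same "is anything still buffered?" check would presumably be needed for any transport that buffers internally, which is why the GSS question above matters.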

I need to switch away from this for a bit. Would you mind adding this
to the next Commitfest as a placeholder?

https://commitfest.postgresql.org/50/

Thanks,
--Jacob
