Re: Slowness of extended protocol

From: Shay Rojansky <roji(at)roji(dot)org>
To: Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Slowness of extended protocol
Date: 2016-08-09 08:03:43
Message-ID: CADT4RqDaim7Jx-Ob+qf8n6eoTUER9xD8v3K9PCJOht7ahobUTA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 8, 2016 at 6:44 PM, Vladimir Sitnikov <
sitnikov(dot)vladimir(at)gmail(dot)com> wrote:
>
> The problem with "empty statement name" is statements with empty name can
>> be reused (for instance, for batch insert executions), so the server side
>> has to do a defensive copy (it cannot predict how many times this unnamed
>> statement will be used).
>>
>
That seems right.

Also, part of the point here is to reduce the number of protocol messages
>> needed in order to send a parameterized query - not to have to do
>> Parse/Describe/Bind/Execute/Sync - since part of the slowdown comes from
>> that (although I'm not sure how much). Your proposal keeps the 5 messages.
>>
>
> As my benchmarks show, notable overhead is due to "defensive copying of
> the execution plan". So I would measure first, and only then would claim
> where the overhead is.
>

> Some more profiling is required to tell which part is a main time consumer.
>

Tom also pointed to the caching as the main culprit, although there seems
to be some message-related overhead as well. It seems that things like
process title changes may be fixable easily - do we really need a process
title change on every message (as opposed to, say, execute messages only).
This profiling and optimization effort can happen in parallel to the
discussion on what to do with the execution plan caching.

> Technically speaking, I would prefer to have a more real-life looking
> example instead of SELECT 1.
> Do you have something in mind?
> For instance, for more complex queries, "Parse/Plan" could cost much more
> than we shave by adding "a special non-cached statement name" or by
> reducing "5 messages into 1".
>
> There's a side problem: describe message requires full roundtrip since
> there are cases when client needs to know how to encode parameters.
> Additional roundtrip hurts much worse than just an additional message that
> is pipelined (e.g. sent in the same TCP packet).
>

This is true, but there doesn't seem to be anything we can do about it - if
your usage relies on describe to get information on parameters (as opposed
to results), you're stuck with an extra roundtrip no matter what. So it
seems you have to use the extended protocol anyway.

FYI in Npgsql specifically describe isn't used to get any knowledge about
parameters - users must populate the correct parameters or query execution
fails.

> Note: it is quite easy to invent a name that is not yet used in the wild,
>>> so it is safe.
>>>
>>
>> That's problematic, how do you know what's being used in the wild and
>> what isn't? The protocol has a specification, it's very problematic to get
>> up one day and to change it retroactively. But again, the empty statement
>> seems to already be there for that.
>>
>
> Empty statement has different semantics, and it is wildly used.
> For instance, pgjdbc uses unnamed statements a lot.
> On the other hand, statement name of "!pq(at)#!@#42" is rather safe to use
> as a special case.
> Note: statement names are not typically created by humans (statement name
> is not a SQL), and very little PG clients do support named statements.
>

IMHO this simply isn't the kind of thing one does in a serious protocol of
a widely-used product, others seem to agree on this.

> Sir, any new SQL keyword is what you call a "retroactively defining
special semantics".
> It's obvious that very little current clients do use named
server-prepared statements.
> Statement names are not something that is provided by the end-user in a
web page, so it is not a rocket science to come up with a
> statement name that is both short and "never ever used in the wild".

The difference is that before the new SQL keyword is added, trying to use
it results in an error. What you're proposing is taking something that
already works in one way and changing its behavior.

> Shay, can you come up with a real-life use case when those "I claim the
statement will be used only once" is would indeed improve performance?
> Or, to put it in another way: "do you have a real-life case when simple
protocol is faster than extended protocol with statement reuse"?

> I do have a couple of java applications and it turns out there's a huge
win of reusing server-prepared statements.
> There's a problem that "generic plan after 5th execution might be much
worse than a specific one", however those statements are not often
> and I just put hints to the SQL (limit 0, +0, CTE, those kind of things).

I maintain Npgsql, which is a general-purpose database driver and not a
specific application. The general .NET database API (ADO.NET) like most
(all?) DB APIs, allows users to send a simple statement to the database
(ExecuteReader, ExecuteScalar, ExecuteNonQuery). Every time a user uses
these APIs without preparing, they pay a performance penalty because the
extended protocol has more overhead than the simple one.

Obviously smart use of prepared statements is a great idea, but it doesn't
work everywhere. There are many scenarios where connections are very
short-lived (think about webapps where a pooled connection is allocated
per-request and reset in between), and the extra roundtrip that preparing
entails is too much. There are also many scenarios where you're not
necessarily going to send the same query multiple times in a single
connection lifespan, so preparing is again out of the question. And more
generally, there's no reason for a basic, non-prepared execution to be
slower than it can be.

Of course we can choose a different query to benchmark instead of SELECT 1
- feel free to propose one (or several).

FWIW I do agree at this stage we should probably benchmark a bit more and
see where we can get by optimizing the extended protocol itself (which is a
good idea regardless). If it becomes clear that it's impossible to reach
performance that's similar to the simple query, we can look into adding a
new protocol message (as I think the special-statement-name solution has
generally been rejected). I'm not sure I'll have the time to go into this
in the very near future (and it would be my first time really diving into
PostgreSQL internals) but if nobody else wants to I'll try.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2016-08-09 08:13:08 Re: Logical Replication WIP
Previous Message Masahiko Sawada 2016-08-09 07:59:31 Re: Logical Replication WIP