From: | "henry(at)visionlink(dot)org" <henry(at)visionlink(dot)org> |
---|---|
To: | Shay Rojansky <roji(at)roji(dot)org> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Cached/global query plans, autopreparation |
Date: | 2018-02-14 17:38:26 |
Message-ID: | CACOdEDjOeJ41N44eVDcjKHwEt0gxK5Y3ssoeuMM0zpNqcZQ=dA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Coming from a PHP application I have several of the same concerns and
wishes. Given that php can not share any (resources) between requests it
would be impossible to accomplish what you have in .NET. We still prepare
statements though for use in result sets and other loops (ORM driven). I'm
wondering if it is possible to solve this as an extension to postgres? If
Citus could refactor their code into an extension surly this is simple in
comparison. I would be willing to help but PHP has made me soft, so it
will take a bit longer.
Thanks,
Chet Henry
On Tue, Feb 13, 2018 at 10:13 AM, Shay Rojansky <roji(at)roji(dot)org> wrote:
> Hi all,
>
> Was wondering if anyone has a reaction to my email below about statement
> preparation, was it too long? :)
>
> (and sorry for top-posting)
>
> On Tue, Feb 6, 2018 at 9:27 PM, Shay Rojansky <roji(at)roji(dot)org> wrote:
>
>> Hi all.
>>
>> Various versions of having PostgreSQL caching and/or autopreparing
>> statement plans have been discussed (https://www.postgresql.org/me
>> ssage-id/op.t9ggb3wacigqcu%40apollo13.peufeu.com, https://
>> www.postgresql.org/message-id/8e76d8fc-8b8c-14bd-d4d1-e9cf19
>> 3a74f5%40postgrespro.ru), without clear conclusions or even an agreement
>> on what might be worthwhile to implement. I wanted to bring this up again
>> from a PostgreSQL driver maintainer's perspective (I'm the owner of Npgsql,
>> the open source .NET driver), apologies in advance if I'm repeating things
>> or I've missed crucial information. Below I'll describe three relevant
>> issues and what I've done to deal with them.
>>
>> When the same statement is rerun, preparing it has a very significant
>> performance boost. However, in short-lived connection scenarios it's
>> frequently not possible to benefit from this - think of a typical webapp
>> which allocates a connection from a pool, run a query and then return the
>> connection. To make sure prepared statements are used, Npgsql's connection
>> pool doesn't send DISCARD ALL when a connection is returned (to avoid
>> wiping out the connections), and maintains an internal table mapping SQL
>> (and parameter types) to a PostgreSQL statement name. The next time the
>> application attempts to prepare the same SQL, the prepared statement is
>> found in the table and no preparation needs to occur. This means that
>> prepared statements persist across pooled connection open/close, and are
>> never discarded unless the user uses a specific API. While this works, the
>> disadvantages are that:
>> 1. This kind of mechanism needs to be implemented again and again, in
>> each driver:
>> 2. It relies on Npgsql's internal pooling, which can track persistent
>> prepared statements on physical connections. If an external pool is used
>> (e.g. pgpool), this isn't really possible.
>> 1. It complicates resetting the session state (instead of DISCARD ALL, a
>> combination of all other reset commands except DEALLOCATE ALL needs be
>> sent). This is minor.
>>
>> The second issue is that many applications don't work directly against
>> the database API (ADO.NET in .NET, JDBC in Java). If any sort of O/RM or
>> additional layer is used, there's a good chance that that layer doesn't
>> prepare in any way, and indeed hide your access to the database API's
>> preparation method. Two examples from the .NET world is dapper (a very
>> popular micro-O/RM) and Entity Framework. In order to provide the best
>> possible performance in these scenarios, Npgsql has an opt-in feature
>> whereby it tracks how many times a given statement was executed, and once
>> it passes a certain threshold automatically prepares it. An LRU cache is
>> then used to determine which prepared statements to discard, to avoid
>> explosion. In effect, statement auto-preparation is implemented in the
>> driver. I know that the JDBC driver also implements such a mechanism (it
>> was actually the inspiration for the Npgsql feature). The issues with this
>> are:
>>
>> 1. As above, this has to be implemented by every driver (and is quite
>> complex to do well)
>> 2. There's a possible missed opportunity in having a single plan on the
>> server, as each connection has its own (the "global plan" option). Many
>> apps out there send the same statements across many connections so this
>> seems relevant - but I don't know if the gains outweigh the contention
>> impact in PostgreSQL.
>>
>> Finally, since quite a few (most?) other databases include
>> autopreparation (SQL Server, Oracle...), users porting their applications -
>> which don't explicitly prepare - experience a big performance drop. It can
>> rightly be said that porting an application across databases isn't a
>> trivial task and that adjustments need to be made, but from experience I
>> can say that PostgreSQL is losing quite a few users to this.
>>
>> The above issues could be helped by having PostgreSQL cache on its side
>> (especially the second issue, which is the most important). Ideally, any
>> adopted solution would be transparent and not require any modification to
>> applications. It would also not impact explicitly-prepared statements in
>> any way.
>>
>> Note that I'm not arguing for any specific implementation on the
>> PostgreSQL side (e.g. global or not), but just describing a need and hoping
>> to restart a conversation that will lead somewhere.
>>
>> (and thanks for reading this overly long message!)
>>
>> Shay
>>
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2018-02-14 17:38:49 | Re: tapeblocks is uninitialized in logtape.c |
Previous Message | Ankit Raj | 2018-02-14 17:21:06 | Re: Google Summer of Code 2018 |