From: Shay Rojansky <roji(at)roji(dot)org>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Cached/global query plans, autopreparation
Date: 2018-02-07 05:27:12
Message-ID: CADT4RqAd_74m6MUbXAPsYzqXG3F6wWVhS_dFJijrfXs7N+QGHQ@mail.gmail.com
Lists: pgsql-hackers
Hi all.
Various versions of having PostgreSQL caching and/or autopreparing
statement plans have been discussed (
https://www.postgresql.org/message-id/op.t9ggb3wacigqcu%40apollo13.peufeu.com
,
https://www.postgresql.org/message-id/8e76d8fc-8b8c-14bd-d4d1-e9cf193a74f5%40postgrespro.ru)
without clear conclusions or even an agreement on what might be worthwhile
to implement. I wanted to bring this up again from a PostgreSQL driver
maintainer's perspective (I'm the owner of Npgsql, the open source .NET
driver), apologies in advance if I'm repeating things or I've missed
crucial information. Below I'll describe three relevant issues and what
I've done to deal with them.
When the same statement is rerun, preparing it yields a very significant
performance boost. However, in short-lived connection scenarios it's
frequently not possible to benefit from this - think of a typical webapp
which allocates a connection from a pool, runs a query and then returns the
connection. To make sure prepared statements are used, Npgsql's connection
pool doesn't send DISCARD ALL when a connection is returned (to avoid
wiping out the prepared statements), and maintains an internal table mapping
SQL text (and parameter types) to a PostgreSQL statement name. The next time
the application attempts to prepare the same SQL, the prepared statement is
found in the table and no preparation needs to occur. This means that
prepared statements persist across pooled connection open/close, and are
never discarded unless the user calls a specific API (a minimal sketch of
this lookup table follows the list below). While this works, the
disadvantages are that:
1. This kind of mechanism needs to be implemented again and again, in each
driver.
2. It relies on Npgsql's internal pooling, which can track persistent
prepared statements on physical connections. If an external pool is used
(e.g. pgpool), this isn't really possible.
3. It complicates resetting the session state (instead of DISCARD ALL, a
combination of all other reset commands except DEALLOCATE ALL needs to be
sent). This is minor.
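To make that first mechanism concrete, here's a minimal, hypothetical sketch
in C# of the kind of per-physical-connection lookup table I'm describing -
illustrative only, not Npgsql's actual internals. SQL text plus parameter
types map to a server-side statement name, so "preparing" the same SQL again
reuses the existing statement instead of sending a new Parse:

// Hypothetical sketch of a per-physical-connection prepared-statement table
// (illustrative only; not Npgsql's actual implementation).
using System.Collections.Generic;

sealed class PreparedStatementCache
{
    // Key: SQL text plus a canonical rendering of the parameter types.
    readonly Dictionary<(string Sql, string ParamTypes), string> _statements = new();
    int _counter;

    // Returns an existing statement name if this SQL was already prepared on
    // this physical connection, so no Parse message needs to be sent again.
    public bool TryGet(string sql, string paramTypes, out string name)
        => _statements.TryGetValue((sql, paramTypes), out name);

    // Records a newly prepared statement under a generated server-side name.
    public string Add(string sql, string paramTypes)
    {
        var name = "_p" + ++_counter;
        _statements[(sql, paramTypes)] = name;
        return name;
    }
}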
The second issue is that many applications don't work directly against the
database API (ADO.NET in .NET, JDBC in Java). If any sort of O/RM or
additional layer is used, there's a good chance that that layer doesn't
prepare in any way, and may even hide your access to the database API's
preparation method. Two examples from the .NET world are Dapper (a very
popular micro-O/RM) and Entity Framework. In order to provide the best
possible performance in these scenarios, Npgsql has an opt-in feature
whereby it tracks how many times a given statement has been executed, and once
it passes a certain threshold automatically prepares it. An LRU cache is
then used to determine which prepared statements to discard, to keep their
number bounded. In effect, statement auto-preparation is implemented in the
driver (a sketch of this logic follows the list below). I know that the JDBC
driver also implements such a mechanism (it was actually the inspiration for
the Npgsql feature). The issues with this are:
1. As above, this has to be implemented by every driver (and is quite
complex to do well).
2. There's a possible missed opportunity in having a single plan on the
server, as each connection has its own (the "global plan" option). Many
apps out there send the same statements across many connections, so this
seems relevant - but I don't know if the gains outweigh the contention
impact in PostgreSQL.
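As an illustration of the threshold-plus-LRU approach described above,
here's a hypothetical C# sketch - again illustrative only, not the actual
Npgsql or pgjdbc code - that counts executions per SQL string, starts
preparing once a usage threshold is crossed, and evicts the least recently
used prepared statement when a cap is reached:

// Hypothetical sketch of driver-side autopreparation (illustrative only):
// execution counting, a preparation threshold, and LRU eviction to bound
// the number of prepared statements per connection.
using System.Collections.Generic;

sealed class AutoPreparer
{
    readonly int _minUsages;    // executions before a statement is auto-prepared
    readonly int _maxPrepared;  // cap on auto-prepared statements per connection

    readonly Dictionary<string, int> _usages = new();
    readonly LinkedList<string> _lru = new();  // most recently used at the front
    readonly Dictionary<string, LinkedListNode<string>> _prepared = new();

    public AutoPreparer(int minUsages, int maxPrepared)
    {
        _minUsages = minUsages;
        _maxPrepared = maxPrepared;
    }

    // Called before each execution. Returns true if the statement should run
    // prepared; evictedSql is set when an older statement has to be closed on
    // the server (DEALLOCATE) to make room for this one.
    public bool ShouldPrepare(string sql, out string? evictedSql)
    {
        evictedSql = null;

        if (_prepared.TryGetValue(sql, out var node))
        {
            _lru.Remove(node);       // already prepared: refresh its LRU position
            _lru.AddFirst(node);
            return true;
        }

        var count = _usages.TryGetValue(sql, out var c) ? c + 1 : 1;
        _usages[sql] = count;
        if (count < _minUsages)
            return false;            // not hot enough yet: execute unprepared

        if (_prepared.Count >= _maxPrepared)
        {
            evictedSql = _lru.Last!.Value;   // evict the least recently used entry
            _prepared.Remove(evictedSql);
            _lru.RemoveLast();
        }

        _prepared[sql] = _lru.AddFirst(sql);
        return true;
    }
}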
Finally, since quite a few (most?) other databases include autopreparation
(SQL Server, Oracle...), users porting their applications - which don't
explicitly prepare - experience a big performance drop. It can rightly be
said that porting an application across databases isn't a trivial task and
that adjustments need to be made, but from experience I can say that
PostgreSQL is losing quite a few users to this.
The above issues could be helped by having PostgreSQL cache plans on its side
(especially the second issue, which is the most important). Ideally, any
adopted solution would be transparent and not require any modification to
applications. It would also not impact explicitly-prepared statements in
any way.
Note that I'm not arguing for any specific implementation on the PostgreSQL
side (e.g. global or not), but just describing a need and hoping to restart
a conversation that will lead somewhere.
(and thanks for reading this overly long message!)
Shay