Re: Correct use of cursors for very large result sets in Postgres

From: Vitalii Tymchyshyn <vit(at)tym(dot)im>
To: John Gorman <jgorman(at)eldocomp(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Correct use of cursors for very large result sets in Postgres
Date: 2017-02-23 03:13:45
Message-ID: CABWW-d3xywucp_rCnTOe4aoUknL0TqmLB1ucDf7v1Khniynt+A@mail.gmail.com
Lists: pgsql-performance

For JDBC there are certain prerequisites for setFetchSize to work, e.g.
the result set must be forward-only and the statement must run inside a
transaction (autocommit off).
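A minimal sketch of those prerequisites with the PostgreSQL JDBC driver (the `PG_URL` environment variable and the `big_table` name are placeholders, not anything from this thread):

```java
import java.sql.*;

public class StreamingFetchDemo {
    // Stream rows with a bounded fetch size. Requires the pgJDBC driver
    // on the classpath and a reachable database.
    static long streamCount(Connection conn, String table, int fetchSize)
            throws SQLException {
        conn.setAutoCommit(false);               // prerequisite: open transaction
        long rows = 0;
        try (Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY,     // prerequisite: forward-only
                ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(fetchSize);        // rows fetched per round trip
            try (ResultSet rs = stmt.executeQuery("SELECT id FROM " + table)) {
                while (rs.next()) {
                    rows++;                      // application logic goes here
                }
            }
        }
        conn.commit();
        return rows;
    }

    public static void main(String[] args) throws SQLException {
        String url = System.getenv("PG_URL");    // e.g. jdbc:postgresql://host/db
        if (url == null) {
            System.out.println("Set PG_URL to run against a live database.");
            return;
        }
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println(streamCount(conn, "big_table", 1000));
        }
    }
}
```

If either prerequisite is missing, the driver silently falls back to fetching the entire result set into memory, which matches the behaviour John describes below.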

On Tue, Feb 21, 2017 at 09:06 John Gorman <jgorman(at)eldocomp(dot)com> wrote:

> My experience with cursors in PostgreSQL with Java has been to stay away
> from them. We support 2 databases with our product, PostgreSQL (default)
> and SQL Server. While re-encrypting data in a database the application used
> cursors with a fetch size of 1000.
>
>
>
> It worked perfectly on SQL Server, and on PostgreSQL until we got to a
> table with more than 11 million rows. After spending weeks trying to
> figure out what was happening, I realized that once a table passed
> roughly 10 million rows, the cursor functionality silently stopped
> working and the entire table was being read. I asked another very
> senior architect to look at it and he came to the same conclusion.
> Because of limited time, I ended up working around it using
> limit/offset.
>
>
>
> Again we are using Java, so the problem could just be in the PostgreSQL
> JDBC driver. Also we were on 9.1 at the time.
>
>
>
> Regards
>
> John
>
>
>
> *From:* pgsql-performance-owner(at)postgresql(dot)org [mailto:
> pgsql-performance-owner(at)postgresql(dot)org] *On Behalf Of *Mike Beaton
> *Sent:* Tuesday, February 21, 2017 6:49 AM
> *To:* pgsql-performance(at)postgresql(dot)org
> *Subject:* Re: [PERFORM] Correct use of cursors for very large result
> sets in Postgres
>
>
>
> Thanks, Tom.
>
> Wouldn't this mean that cursors are noticeably non-optimal even for normal
> data sizes, since the entire data to be streamed from the table is always
> duplicated into another buffer and then streamed?
>
>
>
> > if you want the whole query result at once, why are you bothering with
> a cursor?
>
>
>
> The PostgreSQL docs (
> https://www.postgresql.org/docs/9.6/static/plpgsql-cursors.html#AEN66551) clearly
> recommend cursors as a way to return a reference to a large result set from
> a function (as I understood, this is recommended precisely as a way to
> avoid tuple-based buffering of the data).
>
>
>
> So following that advice, it's not unreasonable that I would actually have
> a cursor to a large dataset.
>
>
>
> Then, I would ideally want to be able to fetch the data from that cursor
> without the entire data getting duplicated (even if only a bit at a time
> instead of all at once, which seems to be the best case behaviour) as I go.
>
>
>
> Additionally, I thought that if I had a streaming use-case (which I do),
> and a streaming data-access layer (which I do), then since `SELECT * FROM
> large` is absolutely fine, end-to-end, in that situation, by symmetry
> and the principle of least astonishment `FETCH ALL FROM cursor` might be
> fine too.
>
>
>
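The limit/offset workaround John mentions can be sketched as below (table and column names are placeholders; a stable ORDER BY on an indexed column is required for correct paging, and note that OFFSET still scans the skipped rows, so cost grows with the offset):

```java
public class PagedScan {
    // Build one page of the scan. Table and column names are
    // illustrative placeholders, not from the original thread.
    static String pageQuery(String table, int limit, long offset) {
        return "SELECT * FROM " + table
             + " ORDER BY id LIMIT " + limit + " OFFSET " + offset;
    }

    public static void main(String[] args) {
        // Each page is a separate statement, so the driver never
        // attempts to buffer the whole table at once.
        for (long offset = 0; offset < 3000; offset += 1000) {
            System.out.println(pageQuery("big_table", 1000, offset));
        }
    }
}
```

A keyset condition (e.g. `WHERE id > last_seen_id`) scales better than OFFSET on very large tables, since it avoids rescanning already-processed rows.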
