From: John Gorman <jgorman(at)eldocomp(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Correct use of cursors for very large result sets in Postgres
Date: 2017-02-21 14:06:37
Message-ID: 0A1B0B276DEE6441A2E68EB66D4540DAD71C69E9@WINEX1.eldocomp.com
Lists: pgsql-performance
My experience with cursors in PostgreSQL with Java has been to stay away from them. We support two databases with our product, PostgreSQL (default) and SQL Server. While re-encrypting data in a database, the application used cursors with a fetch size of 1000.
This worked perfectly on SQL Server, and on PostgreSQL until we got to a PostgreSQL table with more than 11 million rows. After spending weeks trying to figure out what was happening, I realized that when it gets to a table with more than 10 million rows, for some reason the cursor functionality just silently stopped working and the driver was reading the entire table. I asked another very senior architect to look at it, and he came to the same conclusion. Because of limited time, I ended up working around it using limit/offset.
Again, we are using Java, so the problem could just be in the PostgreSQL JDBC driver. Also, we were on 9.1 at the time.
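For readers hitting the same symptom: the pgjdbc driver documents several conditions under which a fetch size actually produces cursor-based (batched) fetching; if any of them is not met, the driver silently falls back to buffering the whole result set, which matches the behaviour described above. The sketch below encodes those documented conditions in a small helper; the table name and row handling are placeholders, not the original application's code.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingQuery {

    // pgjdbc only streams rows through a server-side portal when all of
    // these documented conditions hold: autocommit is off, a positive
    // fetch size is set, and the ResultSet is TYPE_FORWARD_ONLY.
    // Otherwise it silently reads the entire result set into memory.
    static boolean willUseCursor(boolean autoCommit, int fetchSize,
                                 int resultSetType) {
        return !autoCommit
                && fetchSize > 0
                && resultSetType == ResultSet.TYPE_FORWARD_ONLY;
    }

    // Hypothetical usage sketch: stream a large table in 1000-row batches.
    static void streamTable(Connection conn) throws SQLException {
        conn.setAutoCommit(false);          // required: keeps the portal alive
        try (Statement st = conn.createStatement()) {
            st.setFetchSize(1000);          // rows per driver round trip
            try (ResultSet rs = st.executeQuery("SELECT * FROM big_table")) {
                while (rs.next()) {
                    // process one row at a time; memory stays bounded
                }
            }
        }
        conn.commit();
    }
}
```

A common trap is that `Connection` objects default to autocommit on, in which case `setFetchSize` is honored on SQL Server's driver but ignored by pgjdbc, producing exactly this kind of "works on one database, not the other" behaviour.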
Regards
John
From: pgsql-performance-owner(at)postgresql(dot)org [mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Mike Beaton
Sent: Tuesday, February 21, 2017 6:49 AM
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Correct use of cursors for very large result sets in Postgres
Thanks, Tom.
Wouldn't this mean that cursors are noticeably non-optimal even for normal data sizes, since all the data to be streamed from the table is always duplicated into another buffer and then streamed?
> if you want the whole query result at once, why are you bothering with a cursor?
The PostgreSQL docs (https://www.postgresql.org/docs/9.6/static/plpgsql-cursors.html#AEN66551) clearly recommend cursors as a way to return a reference to a large result set from a function (as I understood it, this is recommended precisely as a way to avoid tuple-based buffering of the data).
So following that advice, it's not unreasonable that I would actually have a cursor to a large dataset.
Then, I would ideally want to be able to fetch the data from that cursor without the entire data getting duplicated (even if only a bit at a time instead of all at once, which seems to be the best case behaviour) as I go.
Additionally, I thought that if I had a streaming use-case (which I do) and a streaming data-access layer (which I do), then since `SELECT * FROM large` is absolutely fine end-to-end in that situation, by symmetry and the principle of least astonishment `FETCH ALL FROM cursor` might be fine too.
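When `FETCH ALL FROM cursor` does buffer more than expected, one workaround that keeps memory bounded is to drain the cursor in fixed-size batches with `FETCH FORWARD n` instead. This is a sketch only: the cursor name, table name, and batch size are hypothetical, and the `roundTrips` helper merely predicts how many FETCH commands the loop issues.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ChunkedCursorFetch {

    // Round trips made by the loop below: one FETCH per full batch, one
    // for a trailing partial batch, plus the empty FETCH that ends the
    // loop when the row count is an exact multiple of the batch size.
    static long roundTrips(long rows, int batch) {
        return rows / batch + (rows % batch > 0 ? 2 : 1);
    }

    // Hypothetical usage: declare a cursor and fetch 1000 rows at a time.
    static void drainCursor(Connection conn) throws SQLException {
        conn.setAutoCommit(false); // cursors only live inside a transaction
        try (Statement st = conn.createStatement()) {
            st.execute("DECLARE big_cur CURSOR FOR SELECT * FROM big_table");
            while (true) {
                int n = 0;
                try (ResultSet rs =
                         st.executeQuery("FETCH FORWARD 1000 FROM big_cur")) {
                    while (rs.next()) {
                        n++; // process one row
                    }
                }
                if (n == 0) break; // cursor exhausted
            }
            st.execute("CLOSE big_cur");
        }
        conn.commit();
    }
}
```

The trade-off is one extra client-server round trip per batch, but each `FETCH FORWARD` result is small enough that no layer is tempted to buffer the entire data set at once.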