Re: very slow largeobject transfers through JDBC

From: Mate Varga <m(at)matevarga(dot)net>
To: Dave Cramer <pg(at)fastcrypt(dot)com>
Cc: Dmitry Igrishin <dmitigr(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: very slow largeobject transfers through JDBC
Date: 2018-09-03 19:57:28
Message-ID: CAK4GaZ6FvsR26MMb=pi1e9DtvtEgW481yogp+-0Xq_bS9CpXZA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks,
1) we'll try to move stuff out from LOBs
2) we might raise a PR for the JDBC driver

Mate

On Mon, 3 Sep 2018, 19:35 Dave Cramer, <pg(at)fastcrypt(dot)com> wrote:

>
>
> On Mon, 3 Sep 2018 at 13:00, Mate Varga <m(at)matevarga(dot)net> wrote:
>
>> More precisely: when fetching 10k rows, JDBC driver just does a large
>> bunch of socket reads. With lobs, it's ping-pong: one read, one write per
>> lob...
>>
>>
> Ok, this is making more sense. In theory we could fetch them all but since
> they are LOB's we could run out of memory.
>
> Not sure what to tell you at this point. I'd entertain a PR if you were
> motivated.
>
> Dave Cramer
>
> davec(at)postgresintl(dot)com
> www.postgresintl.com
>
>
>
>>
>> On Mon, Sep 3, 2018 at 6:54 PM Mate Varga <m(at)matevarga(dot)net> wrote:
>>
>>> So I have detailed profiling results now. Basically it takes very long
>>> that for each blob, the JDBC driver reads from the socket then it creates
>>> the byte array on the Java side. Then it reads the next blob, etc. I guess
>>> this takes many network roundtrips.
>>>
>>> On Mon, Sep 3, 2018 at 5:58 PM Dave Cramer <pg(at)fastcrypt(dot)com> wrote:
>>>
>>>>
>>>> On Mon, 3 Sep 2018 at 10:48, Mate Varga <m(at)matevarga(dot)net> wrote:
>>>>
>>>>> That's 1690 msec (1.69 seconds, and that is how long it takes to fetch
>>>>> 20k (small-ish) rows without LOBs (LOBs are a few lines below on the
>>>>> screenshot)
>>>>>
>>>>
>>>> that sound high as well!
>>>>
>>>> Something isn't adding up..
>>>>
>>>>
>>>> Dave Cramer
>>>>
>>>> davec(at)postgresintl(dot)com
>>>> www.postgresintl.com
>>>>
>>>>
>>>>
>>>>>
>>>>> On Mon, Sep 3, 2018 at 4:40 PM Dave Cramer <pg(at)fastcrypt(dot)com> wrote:
>>>>>
>>>>>> the one you have highlighted ~1.69ms
>>>>>>
>>>>>> Dave Cramer
>>>>>>
>>>>>> davec(at)postgresintl(dot)com
>>>>>> www.postgresintl.com
>>>>>>
>>>>>>
>>>>>> On Mon, 3 Sep 2018 at 10:38, Mate Varga <m(at)matevarga(dot)net> wrote:
>>>>>>
>>>>>>> Which frame do you refer to?
>>>>>>>
>>>>>>> On Mon, Sep 3, 2018 at 3:57 PM Dave Cramer <pg(at)fastcrypt(dot)com> wrote:
>>>>>>>
>>>>>>>> Not sure why reading from a socket is taking 1ms ?
>>>>>>>>
>>>>>>>> Dave Cramer
>>>>>>>>
>>>>>>>> davec(at)postgresintl(dot)com
>>>>>>>> www.postgresintl.com
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, 3 Sep 2018 at 09:39, Mate Varga <m(at)matevarga(dot)net> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> https://imgur.com/a/ovsJPRv -- I've uploaded the profiling info
>>>>>>>>> (as an image, sorry). It seems this is a JDBC-level problem. I understand
>>>>>>>>> that the absolute timing is not meaningful at all because you don't know
>>>>>>>>> how large the resultset is, but I can tell that this is only a few
>>>>>>>>> thousands rows + few thousand largeobjects, each largeobject is around 1
>>>>>>>>> kByte. (Yes I know this is not a proper use of LOBs -- it's a legacy db
>>>>>>>>> structure that's hard to change.)
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>> Mate
>>>>>>>>>
>>>>>>>>> On Mon, Sep 3, 2018 at 11:52 AM Mate Varga <m(at)matevarga(dot)net>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hey,
>>>>>>>>>>
>>>>>>>>>> we'll try to test this with pure JDBC versus hibernate. Thanks!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 3, 2018 at 11:48 AM Dave Cramer <pg(at)fastcrypt(dot)com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 3 Sep 2018 at 03:55, Mate Varga <m(at)matevarga(dot)net> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Basically there's a class with a byte[] field, the class is
>>>>>>>>>>>> mapped to table T and the byte field is annotated with @Lob so it goes to
>>>>>>>>>>>> the pg_largeobject table.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Ah, so hibernate is in the mix. I wonder if that is causing some
>>>>>>>>>>> challenges ?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> The DB is on separate host but relatively close to the app, and
>>>>>>>>>>>> I can reproduce the problem locally as well. One interesting bit is that
>>>>>>>>>>>> turning of SSL between the app and PSQL speeds up things by at least 50%.
>>>>>>>>>>>>
>>>>>>>>>>>> Ah, one addition -- the binary objects are encrypted, so their
>>>>>>>>>>>> entropy is very high.
>>>>>>>>>>>>
>>>>>>>>>>>> Any chance you could write a simple non-hibernate test code to
>>>>>>>>>>> time the code ?
>>>>>>>>>>>
>>>>>>>>>>> Dave Cramer
>>>>>>>>>>>
>>>>>>>>>>> dave(dot)cramer(at)crunchydata(dot)ca
>>>>>>>>>>> www.crunchydata.ca
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Mate
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Sep 2, 2018 at 12:55 AM Dave Cramer <pg(at)fastcrypt(dot)com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, 31 Aug 2018 at 10:15, Mate Varga <m(at)matevarga(dot)net>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I see -- we could try that, though we're mostly using an ORM
>>>>>>>>>>>>>> (Hibernate) to do this. Thanks!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Aug 31, 2018 at 3:57 PM Dmitry Igrishin <
>>>>>>>>>>>>>> dmitigr(at)gmail(dot)com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> пт, 31 авг. 2018 г. в 16:35, Mate Varga <m(at)matevarga(dot)net>:
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Hi,
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > we're fetching binary data from pg_largeobject table. The
>>>>>>>>>>>>>>> data is not very large, but we ended up storing it there. If I'm copying
>>>>>>>>>>>>>>> the data to a file from the psql console, then it takes X time (e.g. a
>>>>>>>>>>>>>>> second), fetching it through the JDBC driver takes at least 10x more. We
>>>>>>>>>>>>>>> don't see this difference between JDBC and 'native' performance for
>>>>>>>>>>>>>>> anything except largeobjects (and bytea columns, for the record).
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Does anyone have any advice about whether this can be
>>>>>>>>>>>>>>> tuned or what the cause is?
>>>>>>>>>>>>>>> I don't know what a reason of that, but I think it's
>>>>>>>>>>>>>>> reasonable and
>>>>>>>>>>>>>>> quite simple to call lo_import()/lo_export() via JNI.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Can't imagine that's any faster. The driver simply implements
>>>>>>>>>>>>> the protocol
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you have any code to share ? Any other information ?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is the JDBC connection significantly further away network wise
>>>>>>>>>>>>> ?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dave Cramer
>>>>>>>>>>>>>
>>>>>>>>>>>>> davec(at)postgresintl(dot)com
>>>>>>>>>>>>> www.postgresintl.com
>>>>>>>>>>>>>
>>>>>>>>>>>>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andreas Joseph Krogh 2018-09-03 20:52:31 How to get shorter SERVER_VERSION in psql-prompt?
Previous Message Rob Sargent 2018-09-03 19:19:34 Re: WARNING: could not flush dirty data: Function not implemented