Re: very slow largeobject transfers through JDBC

From: Mate Varga <m(at)matevarga(dot)net>
To: Dave Cramer <pg(at)fastcrypt(dot)com>
Cc: Dmitry Igrishin <dmitigr(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: very slow largeobject transfers through JDBC
Date: 2018-09-06 07:46:15
Message-ID: CAK4GaZ6nangGBLFedFHcfffBPa_BD4t843y3P7CnqhN4jvxOmQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

After inlining the data, performance issues have been solved. Thanks for
the help.

On Mon, Sep 3, 2018 at 9:57 PM Mate Varga <m(at)matevarga(dot)net> wrote:

> Thanks,
> 1) we'll try to move stuff out from LOBs
> 2) we might raise a PR for the JDBC driver
>
> Mate
>
> On Mon, 3 Sep 2018, 19:35 Dave Cramer, <pg(at)fastcrypt(dot)com> wrote:
>
>>
>>
>> On Mon, 3 Sep 2018 at 13:00, Mate Varga <m(at)matevarga(dot)net> wrote:
>>
>>> More precisely: when fetching 10k rows, JDBC driver just does a large
>>> bunch of socket reads. With lobs, it's ping-pong: one read, one write per
>>> lob...
>>>
>>>
>> Ok, this is making more sense. In theory we could fetch them all but
>> since they are LOB's we could run out of memory.
>>
>> Not sure what to tell you at this point. I'd entertain a PR if you were
>> motivated.
>>
>> Dave Cramer
>>
>> davec(at)postgresintl(dot)com
>> www.postgresintl.com
>>
>>
>>
>>>
>>> On Mon, Sep 3, 2018 at 6:54 PM Mate Varga <m(at)matevarga(dot)net> wrote:
>>>
>>>> So I have detailed profiling results now. Basically it takes very long
>>>> that for each blob, the JDBC driver reads from the socket then it creates
>>>> the byte array on the Java side. Then it reads the next blob, etc. I guess
>>>> this takes many network roundtrips.
>>>>
>>>> On Mon, Sep 3, 2018 at 5:58 PM Dave Cramer <pg(at)fastcrypt(dot)com> wrote:
>>>>
>>>>>
>>>>> On Mon, 3 Sep 2018 at 10:48, Mate Varga <m(at)matevarga(dot)net> wrote:
>>>>>
>>>>>> That's 1690 msec (1.69 seconds, and that is how long it takes to
>>>>>> fetch 20k (small-ish) rows without LOBs (LOBs are a few lines below on the
>>>>>> screenshot)
>>>>>>
>>>>>
>>>>> that sound high as well!
>>>>>
>>>>> Something isn't adding up..
>>>>>
>>>>>
>>>>> Dave Cramer
>>>>>
>>>>> davec(at)postgresintl(dot)com
>>>>> www.postgresintl.com
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> On Mon, Sep 3, 2018 at 4:40 PM Dave Cramer <pg(at)fastcrypt(dot)com> wrote:
>>>>>>
>>>>>>> the one you have highlighted ~1.69ms
>>>>>>>
>>>>>>> Dave Cramer
>>>>>>>
>>>>>>> davec(at)postgresintl(dot)com
>>>>>>> www.postgresintl.com
>>>>>>>
>>>>>>>
>>>>>>> On Mon, 3 Sep 2018 at 10:38, Mate Varga <m(at)matevarga(dot)net> wrote:
>>>>>>>
>>>>>>>> Which frame do you refer to?
>>>>>>>>
>>>>>>>> On Mon, Sep 3, 2018 at 3:57 PM Dave Cramer <pg(at)fastcrypt(dot)com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Not sure why reading from a socket is taking 1ms ?
>>>>>>>>>
>>>>>>>>> Dave Cramer
>>>>>>>>>
>>>>>>>>> davec(at)postgresintl(dot)com
>>>>>>>>> www.postgresintl.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, 3 Sep 2018 at 09:39, Mate Varga <m(at)matevarga(dot)net> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> https://imgur.com/a/ovsJPRv -- I've uploaded the profiling info
>>>>>>>>>> (as an image, sorry). It seems this is a JDBC-level problem. I understand
>>>>>>>>>> that the absolute timing is not meaningful at all because you don't know
>>>>>>>>>> how large the resultset is, but I can tell that this is only a few
>>>>>>>>>> thousands rows + few thousand largeobjects, each largeobject is around 1
>>>>>>>>>> kByte. (Yes I know this is not a proper use of LOBs -- it's a legacy db
>>>>>>>>>> structure that's hard to change.)
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>> Mate
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 3, 2018 at 11:52 AM Mate Varga <m(at)matevarga(dot)net>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey,
>>>>>>>>>>>
>>>>>>>>>>> we'll try to test this with pure JDBC versus hibernate. Thanks!
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Sep 3, 2018 at 11:48 AM Dave Cramer <pg(at)fastcrypt(dot)com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, 3 Sep 2018 at 03:55, Mate Varga <m(at)matevarga(dot)net>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Basically there's a class with a byte[] field, the class is
>>>>>>>>>>>>> mapped to table T and the byte field is annotated with @Lob so it goes to
>>>>>>>>>>>>> the pg_largeobject table.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Ah, so hibernate is in the mix. I wonder if that is causing
>>>>>>>>>>>> some challenges ?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> The DB is on separate host but relatively close to the app,
>>>>>>>>>>>>> and I can reproduce the problem locally as well. One interesting bit is
>>>>>>>>>>>>> that turning of SSL between the app and PSQL speeds up things by at least
>>>>>>>>>>>>> 50%.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ah, one addition -- the binary objects are encrypted, so their
>>>>>>>>>>>>> entropy is very high.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any chance you could write a simple non-hibernate test code to
>>>>>>>>>>>> time the code ?
>>>>>>>>>>>>
>>>>>>>>>>>> Dave Cramer
>>>>>>>>>>>>
>>>>>>>>>>>> dave(dot)cramer(at)crunchydata(dot)ca
>>>>>>>>>>>> www.crunchydata.ca
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Mate
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Sep 2, 2018 at 12:55 AM Dave Cramer <pg(at)fastcrypt(dot)com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, 31 Aug 2018 at 10:15, Mate Varga <m(at)matevarga(dot)net>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I see -- we could try that, though we're mostly using an ORM
>>>>>>>>>>>>>>> (Hibernate) to do this. Thanks!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Aug 31, 2018 at 3:57 PM Dmitry Igrishin <
>>>>>>>>>>>>>>> dmitigr(at)gmail(dot)com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> пт, 31 авг. 2018 г. в 16:35, Mate Varga <m(at)matevarga(dot)net>:
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Hi,
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > we're fetching binary data from pg_largeobject table. The
>>>>>>>>>>>>>>>> data is not very large, but we ended up storing it there. If I'm copying
>>>>>>>>>>>>>>>> the data to a file from the psql console, then it takes X time (e.g. a
>>>>>>>>>>>>>>>> second), fetching it through the JDBC driver takes at least 10x more. We
>>>>>>>>>>>>>>>> don't see this difference between JDBC and 'native' performance for
>>>>>>>>>>>>>>>> anything except largeobjects (and bytea columns, for the record).
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Does anyone have any advice about whether this can be
>>>>>>>>>>>>>>>> tuned or what the cause is?
>>>>>>>>>>>>>>>> I don't know what a reason of that, but I think it's
>>>>>>>>>>>>>>>> reasonable and
>>>>>>>>>>>>>>>> quite simple to call lo_import()/lo_export() via JNI.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can't imagine that's any faster. The driver simply implements
>>>>>>>>>>>>>> the protocol
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you have any code to share ? Any other information ?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is the JDBC connection significantly further away network
>>>>>>>>>>>>>> wise ?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dave Cramer
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> davec(at)postgresintl(dot)com
>>>>>>>>>>>>>> www.postgresintl.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Dave Page 2018-09-06 07:57:30 Re: How to install pgAgent on windows for postresql-bigsql-10.5
Previous Message Thiemo Kellner 2018-09-06 06:28:37 Re: PostgreSQL intenal scheduler?