Re: badly scaling performance with appending to bytea

From: Rick Otten <rottenwindfish(at)gmail(dot)com>
To: Gary Cowell <gary(dot)cowell(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: badly scaling performance with appending to bytea
Date: 2018-03-21 12:12:31
Message-ID: CAMAYy4+E9-0+Zee7ed9H6QB_kqp2oW8Q=QY8c7Pywinp0eSBnA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Can you use a materialized view to do the bytea_agg() and then refresh
concurrently whenever you need updated data?
The refresh concurrently might take a few hours or days to run to keep the
matview up to date, but your queries would be pretty fast.

A possible problem is that you are running out of memory, so the larger
queries are going to disk. If you can set up temp space on a faster
volume, or bump up your memory configuration it might help.
ie, work_mem, shared_buffers, and file system cache could all play into
larger aggregations running faster.

On Wed, Mar 21, 2018 at 8:03 AM, Gary Cowell <gary(dot)cowell(at)gmail(dot)com> wrote:

> We are trying to implement postgresql code to load a large object into
> a postgresql bytea in chunks to avoid loading the file into memory in
> the client.
>
> First attempt was to do
>
> update build_attachment set chunk = chunk || newdata ;
>
> this did not scale and got significantly slower after 4000-5000 updates.
>
> The chunks are 4K in size, and I'm testing with a 128MB input file,
> requiring 32,774 chunk updates.
>
> Next, I tried creating an aggregate, thus:
>
> (taken from stackoverflow)
>
> CREATE AGGREGATE bytea_agg(bytea) (SFUNC=byteacat,STYPE=bytea);
>
> changed the code to insert the chunks to a temporary table :
>
> create temporary table build_attachment (seq bigserial primary key,
> chunk bytea ) on commit drop;
>
> we then insert our 4K chunks to this, which takes very little time (20
> seconds for the 32,774 inserts)
>
> Here's an example though of trying to select the aggregate:
>
> gary=> \timing
> Timing is on.
> gary=> select bytea_agg(chunk order by seq) from build_attachment
> where seq < 4000 \g output
> Time: 13372.843 ms
> gary=> select bytea_agg(chunk order by seq) from build_attachment
> where seq < 8000 \g output
> Time: 54447.541 ms
> gary=> select bytea_agg(chunk order by seq) from build_attachment
> where seq < 16000 \g output
> Time: 582219.773 ms
>
> So those partial aggregates completed in somewhat acceptable times but ...
>
> gary=> select bytea_agg(chunk order by seq) from build_attachment
> where seq < 32000 \g output
> this one hadn't completed in an hour - the PostgreSQL connection
> process for my connection on the server goes to 100% CPU and stays
> there, not using much RAM, not doing much IO, oddly
>
> EXPLAINing these aggregate selects doesn't show anything useful.
>
> Am I doomed to not be able to update a bytea this way? Is there some
> way I can tune this?
>
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Pavel Stehule 2018-03-21 12:56:24 Re: badly scaling performance with appending to bytea
Previous Message Gary Cowell 2018-03-21 12:03:17 badly scaling performance with appending to bytea