From: | Gary Cowell <gary(dot)cowell(at)gmail(dot)com> |
---|---|
To: | pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | badly scaling performance with appending to bytea |
Date: | 2018-03-21 12:03:17 |
Message-ID: | CAMC0u8kSE3_tmNQqMWj-1oGc0m_aWyjGNXWNWXbtO1hTJHp=jw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
We are trying to implement postgresql code to load a large object into
a postgresql bytea in chunks to avoid loading the file into memory in
the client.
First attempt was to do
update build_attachment set chunk = chunk || newdata ;
this did not scale and got significantly slower after 4000-5000 updates.
The chunks are 4K in size, and I'm testing with a 128MB input file,
requiring 32,774 chunk updates.
Next, I tried creating an aggregate, thus:
(taken from stackoverflow)
CREATE AGGREGATE bytea_agg(bytea) (SFUNC=byteacat,STYPE=bytea);
changed the code to insert the chunks to a temporary table :
create temporary table build_attachment (seq bigserial primary key,
chunk bytea ) on commit drop;
we then insert our 4K chunks to this, which takes very little time (20
seconds for the 32,774 inserts)
Here's an example though of trying to select the aggregate:
gary=> \timing
Timing is on.
gary=> select bytea_agg(chunk order by seq) from build_attachment
where seq < 4000 \g output
Time: 13372.843 ms
gary=> select bytea_agg(chunk order by seq) from build_attachment
where seq < 8000 \g output
Time: 54447.541 ms
gary=> select bytea_agg(chunk order by seq) from build_attachment
where seq < 16000 \g output
Time: 582219.773 ms
So those partial aggregates completed in somewhat acceptable times but ...
gary=> select bytea_agg(chunk order by seq) from build_attachment
where seq < 32000 \g output
this one hadn't completed in an hour - the PostgreSQL connection
process for my connection on the server goes to 100% CPU and stays
there, not using much RAM, not doing much IO, oddly
EXPLAINing these aggregate selects doesn't show anything useful.
Am I doomed to not be able to update a bytea this way? Is there some
way I can tune this?
From | Date | Subject | |
---|---|---|---|
Next Message | Rick Otten | 2018-03-21 12:12:31 | Re: badly scaling performance with appending to bytea |
Previous Message | Pavel Stehule | 2018-03-21 05:24:16 | Re: [HACKERS] proposal: schema variables |