Re: storing large files in database - performance

From: Eric Hill <Eric(dot)Hill(at)jmp(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>, Thomas Kellerer <spam_eater(at)gmx(dot)net>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: storing large files in database - performance
Date: 2017-05-18 12:34:47
Message-ID: CY1PR05MB22654CB64EEFBA6DCB1AA95CF0E40@CY1PR05MB2265.namprd05.prod.outlook.com
Lists: pgsql-general

I would be thrilled to get 76 MB per second, and it is comforting to know that we have that as a rough upper bound on performance. I've got work to do to figure out how to approach that upper bound from Node.js.

In the meantime, I've been looking at performance on the read side. For that, I can bypass all my Node.js layers and just run a query from pgAdmin 4. I ran a query selecting indexFile.contents for a row where the value is 25 MB in size. The query itself took 4 seconds in pgAdmin 4. That is better than the 12 seconds I'm getting in Node.js, but still on the order of 6 MB per second, not 76. Do you suppose pgAdmin 4 and I are doing similarly inefficient things when querying bytea values?
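
For comparison, I may try bypassing the GUI entirely with a small libpq program along these lines (a rough, untested sketch; the "id" column and the empty connection string are assumptions on my part). Requesting the result in binary format avoids the hex text encoding of the bytea value, which roughly doubles the number of bytes that otherwise have to cross the wire and be parsed:

#include <libpq-fe.h>
#include <stdio.h>

int main(void)
{
    PGconn *conn = PQconnectdb("");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s\n", PQerrorMessage(conn));
        return 1;
    }

    const char *params[1] = { "1" };    /* id of the row holding the 25 MB value (assumed column) */

    /* last argument = 1 requests binary results, so contents arrives as raw bytes */
    PGresult *res = PQexecParams(conn,
                                 "select contents from \"indexFile\" where id = $1",
                                 1, NULL, params, NULL, NULL, 1);

    if (PQresultStatus(res) != PGRES_TUPLES_OK)
        fprintf(stderr, "query failed: %s\n", PQerrorMessage(conn));
    else
        printf("fetched %d bytes\n", PQgetlength(res, 0, 0));

    PQclear(res);
    PQfinish(conn);
    return 0;
}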

Thanks,

Eric

-----Original Message-----
From: Merlin Moncure [mailto:mmoncure(at)gmail(dot)com]
Sent: Wednesday, May 17, 2017 10:21 AM
To: Thomas Kellerer <spam_eater(at)gmx(dot)net>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: storing large files in database - performance

On Tue, May 16, 2017 at 9:51 AM, Thomas Kellerer <spam_eater(at)gmx(dot)net> wrote:
> John R Pierce schrieb am 16.05.2017 um 16:44:
>> On 5/16/2017 7:35 AM, Thomas Kellerer wrote:
>>> When my (JDBC based) SQL client and the database server are on the same computer...
>>
>> node.js is Javascript, not java w/ jdbc
>
> I know that.
>
> I mentioned JDBC so that it's clear that the timings were done using a
> different technology.
>
> Maybe it's Node.js or the JavaScript "driver" that causes the problems.

When writing large objects to the database, the method of transmission will very much determine performance until you start hitting the natural limits imposed by the database itself.

via (hastily written):
#include "libpqtypes.h"
#include "stdlib.h"
#include "string.h"

int main()
{
int s = 1024 * 1024 * 256;

char *p = malloc(s);
memset(p, 'x', s);
p[s-1] = 0;

PGconn *conn = PQconnectdb("");
PQinitTypes(conn);

PGresult *res = PQexecf(conn, "insert into foo values(1,%text)", p);

if(!res)
fprintf(stderr, "*ERROR: %s\n", PQgeterror());

PQclear(res);
}

mmoncure(at)mernix2 09:13 AM /tmp$ gcc -otest test.c -lpq -lpqtypes -I /home/mmoncure/src/libpqtypes-1.5.1/src -I /home/mmoncure/pg94/include/ -L /home/mmoncure/src/libpqtypes-1.5.1/.libs/
mmoncure(at)mernix2 09:13 AM /tmp$ psql -c "create table foo(i int, f text)"
CREATE TABLE
mmoncure(at)mernix2 09:13 AM /tmp$ psql -c "alter table foo alter f set storage external"
ALTER TABLE
mmoncure(at)mernix2 09:14 AM /tmp$ time LD_LIBRARY_PATH=/home/mmoncure/src/libpqtypes-1.5.1/.libs ./test

real 0m3.245s
user 0m0.092s
sys 0m0.232s
mmoncure(at)mernix2 09:15 AM /tmp$ psql -c "select pg_size_pretty(pg_table_size('foo'))"
pg_size_pretty
----------------
266 MB
(1 row)

...that's over 76 MB/sec (to a local server) for a 256 MB transfer, which is pretty good, I think. We have a 1 GB limit on bytea/text values, and (at least in C, with certain reasonable precautions) you can work comfortably under that limit. There might be better strategies, but it can be done.
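
For the read side (per the timing question upthread), the reverse direction with libpqtypes would look roughly like this (an untested sketch, assuming the same foo table created above):

#include "libpqtypes.h"
#include <stdio.h>

int main(void)
{
    PGconn *conn = PQconnectdb("");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s\n", PQerrorMessage(conn));
        return 1;
    }
    PQinitTypes(conn);

    PGresult *res = PQexecf(conn, "select f from foo where i = 1");
    if (!res)
    {
        fprintf(stderr, "*ERROR: %s\n", PQgeterror());
        PQfinish(conn);
        return 1;
    }

    PGtext f = NULL;
    if (PQgetf(res, 0, "%text", 0, &f))      /* points into the result, no extra copy */
        printf("fetched %d bytes\n", PQgetlength(res, 0, 0));
    else
        fprintf(stderr, "*ERROR: %s\n", PQgeterror());

    PQclear(res);
    PQfinish(conn);
    return 0;
}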

merlin
