Re: Storing large documents - one table or partition by doc?

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Dev Nop <devnop0(at)gmail(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Storing large documents - one table or partition by doc?
Date: 2016-09-24 21:30:06
Message-ID: 649c3d87-0df1-a8e2-49da-bc200bc16e86@BlueTreble.com
Lists: pgsql-performance

On 9/24/16 6:33 AM, Dev Nop wrote:
> This means that the applications are sensitive to the size of ids. A
> previous incarnation used GUIDs which was a brutal overhead for large
> documents.

If GUIDs *stored in a binary format* were too large, then you won't be
terribly happy with the 24 byte per-row overhead in Postgres.

What I would look into at this point is using int ranges and arrays to
greatly reduce your overhead:

-- The exclusion constraint needs btree_gist for the = operator on int:
-- CREATE EXTENSION btree_gist;
CREATE TABLE ...(
document_version_id int NOT NULL REFERENCES document_version
, document_line_range int4range NOT NULL
, document_lines text[] NOT NULL
, EXCLUDE USING gist( document_version_id WITH =, document_line_range WITH && )
);

That allows you to store the lines of a document as an array of values, ie:

INSERT INTO ... VALUES(
1
, '[11,15]'
, '[11:15]={line11,line12,line13,line14,line15}'
);

Note that I'm using explicit array-bounds syntax so the array subscripts
match the line numbers. I'm not sure that's a great idea, but it is
possible. Also note that int4range canonicalizes '[11,15]' to the
half-open form '[11,16)' on storage.
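Pulling a slice of lines back out would look something like this
(untested sketch; document_version_lines is a hypothetical table name,
and it assumes the array bounds track line numbers as above):

SELECT document_lines[12:14]
FROM document_version_lines
WHERE document_version_id = 1
-- && lets the GiST index find the chunk(s) overlapping lines 12-14;
-- the '[]' argument makes the range inclusive on both ends.
AND document_line_range && int4range(12, 14, '[]');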

> My nightmares are of a future filled with hours of down-time caused by
> struggling to restore a gargantuan table from a backup due to a problem
> with just one tiny document or schema changes that require disconnecting
> all clients for hours when instead I could ignore best practice, create
> 10k tables and process them iteratively and live in a utopia where I
> never have 100% downtime only per document unavailability.

At some size you'd certainly want partitioning. The good news is that
you can mostly hide partitioning from the application and other database
logic, so there's not a lot of incentive to set it up immediately. You
can always do that after the fact.
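If you do get there, an inheritance-based setup is the usual approach
(hypothetical names and boundaries; a sketch, not a full recipe):

-- Children carry non-overlapping CHECK constraints so
-- constraint_exclusion can skip irrelevant partitions at plan time.
CREATE TABLE document_version_lines_p0 (
CHECK ( document_version_id >= 0 AND document_version_id < 100000 )
) INHERITS (document_version_lines);

CREATE TABLE document_version_lines_p1 (
CHECK ( document_version_id >= 100000 AND document_version_id < 200000 )
) INHERITS (document_version_lines);

-- A trigger on the parent routes INSERTs to the right child, so the
-- application keeps talking to document_version_lines.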
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
