Re: Storing large documents - one table or partition by doc?

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Dev Nop <devnop0(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Storing large documents - one table or partition by doc?
Date: 2016-09-26 19:58:50
Message-ID: d3fc965c-80ac-5f65-6ae6-beb342bcd61a@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Please CC the mailing list so others can chime in or learn...

On 9/26/16 3:26 AM, Dev Nop wrote:
> What I would look into at this point is using int ranges and arrays
> to greatly reduce your overhead:
> CREATE TABLE ...(
> document_version_id int NOT NULL REFERENCES document_version
> , document_line_range int4range NOT NULL
> , document_lines text[] NOT NULL
> , EXCLUDE USING gist( document_version_id =, document_line_range && )
> );
>
>
> Thanks! Some new things for me to learn about there. Had to read "Range
> Types: Your Life Will Never Be The Same" - lol.
> https://wiki.postgresql.org/images/7/73/Range-types-pgopen-2012.pdf
>
> To check I understand what you are proposing: the current version and
> history is stored in the same table. Each line is referred to by a
> sequential line number and then lines are stored in sequential chunks
> with range + array. The gist index is preventing any insert with the
> same version & line range. This sounds very compact for a static doc but

You've got it correct.

> doesn't it mean lines must be renumbered on inserts/moves?

Yes, but based on your prior descriptions I was assuming that was what
you wanted... weren't you basically suggesting storing one line per row?

There's certainly other options if you want full tracking of every
change... for example, you could store every change as some form of a
diff, and only store the full document every X number of changes.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Craig James 2016-09-27 14:30:11 Re: Millions of tables
Previous Message Tom Lane 2016-09-26 16:52:39 Re: Millions of tables