Re: Storing large documents - one table or partition by doc?

From: Dev Nop <devnop0(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Storing large documents - one table or partition by doc?
Date: 2016-09-26 08:27:28
Message-ID: CACjtUOR-YX1NW9iyE=3K0rsAXvwgxHAB9iSjR5WCr3CupfSbxA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

>
> If GUIDs *stored in a binary format* were too large, then you won't be
> terribly happy with the 24 byte per-row overhead in Postgres.

Heh. In this case the ids have a life outside the database in various text
formats.

> What I would look into at this point is using int ranges and arrays to
> greatly reduce your overhead:
> CREATE TABLE ...(
> document_version_id int NOT NULL REFERENCES document_version
> , document_line_range int4range NOT NULL
> , document_lines text[] NOT NULL
> , EXCLUDE USING gist( document_version_id =, document_line_range && )
> );

Thanks! Some new things for me to learn about there. Had to read "Range
Types: Your Life Will Never Be The Same" - lol. https://wiki.postgresql.org/
images/7/73/Range-types-pgopen-2012.pdf

To check I understand what you are proposing: the current version and
history is stored in the same table. Each line is referred to by a
sequential line number and then lines are stored in sequential chunks with
range + array. The gist index is preventing any insert with the same
version & line range. This sounds very compact for a static doc but doesn't
it mean lines must be renumbered on inserts/moves?

On Mon, Sep 26, 2016 at 9:26 AM, Dev Nop <devnop0(at)gmail(dot)com> wrote:

> If GUIDs *stored in a binary format* were too large, then you won't be
>> terribly happy with the 24 byte per-row overhead in Postgres.
>
>
> Heh. In this case the ids have a life outside the database in various text
> formats.
>
>
>> What I would look into at this point is using int ranges and arrays to
>> greatly reduce your overhead:
>> CREATE TABLE ...(
>> document_version_id int NOT NULL REFERENCES document_version
>> , document_line_range int4range NOT NULL
>> , document_lines text[] NOT NULL
>> , EXCLUDE USING gist( document_version_id =, document_line_range && )
>> );
>
>
> Thanks! Some new things for me to learn about there. Had to read "Range
> Types: Your Life Will Never Be The Same" - lol.
> https://wiki.postgresql.org/images/7/73/Range-types-pgopen-2012.pdf
>
> To check I understand what you are proposing: the current version and
> history is stored in the same table. Each line is referred to by a
> sequential line number and then lines are stored in sequential chunks with
> range + array. The gist index is preventing any insert with the same
> version & line range. This sounds very compact for a static doc but doesn't
> it mean lines must be renumbered on inserts/moves?
>
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Álvaro Hernández Tortosa 2016-09-26 08:28:54 Re: Millions of tables
Previous Message julyanto SUTANDANG 2016-09-26 06:28:10 Re: Millions of tables