Re: Big image tables maintenance

From: still Learner <stilllearner23(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Big image tables maintenance
Date: 2018-09-17 19:12:15
Message-ID: CAB3gP9+9+LUzTEV8WejqLkZJ1HLxLzBZMMiyu84_nT+_Tsso0g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-performance

On Mon, Sep 17, 2018, 19:28 Stephen Frost <sfrost(at)snowman(dot)net> wrote:

> Greetings,
>
> (limiting this to -admin, cross-posting this to a bunch of different
> lists really isn't helpful)
>
> * still Learner (stilllearner23(at)gmail(dot)com) wrote:
> > I have a 10 TB size table with multiple bytea columns (image & doc)and
> > makes 20TB of DB size. I have a couple of issues to maintain the DB.
>
> *What* are those issues..? That's really the first thing to discuss
> here but you don't ask any questions about it or state what the issue is
> (except possibly for backups, but we have solutions for that, as
> mentioned below).

> 1. I Would like to separate the image column from the 10TB size table,
> > place it in a separate schema. The change should not result in any query
> > change in the application. Is it possible? Doing this it should not
> affect
> > the performance.
>
> How large are these images? PostgreSQL will already pull out large
> column values and put them into a side-table for you, behind the scenes,
> using a technique called TOAST. Documentation about TOAST is available
> here:
>
> https://www.postgresql.org/docs/current/static/storage-toast.htm
> <https://www.postgresql.org/docs/current/static/storage-toast.html>

Image size is restricted in two digit KBs
only, but we have very large volume of data. The main reason to split the
image to different schema is to avoid data loss in future if corruption
occurs on the table. Also maintenance would be easier compared to now. The
Db growth is much faster, I can say 1 Tb per quarter.

> > 2. I can't maintain files on File system as the count is huge, so
> thinking
> > of using any no-sql mostly mongo-DB, is it recommended? Or PostgreSQL
> > itself can handle?
>
> I suspect you'd find that your data size would end up being much, much
> larger if you tried to store it as JSON or in a similar system, and
> you're unlikely to get any performance improvement (much more likely the
> opposite, in fact)
>

We are also considering document management tools.

For these type of huge amount of data,is it advisable to keep the images in
bytea type only or Jsonb( I haven't used yet) is also an option?

> 3. Taking the backup of 20TB data, is big task. Any more feasible solution
> > other than online backup/pg_dump?
>
> Absolutely, I'd recommend using pgBackRest which supports parallel
> online backup and restore. Using pg_dump for a large system like this
> is really not a good idea- your restore time would likely be
> particularly terrible and you have no ability to do point-in-time
> recovery. Using pgBackRest and a capable system, you'd be able to get a
> complete backup of 20TB in perhaps 6-12 hours, with similar time on the
> recovery side. If you wish to be able to recover faster, running a
> replica (as well as doing backups) may be a good idea, perhaps even a
> time-delayed one.
>

Yeah I will try pgBackrest. We are already having time dealy replica.

> > Each image retrieval is
>
> Unfinished thought here..?
>
Sorry, some how I missed to complete. I supposed to say, image rerival
ratio would be 1:10, mean once each image inserted it would be retrieved by
the application more about 10 times for verification and prints etc.

Viewing the current data growth how long I mean till what size I can
survive with this type of flow. In other words, just dont want to survive
but would like build a robust environment.

> > Currently, we are on pg 9.4 and moving to 10.5 soon.
>
> That's definitely a good plan.
>
> Thanks!
>
> Stephen
>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Stephen Frost 2018-09-17 19:34:20 Re: Big image tables maintenance
Previous Message Ron 2018-09-17 16:15:29 Re: Segmentation fault postgres 9.6

Browse pgsql-performance by date

  From Date Subject
Next Message Stephen Frost 2018-09-17 19:34:20 Re: Big image tables maintenance
Previous Message Laurenz Albe 2018-09-17 19:04:46 Re: Select count(*) on a 2B Rows Tables Takes ~20 Hours