Re: Should I use table partitions

From: Keith Fiske <keith(dot)fiske(at)crunchydata(dot)com>
To: lance(at)illinois(dot)edu
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Should I use table partitions
Date: 2018-10-09 14:54:25
Message-ID: CAODZiv5--ONmpH=YbgExjG1djv8_xPDcUCDYrrxxk+sG2mDgtw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On Tue, Oct 9, 2018 at 10:43 AM Campbell, Lance <lance(at)illinois(dot)edu> wrote:

> Keith,
>
> Thanks so much for your response. I just sent a post that has more
> information on my indexes and how I select data.
>
>
>
> I am glad you mentioned data retention. Each day early in the morning I
> delete data from this table that is older than 18 months. I now see that I
> should probably vacuum the table after deleting the data. That way any new
> information that is added is all grouped together.
>
>
>
> Your comments were very insightful.
>
>
>
> Thanks,
>
>
>
> Lance
>
>
>
> *From: *Keith Fiske <keith(dot)fiske(at)crunchydata(dot)com>
> *Date: *Tuesday, October 9, 2018 at 9:35 AM
> *To: *Lance Campbell <lance(at)illinois(dot)edu>
> *Cc: *"pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
> *Subject: *Re: Should I use table partitions
>
>
>
>
>
> On Tue, Oct 9, 2018 at 10:19 AM Campbell, Lance <lance(at)illinois(dot)edu>
> wrote:
>
> PostgreSQL 10.x
>
>
>
> Question:
>
> Should I use table partitions if my table meets the below parameters? The
> size of the table should remain relatively consistent in size and number of
> rows. I would not expect it to vary more than 15% at any given time.
>
>
>
> Info:
>
> Table rows: 83,051,164
>
> Table size only: 25.6 G
>
> Total size of table including indexes: 38.6 G
>
> Table definition:
>
>
>
> CREATE TABLE xyz.abc
>
> (
>
> fk_def_id integer NOT NULL,
>
> id integer NOT NULL DEFAULT
> nextval(('xyz.abc_id_seq'::text)::regclass),
>
> email_address text,
>
> created_timestamp timestamp with time zone DEFAULT now(),
>
> updated_timestamp timestamp with time zone DEFAULT now(),
>
> provider text,
>
> server_id integer DEFAULT 0,
>
> read_count integer DEFAULT 0,
>
> fk_group_id integer DEFAULT 0,
>
> mail_log_timestamp timestamp with time zone DEFAULT now(),
>
> mail_log_process_code integer DEFAULT '-1'::integer,
>
> mail_log_message text DEFAULT,
>
> mail_log_mail_id text DEFAULT '',
>
> mail_log_rule text DEFAULT '',
>
> is_listserv boolean DEFAULT false,
>
> is_bad_email_address boolean DEFAULT false,
>
> bad_email_address_message text DEFAULT '',
>
> finder text DEFAULT '',
>
> constituent_id text DEFAULT '',
>
> CONSTRAINT abc_pkey PRIMARY KEY (id)
>
> );
>
>
>
> Thanks,
>
>
>
> Lance Campbell
>
>
>
>
>
> The big, main reason for partitioning in PostgreSQL is if you have data
> retention that has to be managed. It makes having to deal with bloat a lot
> easier since you don't have to do bulk deletes that may never be filled
> with new data again. If you have no data retention being done now, I
> wouldn't worry about it until you start getting closer to 1 billion rows or
> more. That's when vacuuming run times may start to be a concern and
> partitioning can help reduce those since the individual objects are smaller
> to manage. Also, if there's a lot of static data, and you're on 9.6+, you
> can greatly improve vacuum times by running a VACUUM FREEZE on the table.
> That added a feature to allow vacuum to skip pages that are fully frozen.
>
>
>
> Query read performance is really a secondary concern for partitioning in
> PG, especially if you're not managing data retention. Btree lookups are FAR
> more efficient than constraint exclusion (partition pruning in PG11+) will
> ever be. And if you're only doing more narrow data lookups, you'll actually
> see a performance decrease in queries since removing partitions in the
> planner adds overhead. That overhead isn't as big of a concern if there's
> data retention to manage, though, since bloat will be a bigger impact than
> the planner overhead. If you're grabbing larger blocks of data at one time,
> that's when partitioning may start helping with reads more since it has to
> pull fewer pages from disk.
>
>
>
>
> --
>
> Keith Fiske
> Senior Database Engineer
> Crunchy Data - http://crunchydata.com
>

Note that a normal VACUUM will not re-arrange the data on disk to group it
together. Only a VACUUM FULL will do something like that and even then it
doesn't really group similar data close together. You're probably thinking
of CLUSTER. And the clustering is not automatically retained and has to be
re-run periodically. But both VACUUM FULL and CLUSTER require a full lock
on the table involved that blocks all access until they are done running.

I wrote more about managing bloat in my blog and what vacuuming actually
does if you're interested -
https://www.keithf4.com/checking-for-postgresql-bloat/

Before going down the partitioning road, I would first recommend getting
some bloat monitoring in place, to see if it's even a concern. If you're
removing the data fairly often and not in huge chunks at a time, your
deletion/ingest rate may be balanced out. If you are seeing bloat becoming
a concern, I'd recommend the smallest interval at monthly. You may even
want to try yearly since your retention period is longer. The fewer
partitions there are, the better.

Keith

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Campbell, Lance 2018-10-09 15:38:23 Re: Should I use table partitions
Previous Message Campbell, Lance 2018-10-09 14:43:37 Re: Should I use table partitions