Re: Questions about Partitioning

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com>
Cc: PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Questions about Partitioning
Date: 2011-04-21 07:00:03
Message-ID: 4DAFD5F3.9060208@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 19/04/11 23:56, Phoenix Kiula wrote:
> While I fix some bigger DB woes, I have learned a lesson. Huge indexes
> and tables are a pain.
>
> Which makes me doubly keen on looking at partitioning.
>
> Most examples I see online are partitioned by date. As in months, or
> quarter, and so on. This doesn't work for me as I don't have too much
> logic required based on time.
>
> The biggest, highest volume SELECT in my database happens through an
> "alias" column. This is an alphanumeric column. The second-biggest
> SELECT happens through the "userid" column -- because many users check
> their account every day.

If user id -> alias and/or alias -> user id lookups are really "hot",
consider moving them to a subtable, so you don't have to worry about
whether to partition by user id or alias, and so that the table is
really small, easily cached, and fast to scan. For example:

CREATE TABLE user_alias (
alias VARCHAR(42) PRIMARY KEY,
user_id integer REFERENCES maintable(id)
);

If you like you can retain the "alias" column in "maintable", making
that a REFERENCE to user_alias(alias) so you force a 1:1 relationship
and don't have to JOIN on user_alias to get alias data for a user. The
downside of that is that the circular/bidirectional reference requires
you to use 'DEFERRABLE INITIALLY DEFERRED' on one or both references to
be able to insert, and that can cause memory use issues if you do really
big batch inserts and deletes on those tables.

> 1. Which column should I partition by -- the "alias" because it's the
> largest contributor of queries? This should be OK, but my concern is
> that when user_id queries are happening, then the data for the same
> user will come through many subtables that are partitioned by "alias"

See above: consider splitting the user-id-to-alias mapping out into
another table.

> 3. If I partition using "a%", "b%" etc up to "z%" as the partition
> condition, is this an issue

It might be worth examining the distribution of your data and
partitioning on constraints that distribute the data better. There'll be
a lot more "c"s than "z"s.

That said, it might not be worth the complexity and you'd have to check
if the constraint exclusion code was smart enough to figure out the
conditions. I don't have much experience with partitioning and have
never tried or tested partitioning on a LIKE pattern.

> 6. Triggers - how do they affect speed?

A constraint is not a trigger, they're different. SELECTs on partitioned
tables are not affected by triggers.

For INSERT, UPDATE and DELETE, where you're redirecting INSERTs into the
parent table into the appropriate partition, then speed might be a
concern. It probably doesn't matter. If you find it to be an issue, then
rather then re-writing the trigger in C, you're probably better off just
INSERTing directly into the appropriate subtable and thus bypassing the
trigger.

--
Craig Ringer

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Alban Hertroys 2011-04-21 07:10:09 Re: Needs Suggestion
Previous Message Craig Ringer 2011-04-21 06:45:38 Re: how to force an insert before Raise Exception?