Incremental aggregate/rollup strategy advice

From: Morris de Oryx <morrisdeoryx(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Incremental aggregate/rollup strategy advice
Date: 2019-07-08 07:48:30
Message-ID: CAKqncciZm9qc-6K5x+RF2aP3iate9-YX-TMcDyBCwddvEU4uQA@mail.gmail.com
Lists: pgsql-general

I'm researching strategies for incrementally updating aggregate/rollup
tables. The problem is how to do so without double-counting changes, and
not skipping changes. I know enough about concurrency issues to ask the
question, but do *not* know enough about the features and details of
Postgres' concurrency management to figure out a 100% reliable solution
without some help. And, with concurrency-related stuff, you're either 100%
right or you're buggy.

And *thanks in advance* to anyone who can help out. I'm not good at writing
short :( I've tried to put in enough detail to get to the point, which is
"how do I find unprocessed records without missing any."

Okay, the setup is that we've got a lot of tables where we would like to do
incremental aggregates. To simplify things, mostly these are INSERT-only
tables, sometimes UPDATE, not worrying about DELETE yet. A couple of
strategies I'd like to avoid:

* Full rebuild queries will take too long, and will scale poorly. So, a
MATERIALIZED VIEW is unappealing. Hence rollup tables, since it's possible
to update them incrementally.

* We may have multiple aggregates off the same base data, and may change
them over time. So, putting some kind of flag field in the source table
doesn't really fit.

* I was thinking about a posting/diff/delta/audit-like table, but that's a
pretty "heavy" solution. You need some kind of AFTER INSERT/UPDATE
statement-level trigger to push over the data that's needed to update the
aggregates. Which, again, means the source table needs to know what
aggregations are going to take place. Plus, it's just a ton of churn and
extra data...when all of the necessary data exists in the source table
already.

* I saw one strategy that looks good from the folks at CitusData:
https://www.citusdata.com/blog/2018/06/14/scalable-incremental-data-aggregation/

Briefly, they use a bigserial counter which, I gather, is not
transaction-bound, so record insertions get an ever-increasing
chronological stamp: 1, 2, 3, etc. This is a design familiar to me from
other environments and is sometimes called a "concurrency ID." In our case,
we need to support UPDATE as well, so I don't think the sequence idea will
work (?). To make this more concrete, here's a simplified table with source
data:

CREATE TABLE "error_report" (
"id" uuid NOT NULL DEFAULT extensions.gen_random_uuid(), -- We've got
distributed sources, so UUIDs for IDs.
"error_name" text NOT NULL DEFAULT false, -- Something we'll
summarize by.
"facility_id" uuid NOT NULL DEFAULT NULL, -- Something we'll
summarize by.
"error_dts" timestamptz NOT NULL DEFAULT NULL, -- Set on the
source machine in UTC
"last_updated_dts" timestamptz NOT NULL DEFAULT NULL); -- Set on Postgres
after INSERT or UPDATE.
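
For context, last_updated_dts would be maintained by a trigger on the
Postgres side. A minimal sketch of the sort of thing I mean (the function
and trigger names are illustrative), using clock_timestamp(), which I get
into below:

CREATE OR REPLACE FUNCTION set_last_updated_dts()
RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- clock_timestamp() is the wall-clock time of this call, which is
    -- neither the transaction start time nor the eventual commit time.
    NEW.last_updated_dts := clock_timestamp();
    RETURN NEW;
END;
$$;

CREATE TRIGGER error_report_stamp_last_updated
    BEFORE INSERT OR UPDATE ON error_report
    FOR EACH ROW
    EXECUTE PROCEDURE set_last_updated_dts();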

The idea is that you have a stable number line as a number or a timestamp.
We use timestamptz and store everything in UTC. Otherwise, it's the same
basic idea as what the CitusData folks said: You have an ever-increasing
number line so that you can mark where you've processed to. This way, you
can fetch unprocessed rows without missing any, without a flag field in the
source table, and without an audit table/change queue of any kind. I've
simplified the timestamps below for legibility to spell this out, as it's
the crux of my question about Postgres specifics. And, just pretend that
these rows are all on page 0...I've faked ctid values to make the rows
easier to keep track of.

ctid last_updated_dts
(0,1) 2018-09-25 05:53:00
(0,2) 2018-09-25 05:54:00
(0,3) 2018-09-25 05:55:00
(0,4) 2018-09-26 02:23:00
(0,5) 2018-09-26 03:14:00
(0,6) 2018-09-26 03:15:00
(0,7) 2018-09-28 05:10:00
(0,8) 2018-09-28 05:14:00
(0,9) 2018-09-28 05:15:00
(0,10) 2018-09-28 05:15:00

You need a small utility table to hold details about which records you've
aggregated or processed.

CREATE TABLE "rollup_status" (
"id" uuid NOT NULL DEFAULT extensions.gen_random_uuid(), -- We use UUIDs,
not necessary here, but it's what we use.
"rollup_name" text NOT NULL DEFAULT false,
"last_processed_dts" timestamptz NOT NULL DEFAULT NULL); -- Marks the last
timestamp processed.
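
A new rollup would be seeded with a row that starts at the beginning of the
timeline. A sketch ('-infinity' is a valid timestamptz, so the first sweep
picks up every existing row):

INSERT INTO rollup_status (rollup_name, last_processed_dts)
VALUES ('error_name_counts', '-infinity');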

Now imagine that I've got a rollup_status record

rollup_name last_processed_dts
error_name_counts 2018-09-26 02:23:00

If I search for rows that were modified after the "processed until", I get
these:

ctid last_updated_dts
(0,5) 2018-09-26 03:14:00
(0,6) 2018-09-26 03:15:00
(0,7) 2018-09-28 05:10:00
(0,8) 2018-09-28 05:14:00
(0,9) 2018-09-28 05:15:00
(0,10) 2018-09-28 05:15:00
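
That search is just a range scan past the stored mark, something like this
(a sketch using the names above):

SELECT ctid, last_updated_dts
  FROM error_report
 WHERE last_updated_dts > (SELECT last_processed_dts
                             FROM rollup_status
                            WHERE rollup_name = 'error_name_counts')
 ORDER BY last_updated_dts;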

And then update the rollup_status record's last_processed_dts to the
max(last_updated_dts) of those rows:

rollup_name last_processed_dts
error_name_counts 2018-09-28 05:15:00
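
To keep that bookkeeping consistent with the rows actually aggregated, I'm
assuming the aggregation and the high-water-mark update run against one
snapshot, e.g. as a single statement. A rough sketch, where
error_name_counts is a hypothetical rollup table with a unique constraint
on error_name:

WITH batch AS (
    SELECT error_name,
           count(*)              AS new_errors,
           max(last_updated_dts) AS batch_max_dts
      FROM error_report
     WHERE last_updated_dts > (SELECT last_processed_dts
                                 FROM rollup_status
                                WHERE rollup_name = 'error_name_counts')
     GROUP BY error_name
),
upsert AS (
    -- Hypothetical rollup table: (error_name text PRIMARY KEY, error_count bigint).
    INSERT INTO error_name_counts AS c (error_name, error_count)
    SELECT error_name, new_errors
      FROM batch
    ON CONFLICT (error_name)
    DO UPDATE SET error_count = c.error_count + EXCLUDED.error_count
)
UPDATE rollup_status
   SET last_processed_dts = (SELECT max(batch_max_dts) FROM batch)
 WHERE rollup_name = 'error_name_counts'
   AND EXISTS (SELECT 1 FROM batch); -- Don't move the mark on an empty batch.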

So, I got a chunk of the timeline, recorded how far I went, and processed
those records. The beauty part of this technique, if I can get it
implemented correctly, is that this doesn't have to block new records.
While I'm processing those 5 (or 5K), new records can be added onto the end
of error_report and, so long as they have a timestamp later than
rollup_status.last_processed_dts, I'll find them on the next sweep. And if
I want to add a different rollup? There's no need to do _anything_ to the
error_report table. Imagine two new rows have come into the error_report
table:

(0,11) 2018-09-28 05:17:00
(0,12) 2018-09-28 05:18:00

The new rollup runs, processes all 12 rows, and now there are two
rollup_status entries:

rollup_name last_processed_dts
error_name_counts 2018-09-28 05:15:00
facility_avgs 2018-09-28 05:18:00

Same timeline, different uses, different last-position-processed stored in
rollup_status.last_processed_dts.

Okay, this finally brings me to the Postgres-specific question.

*How do I make sure that I don't miss rows from transactions that commit
with timestamps earlier than the mark I've already recorded?*

I've been digging into this and have seen and considered a few things, but
I'm not sure which, if any, will work.

* Use an AFTER statement-level trigger with a transition table (cool
feature!) to stamp last_updated_dts with clock_timestamp(). That function
in particular, because I've now learned that most timestamp functions in a
trigger return the transaction _start_ time, not the _commit_ time. I need
the transaction commit time. Which brings us to the next idea.

* pg_xact_commit_timestamp(), which, if I understand it correctly, works a
bit like an additional, invisible system column that exposes the
transaction commit timestamp as a timestamptz (it has to be switched on
with track_commit_timestamp). That sounds perfectly matched to my
requirements (there's a sketch of how I imagine using it after this list),
but:

-- I've failed to find _any_ detailed discussion of this feature, which is
unusual with Postgres.

-- I'm not 100% sure it will work. Do I need some kind of lock,
notification, or something while running the aggregate to button things
down?

-- I'm not wild about invisible columns as they're obscure to anyone else.

-- I'm not wild about bolting an extra 8 bytes onto every row in every
table. And is this value indexed? I assume it must be.

-- I've seen mention that the timestamps aren't retained.

I don't need the timestamp as such; it's just meant to order things along
the number line. I'll put a BRIN index on error_report.last_updated_dts
(and on such fields in other tables). I don't think I would need an index
on the pg_xact_commit_timestamp() value, and it's not directly supported
anyway. But if it turns out to be necessary, I guess you could build one
with an AT TIME ZONE cast to convert the value to something the index
machinery treats as immutable.

* The xmin system column. I don't think that this would work. While
transaction IDs always increase, they aren't going to commit in that order.

* The ctid system column. I've seen this mentioned, but I honestly don't
follow how you could use this to reliably detect all new and modified rows.
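
For reference, here's roughly what I have in mind for the BRIN index and
for reading commit timestamps. This is only a sketch, and it assumes
track_commit_timestamp = on (pg_xact_commit_timestamp() errors out
otherwise):

-- BRIN index on the ever-increasing timestamp column.
CREATE INDEX error_report_last_updated_dts_brin
    ON error_report USING brin (last_updated_dts);

-- Reading commit timestamps; requires track_commit_timestamp = on before
-- the transactions in question commit.
SELECT id,
       error_name,
       pg_xact_commit_timestamp(xmin) AS committed_at
  FROM error_report
 WHERE pg_xact_commit_timestamp(xmin) >
       (SELECT last_processed_dts
          FROM rollup_status
         WHERE rollup_name = 'error_name_counts')
 ORDER BY committed_at;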

I will be *very grateful* for any help or suggestions. I'm out over my skis
on some of the Postgres-specific details, so I *won't* take corrections the
wrong way.
