Re: Understanding the differences between Temporal tables, CDC and Time Series.

From: SQL Padawan <sql_padawan(at)protonmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Understanding the differences between Temporal tables, CDC and Time Series.
Date: 2021-11-10 11:15:59
Message-ID: Ivk4J_pes2viw4ya19FuyEafyzHtybivspMEGRDDZ6mkcWfu9Ht3jpdyJaF9svnSxCEiyOVz8tTNoIivpe9j4RdQi-6NP1Br1cpbc6z-Mvs=@protonmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Peter, and thanks for answering.

> > I'm unclear as to the "philosophical" distinctions here and would be
> > grateful if anybody could explain the diffence(s) between them?

> A time series database contains data records containing a time stamp

> A temporal database contains records that have a timestamp range

OK - so we have two stock "ticker" dbs - one a Time Series db (TS-db) and one a Temporal db (T-db).

All of the code below is on a fiddle here

https://dbfiddle.uk/?rdbms=postgres_13&fiddle=207f38e5c7d6e7861c402c2c4926840c

CREATE TABLE ts_portfolio -- time series
(
txn_id INTEGER NOT NULL,
company TEXT NOT NULL,
price INTEGER NOT NULL,
txn_ts TIMESTAMP NOT NULL
);

with a few records

INSERT INTO ts_portfolio VALUES
(1234, 'DB', 10, '07/11/21 00:12:00'), -- record inserted on 07/11/21
(2345, 'DB', 20, '08/11/21 00:12:00'), -- record inserted on 08/11/21
(3456, 'DB', 30, '09/11/21 00:12:00'), -- record inserted on 09/11/21
(4567, 'DB', 40, '10/11/21 00:12:00'), -- record inserted on 10/11/21
(5678, 'DB', 50, '11/11/21 00:12:00'); -- record inserted on 11/11/21

So, for example, in order to query the share price AS OF 00:00:00 on the 09/11

SELECT company, price
FROM ts_portfolio
WHERE company = 'DB'
AND txn_ts <= '09/11/21 00:00:00'
ORDER BY txn_ts DESC
LIMIT 1;

We pick up the price from the latest transaction either before or exactly on 09/11/2021 00:00:00

Things become tricky when we want to see the prices on a given day: 09/11/2021

--
-- UNION better here? See EXPLAIN in fiddle - not important for this debate!
--

WITH cte1 AS
(
SELECT txn_id AS t_start
FROM ts_portfolio
WHERE company = 'DB'
AND txn_ts <= '09/11/21 00:00:00'
ORDER BY txn_ts DESC
LIMIT 1
), cte2 AS
(
SELECT txn_id AS t_end
FROM ts_portfolio
WHERE company = 'DB'
AND txn_ts <= '09/11/21 23:59:59.999999'
ORDER BY txn_ts DESC
LIMIT 1
)
SELECT
t.txn_id, t.company, t.price
FROM
ts_portfolio t
WHERE txn_id BETWEEN (SELECT t_start FROM cte1) AND (SELECT t_end FROM cte2);

txn_id company price
2345 DB 20
3456 DB 30

which is correct - from 08/11 midday till 09/11 midday, the price was 20 and then at midday 09/11 it rose to 30!

Becoming complex.

However, a Temporal (versioned) table would only require one extra field:

txn_t_from TIMESTAMP NOT NULL,
txn_t_to TIMESTAMP NOT NULL -- extra field - see fiddle for data inserts and updates!

Each INSERT (behind the scenes) is an INSERT and an UPDATE.

INSERT INTO tdb_portfolio VALUES
(1000, 'DB', 5, '01/01/1900', '07/11/21 00:12:00'), -- inserted in the past, updated 07/11 @ 12:00
(1234, 'DB', 10, '07/11/21 00:12:00', '08/11/21 00:12:00'), -- record inserted on 07/11/21 - updated 08/11 @ 12:00
(2345, 'DB', 20, '08/11/21 00:12:00', '09/11/21 00:12:00'), -- record inserted on 08/11/21 - and so on...
(3456, 'DB', 30, '09/11/21 00:12:00', '10/11/21 00:12:00'), -- record inserted on 09/11/21
(4567, 'DB', 40, '10/11/21 00:12:00', '11/11/21 00:12:00'), -- record inserted on 10/11/21
(5678, 'DB', 50, '11/11/21 00:12:00', '19/01/2038'); -- record inserted on 11/11/21 - never updated (effectively INFINITY until next update)

The queries become simpler:

SELECT txn_id, company, price
FROM tdb_portfolio AS OF '09/11/2021 00:00:00';

and the complex one above collapses to:

SELECT txn_id, company, price FROM tdb_portfolio
FOR SYSTEM_TIME BETWEEN ('09/11/2021 00:00:00' AND '09/11/2021 23:59:59.999999');

or maybe if there was a DATE(SYSTEM_TIME) function, ... FOR DATE(SYSTEM_TIME) = '09/11/2021';

I'm assuming that (when properly introduced into PostgreSQL), there will be optimisations for these sorts of query.

Do temporal tables bring anything else "to the party" - the augmented functionality is a nice-to-have, but hardly earth-shattering?

Have I missed out on anything important?

Thx, SQLP!

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Bruce Momjian 2021-11-10 16:11:06 Re: E.1.2. Migration to Version 14
Previous Message Peter Eisentraut 2021-11-10 07:53:44 Re: Understanding the differences between Temporal tables, CDC and Time Series.