Time-based table design / index selection

From: Tony Shelver <tshelver(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Time-based table design / index selection
Date: 2018-10-24 12:10:07
Message-ID: CAG0dhZArSC-jHW2RdWtuf4t+7=MXf2pUkUFqw+UEnZG8yz8hrg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I come form a DB background using Oracle (which is quite outdated) and more
recently SQL Server, 2012 and 2016, so would like some ideas from the group.

I have 2 tables which represent a time-based relationship between various
object types, most of which also have a geographical component.
There are other tables involved obviously, but these two represent the core.

Objects can be customer accounts (each of which has it's own reporting
substructure), customer branches, vehicles, telematic / GPS tracking
devices, drivers, managers, geolocations such as geofences (area
boundaries, exclusionary and inclusionary), geopoints / locations, and much
more.
Objects are stored in the to_obj_locn_attrib table.

These objects can be organized hierarchically (vehicles and drivers within
branches, branches within customer account, or at a global level, or both
(geographical locations and areas).
These relationships / relationship types (drivers authorized to drive a
specific vehicle, drivers who have actually driven a vehicle during a
specific period, for example) are stored in the to_reln table.

The objects mutate over time, and we always need to be able to report on
the data at a point in time.

The two main tables in this implementations are to_obj_locn_attrib and
to_reln.
There are other tables containing object data (such as to_object, which is
the core id and immutable portion of the object. :

CREATE TABLE teleon.to_obj_locn_attrib
(
obj_id smallint NOT NULL,
active_dt timestamp without time zone NOT NULL DEFAULT
CURRENT_TIMESTAMP,
active_status character(1) COLLATE pg_catalog."default" NOT NULL
DEFAULT 'A'::bpchar,
user_upd character varying(40) COLLATE pg_catalog."default" NOT NULL,
obj_name character varying(150) COLLATE pg_catalog."default",
geo_type character(1) COLLATE pg_catalog."default",
geo_locn geometry(Polygon,4326),
geo_area character varying COLLATE pg_catalog."default",
time_zone character varying(15) COLLATE pg_catalog."default",
obj_locn_attribs jsonb,
CONSTRAINT to_obj_locn_attr_pk PRIMARY KEY (obj_id, active_dt)
USING INDEX TABLESPACE "TeleTS1",
CONSTRAINT to_object_obj_locn_attr_fk FOREIGN KEY (obj_id)
REFERENCES teleon.to_object (obj_id) MATCH FULL
ON UPDATE CASCADE
ON DELETE RESTRICT,
CONSTRAINT geo_type_check CHECK (geo_type = ANY (ARRAY['P'::bpchar,
'O'::bpchar, 'L'::bpchar, 'R'::bpchar, 'C'::bpchar])),
CONSTRAINT to_obj_locn_active_dt CHECK (active_status = ANY
(ARRAY['A'::bpchar, 'I'::bpchar, 'D'::bpchar]))
)
.
There is a unique btree index on obj_id and active_dt (composite primary
key).
Any changes to the information contained in this table results in a new
record with the same obj_id and a new active_dt being inserted into the
table.
The to_obj_locn_attrib table is relatively static.

CREATE TABLE teleon.to_reln
(
reln_id bigint NOT NULL DEFAULT
nextval('teleon.to_reln_reln_id_seq'::regclass),
reln_typ_id character varying(6) COLLATE pg_catalog."default" NOT NULL,
active_dt timestamp without time zone NOT NULL,
active_status character(1) COLLATE pg_catalog."default" NOT NULL,
user_upd character varying(40) COLLATE pg_catalog."default" NOT NULL,
obj_id_owner_of integer NOT NULL,
obj_id_owned_by integer NOT NULL,
reln_attribs jsonb,
CONSTRAINT to_reln_pk PRIMARY KEY (reln_id)
USING INDEX TABLESPACE "TeleTS1",
CONSTRAINT to_object_owned_by FOREIGN KEY (obj_id_owned_by)
REFERENCES teleon.to_object (obj_id) MATCH FULL
ON UPDATE NO ACTION
ON DELETE NO ACTION,
CONSTRAINT to_object_owner_of FOREIGN KEY (obj_id_owner_of)
REFERENCES teleon.to_object (obj_id) MATCH FULL
ON UPDATE CASCADE
ON DELETE RESTRICT,
CONSTRAINT to_reln_type_reln_fk FOREIGN KEY (reln_typ_id)
REFERENCES teleon.to_reln_type (reln_typ_id) MATCH FULL
ON UPDATE NO ACTION
ON DELETE NO ACTION
);

Indexes: Apart from the Id column of reln_id, there is a btree index on
reln_typ_id, obj_id_owner_of, and active_dt. and another on reln_typ_id, ,
obj_id_owned_by and active_dt to cater to the most common queries.

This identifies the specific relationship of a specific relationship type
between two objects at a specific point in time.

A typical query would be something like:
SELECT p.obj_id_owner_of , active_status, ...
FROM teleon.to_reln p
WHERE p.reln_typ_id = 'DPTAST' AND
p.obj_id_owned_by = l_id AND
p.active_dt = (
SELECT MAX(c.active_dt)
FROM teleon.to_reln c
WHERE c.reln_typ_id = p.reln_typ_id AND
c.obj_id_owned_by = p.obj_id_owned_by );

A single object could be involved in many types of relationships with many
other objects,

Considering that this query would form the basis of a recursive query when
extracting hierarchical structures, I would like this to be as efficient as
possible.

Note that I have implemented the type of structure above in a few quite
large projects with millions of objects and relationships quite
successfully, using SQL Server.

One consideration is to implement a date range (actually tsrange) and then
2 GIST or BRIN indexes covering the above attributes of reln_typ_id,
obj_id_owner_of, and active_dt_range. and another on reln_typ_id, ,
obj_id_owned_by and active_dt_range

When changing the ot_reln id, I would update the active_dt_range (tsrange
datatype to close the range at the time of change, and then implement a new
copy of the record with active_dt_range starting at the time of the change,
where applicable.
I would use the && operator to get the current records based on the range.

Another alternative could be to implement the active range using separate
date fields and a btree index to cover the reln_typ_id, obj_id_owner_of,
active_dt_from and active_dt_to. and another on reln_typ_id, ,
obj_id_owned_by, active_dt_from and active_dt_to.

I'd appreciate any input the group can come up with.

Tony

Browse pgsql-general by date

  From Date Subject
Next Message Steve Clark 2018-10-24 13:28:34 no libpq.pc for Centos 6 or Centos 7
Previous Message Jerry Sievers 2018-10-24 09:40:59 Re: Oracle vs PG