Re: Index/trigger implementation for accessing latest records

From: Tim Cross <theophilusx(at)gmail(dot)com>
To: Alastair McKinley <a(dot)mckinley(at)analyticsengines(dot)com>
Cc: "pgsql-general\(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Index/trigger implementation for accessing latest records
Date: 2018-05-02 22:14:55
Message-ID: 876045yaao.fsf@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


Alastair McKinley <a(dot)mckinley(at)analyticsengines(dot)com> writes:

> Hi,
>
>
> I have a table that stores a location identifier per person which will be appended to many times.
>
> However, for many queries in this system we only need to know the most recent location per person, which is limited to about 1000 records.
>
>
> Is the following trigger/index strategy a reasonable and safe approach to fast access to the latest location records per person?
>
>
> 1. A boolean column (latest_record default true) to identify the latest record per person
> 2. A before insert trigger that updates all other records for that person to latest_record = false
> 3. A partial index on the latest_record column where latest_record is true
>
>
> Aside from performance, is it safe to update other records in the table from the insert trigger in this way?
>
>
> Minimal example is shown below:
>
>
> create table location_records
>
> (
>
> id bigserial,
>
> person_id bigint,
>
> location_id bigint,
>
> latest_record boolean not null default true
>
> );
>
>
> create function latest_record_update() returns trigger as
>
> $$
>
> BEGIN
>
> update location_records set latest_record = false where person_id = new.person_id and latest_record is true and id != new.id;
>
> return new;
>
> END;
>
> $$ language plpgsql;
>
>
> create trigger latest_record_trigger before insert on location_records
>
> for each row execute procedure latest_record_update();
>
>
> create index latest_record_index on location_records(latest_record) where latest_record is true;
>
>
> insert into location_records(person_id,location_id) values (1,1);
>
> insert into location_records(person_id,location_id) values (1,2);
>
> insert into location_records(person_id,location_id) values (1,3);
>
>
> insert into location_records(person_id,location_id) values (2,3);
>
> insert into location_records(person_id,location_id) values (2,4);
>
>
> select * from location_records;
>

My personal bias will come out here ....

I don't think using a trigger is a good solution here. Although very
powerful, the problem with triggers is that they are a 'hidden' side
effect which is easily overlooked and often adds an additional
maintenance burden which could be avoided using alternative approaches.

Consider a few months down the road and your on holidays. One of your
colleagues is asked to add a new feature which involves inserting
records into this table. During testing, they observe an odd result - a
field changing which according to the SQL they wrote should not. The
simple new feature now takes twice as long to develop as your colleague
works out there is a trigger on the table. Worse yet, they don't notice
and put there changes into production and then issue start getting
raised about communications going to the wrong location for customers
etc.

Triggers often become a lot more complicated than they will initially
appear. In your example, what happens for updates as opposed to inserts?
What happens if the 'new' location is actually the same as a previously
recorded location etc.

In your case, I would try to make what your doing more explicit and
avoid the trigger. There are a number of ways to do this such as

- A function to insert the record. The function could check to see if
that customer has any previous records and if so, set the boolean flag
to false for all existing records and true for the new one. You might
even want to break it up into two functions so that you have one which
just sets the flag based on a unique key parameter - this would
provide a way of resetting the current location without having to do
an insert.

- Use a timestamp instead of a boolean and change your logic to select
the current location by selecting the record with the latest
timestamp.

- Keep the two actions as separate SQL - one to insert a record and one
to set the current location. This has the advantage of making actions
clear and easier to maintain and can be useful in domains where people
move between locations (for example, I've done this for a University
where the data represented the students current address, which would
change in and out of semester periods, but often cycle between two
addresses, their college and their parental home). The downside of
this approach is that applications which insert this information must
remember to execute both SQL statements. If you have multiple
interfaces, this might become a maintenance burden (one of the
advantages of using a DB function).

Tim

--
Tim Cross

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2018-05-02 23:01:01 Re: CSVQL? CSV SQL? tab-separated table I/O? RENAME COLUMN
Previous Message John McKown 2018-05-02 22:04:17 Re: CSVQL? CSV SQL? tab-separated table I/O? RENAME COLUMN