From: | "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> |
---|---|
To: | Robert DiFalco <robert(dot)difalco(at)gmail(dot)com> |
Cc: | "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Insert vs Update |
Date: | 2015-07-15 18:15:14 |
Message-ID: | CAKFQuwbo3mvJM5zwY8dHML=7dwpGM9zF4FiXT0yQfYL7-fpr2A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Wed, Jul 15, 2015 at 1:56 PM, Robert DiFalco <robert(dot)difalco(at)gmail(dot)com>
wrote:
>
>
> On Wed, Jul 15, 2015 at 10:33 AM, David G. Johnston <
> david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
>
>> On Wednesday, July 15, 2015, Robert DiFalco <robert(dot)difalco(at)gmail(dot)com>
>> wrote:
>>
>>> First off I apologize if this is question has been beaten to death. I've
>>> looked around for a simple answer and could not find one.
>>>
>>> Given a database that will not have it's PKEY or indices modified, is it
>>> generally faster to INSERT or UPDATE data. And if there is a performance
>>> difference is it substantial?
>>>
>>
>> This seems odd. If you have an option to update but choose to insert
>> what becomes of the other record?
>>
>
>
> Consider the two pseudo-schemas, I'm just making this up for example
> purposes:
>
> SCHEMA A
> =========
> meal(id SEQUENCE,user_id, started DEFAULT NOW())
> meal_prepared(ref_meal_id, prepared DEFAULT NOW())
> meal_abandoned(ref_meal_id, abandoned ...)
> meal_consumed(ref_meal_id, consumed ...)
> etc.
>
> Then in response to different meal events you always have an insert.
>
> aMealId = INSERT INTO meal(user_id) VALUES (aUserId);
>
> When preparation starts:
>
> INSERT INTO meal_prepared(ref_meal_id) VALUES (aMealId);
>
> And so on for each event.
>
> Compare that to this:
>
> SCHEMA B
> =========
> meal_event(id, started, prepared, abandoned, consumed, ...)
>
> The start of the meal is an INSERT:
>
> aMealId = INSERT INTO meal_event(user_id, started) VALUES (aUserId, NOW());
>
> When preparation starts:
>
> UPDATE meal_event SET prepared = NOW() WHERE id = aMealId;
>
> And so on.
>
> Basically the same data, in one case you always do inserts and add new
> tables for new events. In the other case you only insert once and then
> update for each state, then you add columns if you have new states.
>
> As I said this is just an example. But in SCHEMA A you have only inserts,
> lots of tables and in SCHEMA B you have a lot of updates and a lot of
> possibly NULL columns if certain events don't occur.
>
> Is that more clear?
>
>
Yes, you are trying to choose between a bunch of one-to-one (optional)
relationships versus adding additional columns to a table all of which can
be null.
I'd argue that neither option is "normal" (in the DB normalization sense).
CREATE TABLE meal (meal_id bigserial)
CREATE TABLE meal_event_type (meal_event_id bigserial)
CREATE TABLE meal_event (meal_id bigint, meal_event_id bigint, occurred_at
timestamptz)
So now the decision is one of how to denormalize. materialzed views and
two ways to do so. The specific solution would depend in part on the final
application queries that you need to write.
If you do want to model the de-normalized form, which I would likely be
tempted to do given a fixed set of "events" that do not require additional
related attributes, would be to place the few event timestamps on the main
table and UPDATE them to non-null.
In the normal form you will likely find partial indexes to be quite useful.
David J.
From | Date | Subject | |
---|---|---|---|
Next Message | Robert DiFalco | 2015-07-15 19:16:43 | Re: Insert vs Update |
Previous Message | Robert DiFalco | 2015-07-15 17:56:26 | Re: Insert vs Update |