Re: avoid update - insert churn

From: Steve Midgley <science(at)misuse(dot)org>
To: Kevin Duffy <kevind0718(at)gmail(dot)com>
Cc: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, postgres list <pgsql-sql(at)postgresql(dot)org>
Subject: Re: avoid update - insert churn
Date: 2017-03-06 03:55:24
Message-ID: CAJexoSLeeZbBF8ZiV-i+DYNW_wff=mnCkGQZhFCwgMkKxBKOkA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Sun, Mar 5, 2017 at 12:07 PM, Kevin Duffy <kevind0718(at)gmail(dot)com> wrote:

> "PostgreSQL 9.5.4, compiled by Visual C++ build 1800, 64-bit"
>
> some examples here: http://www.postgresqltutorial.com/postgresql-upsert/
>
> many thanks for your swift reply.
>
> KD
>
> On Sun, Mar 5, 2017 at 2:56 PM, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
> wrote:
>
>> On 03/05/2017 10:20 AM, Kevin Duffy wrote:
>>
>>>
>>> Hello All:
>>>
>>> I am looking for suggestions on how to optimize a relatively simple
>>> problem.
>>>
>>> Say I have a group of shoe stores. The store report sales daily and
>>> daily I
>>> calc stats off on the sales numbers. Ie rolling averages & volatility
>>> numbers, with a couple of different look backs.
>>>
>>> I am storing the raw sales figures in one table and the calc'ed stats in
>>> another.
>>>
>>> Here is the interesting part, it is possible and it does happen that a
>>> store will not
>>> report sales for a day or two or prior figures reported will get revised.
>>> There is at max a five day window on this.
>>>
>>> For the prior dates there are two possibilities, I will be updating
>>> a record or inserting a new record in sales table. For the current
>>> date's sales it will be all inserts.
>>> Currently what I do is loop through the data by store and shoe style
>>> attempt an update and if that fails
>>> do an insert.
>>>
>>> Same deal for the stat's table except there are multiple records for
>>> each Store/Shoe style pair.
>>>
>>>
>>> So here is my question/issue: Is this there a better design pattern for
>>> doing this?
>>> I could gather up the data for the prior dates and do one "bulk"
>>> update. But that would
>>> fail for sales that have not been reported todate.
>>> The current date's sales are of course an insert. And I could fix this
>>> to be one "bulk" insert.
>>>
>>> Was just chewing this over in my heading wondering if there is a more
>>> efficient way to this.
>>> Other than attempt update if fail insert. Seems to be a bit of churn to
>>> me.
>>>
>>
>> You do not say what version of Postgres you are using, but if 9.5+
>>
>> https://www.postgresql.org/docs/9.5/static/sql-insert.html
>>
>> "ON CONFLICT Clause
>>
>> The optional ON CONFLICT clause specifies an alternative action to
>> raising a unique violation or exclusion constraint violation error. For
>> each individual row proposed for insertion, either the insertion proceeds,
>> or, if an arbiter constraint or index specified by conflict_target is
>> violated, the alternative conflict_action is taken. ON CONFLICT DO NOTHING
>> simply avoids inserting a row as its alternative action. ON CONFLICT DO
>> UPDATE updates the existing row that conflicts with the row proposed for
>> insertion as its alternative action."
>>
>> It wraps the INSERT/UPDATE as an UPSERT in one command.
>>
>>
>>
>>
>>> But at the end of the day, it do believe Keep It Simple.
>>>
>>> Many thanks for considering this problem.
>>>
>>> KD
>>>
>>>
This seems more like a data modeling problem than a PG SQL issue per se
(though obviously your SQL vocab matters in terms what what/how you
implement). If that's right, I'd suggest taking a look at Ralph Kimball's
book Data Warehouse Toolkit
<https://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247>.
It basically deals with practical issues like this in many ways. Your
summary table is basically a warehouse fact table aggregated against a
couple of dimensions. If you like his approach, you might find that it
eliminates the design problems you're facing and improves performance on
the underlying calculations as well.

I haven't spent much time doing data warehouse work for more than 10 years,
so I'm rusty, but I do remember reading his book and feeling grateful at
all the many hours of time he saved me, and countless errors avoided.

Steve

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Günce Kaya 2017-03-06 11:28:59 parameter type is unknown error
Previous Message Kevin Duffy 2017-03-05 20:07:19 Re: avoid update - insert churn