Re: managing primary key conflicts while restoring data to table with existing data

From: Krishnakant Mane <kkmane(at)riseup(dot)net>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, Rob Sargent <robjsargent(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: managing primary key conflicts while restoring data to table with existing data
Date: 2019-09-26 06:27:55
Message-ID: a6ee63e1-fde9-4d43-43d2-c67b2987d54a@riseup.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On 26/09/19 12:03 AM, Adrian Klaver wrote:
> On 9/25/19 8:04 AM, Rob Sargent wrote:
>>
>>
>> On Sep 25, 2019, at 8:24 AM, Krishnakant Mane <kkmane(at)riseup(dot)net
>> <mailto:kkmane(at)riseup(dot)net>> wrote:
>>
>>>
>>> On 25/09/19 7:50 PM, Adrian Klaver wrote:
>>>> On 9/25/19 12:15 AM, Krishnakant Mane wrote:
>>>>> Hello all,
>>>>>
>>>>> I have been using postgresql for an enterprise quality account's
>>>>> automation and inventory management software called GNUKhata
>>>>> <https://gnukhata.in>
>>>>>
>>>>> Our team is planning to add backup and restore function in the
>>>>> software.
>>>>>
>>>>> But we don't want to dump the entire database and then restore the
>>>>> same.
>>>>>
>>>>> What we are trying to do is to copy data specific to an organization.
>>>>>
>>>>> The challenge here is that I might copy all data (account heads,
>>>>> bills, vouchers etc ) for one organization from an instance on one
>>>>> machine.
>>>>>
>>>>> I take the archive in what ever format to another machine and now
>>>>> attempt to restore.
>>>>>
>>>>> The risk here is for example if the primary key value for orgcode
>>>>> in the organization table is 5, it might conflict with the data
>>>>> where I am attempting it to be restored.
>>>>>
>>>>> Same holds true for bills, invoices etc.
>>>>>
>>>>> A certain account head with accountcode 1 might be already present
>>>>> on the second machine.
>>>>>
>>>>> I am not expecting the users to empty all data from the
>>>>> destination machine before restoring a backup.
>>>>>
>>>>> The reason is that an auditor may have many client's data and one
>>>>> can't predict what primary key values are going to come from a
>>>>> backup.
>>>>>
>>>>> Basically I can even say this is a copy paste instead of a pure
>>>>> backup and restore.
>>>>>
>>>>> Can any one suggest how to handle such conflicts?
>>>>
>>>> Hard to say. If the data is held in common tables(bills, vouchers,
>>>> etc)then the only thing I see happening is changing the PK values
>>>> to an unused value. That could turn into a nightmare though. Not
>>>> only that you lose the connection to the original data source. If
>>>> the data can be broken out into separate tables then I could see
>>>> placing them in their own schema.
>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>> Regards,
>>>>> Krishnakant Mane,
>>>>> Project Founder and Leader,
>>>>> GNUKhata <https://gnukhata.in/>
>>>>> //(Opensource Accounting, Billing and Inventory Management
>>>>> Software)//
>>>>
>>>
>>> Hi Adrian,
>>>
>>> Even I am thinnking to do some kind of upsert with this situation.
>
> So to be clear the tables you are working can have records from
> multiple organizations in a single table?
>
>>>
>>> And I would have to set the pkey to an unassigned value when there
>>> is conflict.
>
> I am seeing nextval() in your future:)
>
>>>
>>> I may also choose to revamp the serial by timestamps but don't know
>>> if the target customers would like it.
>
> I would avoid that. In my opinion timestamps are to too volatile to
> serve as a PK. If you are going to change I would go with the previous
> suggestion of UUID:
> https://www.postgresql.org/docs/11/datatype-uuid.html
>
> Not sure your customers would like that either.
>
Hi Adrian,

I think I would make them like the uuid idea.

So now what I am thinking is to first revamp the database by first
removing all the primary key constraints and then deleting all the values.

Then loop through the existing data and get uuid in that colum for every
row.

I might also require to update all the references to this value as
foreign key in related tables.

But I guess some kind of on update cascade might do well.

I know this would slow down the system, but given that this will be a
one time process for an individual user (that too if he has existing
data ), I would take that trade-off.

What do you say?

>>>
>>> -- 
>>> Regards,
>>> Krishnakant Mane,
>>> Project Founder and Leader,
>>> GNUKhata <https://gnukhata.in/>
>>> //(Opensource Accounting, Billing and Inventory Management Software)//
>> It would likely be easier to rethink your backup and restore plan.
>> Putting each restore into its own space would be one tack.
>
>
--
Regards,
Krishnakant Mane,
Project Founder and Leader,
GNUKhata <https://gnukhata.in/>
//(Opensource Accounting, Billing and Inventory Management Software)//

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Marco Ippolito 2019-09-26 08:10:42 Re: could not accept SSL connection: sslv3 alert bad certificate
Previous Message Ekaterina Amez 2019-09-26 05:51:16 Re: Upgrading old server