From: | andy <andy(at)squeakycode(dot)net> |
---|---|
To: | Kynn Jones <kynnjo(at)gmail(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Populating large DB from Perl script |
Date: | 2007-11-02 20:33:20 |
Message-ID: | 472B8990.4050700@squeakycode.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Kynn Jones wrote:
>> So... not really an answer (other than I used a stored proc)
>
> Actually, I'm interested in your solution.
>
> Just to make sure I understood what you did: you bulk-populated (i.e.
> with $dbh->do('COPY...'), $dbh->pg_putline(...), $dbh->pg_endcopy) the
> referring tables, with their fkey constraints disabled; then you ran
> stored procedure(s) that went through these referring tables and
> filled in the missing fkeys; and finally you activated their fkey
> constraints. Is this right?
>
> I'm very much of a stored procedures dunce, so if the code for your
> stored procedure is "postable", please do.
>
> TIA,
>
> kj
>
> PS: As an aside to the list, as a programmer, when I'm starting out in
> language, I learn more than I can say from reading source code written
> by the experts, but for some reason I have had a hard time coming
> across expertly written PostgreSQL stored procedures, other than the
> occasionally didactic snippet in the docs. All these expertly-written
> procedures seem to be very STORED away indeed! If, on the contrary,
> it's just the case that I haven't looked in the right places, please
> hurl me a cluebrick!
I'm afraid it was nothing that super. My scripts where to insert a
bunch of test data into a db so I could play with some very large db's.
I have a people table, and a car table. I wanted to add ownership of
cars to people, but thought nobody should own more than 5 cars (to make
sure my "random" generator didnt bunch too many cars per person)
My perl looks like:
my $carcount = new BerkeleyDB::Btree( -Filename => 'carcount.dbm',
-Flags => DB_CREATE ) or die "Cannot open file: $!";
<..later...>
if (rand() > 0.10) {
my $person = int(rand($maxperson));
my $ok = 1;
$i = 0;
$carcount->db_get($person, $i);
if ($i >= 5)
{
#shall we allow more than 5 cars?
if (rand() > 0.90) {
$ok = 1;
} else {
$ok = 0;
}
}
if ($ok)
{
... do the insert ...
}
So I was keeping personid => carcount map. This worked great, as I
said, until I got into the gigbytes size for the BerkeleyDB.
Instead I created a stored proc:
CREATE OR REPLACE FUNCTION addowner(xpersonid integer, xcarid integer)
returns void AS $$
declare
cc integer;
begin
select into cc count(*)
from ownership
where personid = xpersonid;
if cc < 5 then
insert into ownership(personid, carid) values (xpersonid, xcarid);
end if;
return;
end
$$ LANGUAGE plpgsql;
... and here is another stored proc I wrote for our website. We have
company shirts n'stuff that employee's can order. So the webpage is
kinda like a shopping cart thing.
create or replace function updatecart(xuserid integer, xgroupid integer,
xprodid integer, xsizeid integer, xcolorid integer, xqty integer)
returns void as $$
declare
xid integer;
begin
select into xid rowid from vcs_ordertable where userid = xuserid and
groupid = xgroupid and prodid = xprodid and sizeid = xsizeid and colorid
= xcolorid;
if not found then
insert into vcs_ordertable(userid, groupid, prodid, sizeid,
colorid, quant) values (xuserid, xgroupid, xprodid, xsizeid, xcolorid,
xqty);
else
update vcs_ordertable set quant = quant + xqty where rowid = xid;
end if;
return;
end
$$ LANGUAGE plpgsql;
When you select a shirt (including the size, color, etc) I only wanted
one row per (user, group, productid, size and color), if one already
exists in the table, I just bump its count, if not I insert it.
These two procs are about as complex as I've needed to get.
-Andy
From | Date | Subject | |
---|---|---|---|
Next Message | Jeffrey W. Baker | 2007-11-02 20:49:27 | setting for maximum acceptable plan cost? |
Previous Message | Dawid Kuroczko | 2007-11-02 18:57:59 | Re: Populating large DB from Perl script |