From: | Arjen Nienhuis <a(dot)g(dot)nienhuis(at)gmail(dot)com> |
---|---|
To: | Michael Hull <mikehulluk(at)googlemail(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Search then Delete Performance |
Date: | 2010-09-15 02:39:19 |
Message-ID: | AANLkTinR8nDMSnDRA0WJTz0=ijPYbk0d5+HTt22P39nS@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi,
It's probably slow because you run many queries where a few would work:
DELETE FROM unassignedjobs WHERE jobid IN (6, 8 ,2, 99, 66)
But I wouldn't know how to build a query like that in C. A script in
python or even bash that dit it would be faster than your C
implementation.
What you can do in C is this:
sprintf(
buffer,
"INSERT INTO assignedjobs (jobid,nodeid)\n"
"SELECT jobid, '%s' from unassignedjobs LIMIT %d\n",
nodename.c_str(), number
);
Some smart SQL with some 'RETURNING' clauses could run the whole
assignJobs function in a single query and it would be a lot faster.
Met vriendelijke groet,
Arjen Nienhuis
On Wed, Sep 15, 2010 at 2:55 AM, Michael Hull <mikehulluk(at)googlemail(dot)com> wrote:
> Hi Everyone,
> I am fairly new to practical databases, but I am trying out the c
> interface to postgres and am wondering how to improve performance. I
> am a researcher, and I am trying to perform a large parameter sweep.
> Since this will involve a couple of thousand simulations, I have a
> process that manages which simulations have been done, and which still
> need to be done, so I can run it easily on a cluster.
>
> So, I have a fairly simple schema of 4 tables.
>
> -- na, ca,ks,kf,lk,iinj are the parameters for my simulation.
> CREATE TABLE alljobs (
> id SERIAL,
> ca int,
> na int,
> lk int,
> ks int,
> kf int,
> iinj int,
> PRIMARY KEY(id)
> );
>
>
> CREATE TABLE assignedjobs (
> jobid int,
> nodeid varchar(100),
> assignedtime timestamp,
> PRIMARY KEY(jobid)
> );
>
>
> CREATE TABLE completedjobs (
> jobid int,
> PRIMARY KEY(jobid)
> );
>
> CREATE TABLE unassignedjobs(
> jobid int,
> PRIMARY KEY(jobid)
> );
>
>
>
> alljobs is initially populated, and contains all the simulations that
> will ever be run
> unassignedjobs contains the ids in alljobs that havent been run yet
> assignedjobs contains the ids in alljobs that have been dispatched to
> some cpu on the cluster and are currently simulating
> completedjobs contains all the completed jobs.
>
> So fairly simply, I have a daemon running on a machine, which accesses
> this DB. Clients connect and request the details for say 1000
> simulations, at which point the daemon takes 1000 entries from the
> unassigned table and moves them to the assigned table. The once the
> client is finished with those jobs, it signals this to the daemon,
> which then move those jobs from 'assigned' to 'complete'.
>
> So this is fairly simple to implement, but my problem is that it is very slow.
>
>
> In particular, I have a 'select' waiting for network connections,
> which then calls this function:
>
> typedef vector<long> VectorLong;
> VectorLong assignJobs(PGconn* pDB, int number, string nodename)
> {
> char buffer[1000];
> sprintf(buffer,"SELECT jobid from unassignedjobs LIMIT %d",number);
> PGresult* pRes = PQexec(pDB, buffer);
>
> printf("assigning jobs");
>
> //PGresult* pRes = PQexec(pDB, "SELECT * from alljobs LIMIT 100");
> //PGresult* pRes = PQexec(pDB, "SELECT * from alljobs");
>
> int nRes = PQntuples(pRes);
> printf("Results found: %d",nRes);
>
> VectorLong jobs;
> for(int i=0;i<nRes;i++)
> {
> long id = atol( PQgetvalue(pRes,i,0) );
> cout << id << " ";
> jobs.push_back(id);
>
> sprintf(buffer, "DELETE FROM unassignedjobs WHERE jobid = %ld", id);
> PQexec(pDB, buffer);
>
> sprintf(buffer, "INSERT INTO assignedjobs (jobid,nodeid) VALUES
> (%ld, %s)", id, nodename.c_str() );
> PQexec(pDB, buffer);
> }
>
>
> return jobs;
> }
>
> but it is painfully slow. I was wondering if there is a way to improve
> this? I feel there should be since I already have a 'pointer' to the
> rows I want to delete.
>
>
> Any help would be greatly appreciated.
>
> Many thanks
>
>
> Mike Hull
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2010-09-15 03:15:04 | Re: Search then Delete Performance |
Previous Message | Scott Marlowe | 2010-09-15 02:05:58 | Re: workaround steps for autovaccum problem |