Quick Links

Re: Search then Delete Performance

From:	Arjen Nienhuis <a(dot)g(dot)nienhuis(at)gmail(dot)com>
To:	Michael Hull <mikehulluk(at)googlemail(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Search then Delete Performance
Date:	2010-09-15 02:39:19
Message-ID:	AANLkTinR8nDMSnDRA0WJTz0=ijPYbk0d5+HTt22P39nS@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Hi,

It's probably slow because you run many queries where a few would work:

DELETE FROM unassignedjobs WHERE jobid IN (6, 8 ,2, 99, 66)

But I wouldn't know how to build a query like that in C. A script in
python or even bash that dit it would be faster than your C
implementation.

What you can do in C is this:

sprintf(
buffer,
"INSERT INTO assignedjobs (jobid,nodeid)\n"
"SELECT jobid, '%s' from unassignedjobs LIMIT %d\n",
nodename.c_str(), number
);

Some smart SQL with some 'RETURNING' clauses could run the whole
assignJobs function in a single query and it would be a lot faster.

Met vriendelijke groet,
Arjen Nienhuis

On Wed, Sep 15, 2010 at 2:55 AM, Michael Hull <mikehulluk(at)googlemail(dot)com> wrote:
> Hi Everyone,
> I am fairly new to practical databases, but I am trying out the c
> interface to postgres and am wondering how to improve performance. I
> am a researcher, and I am trying to perform a large parameter sweep.
> Since this will involve a couple of thousand simulations, I have a
> process that manages which simulations have been done, and which still
> need to be done, so I can run it easily on a cluster.
>
> So, I have a fairly simple schema of 4 tables.
>
> -- na, ca,ks,kf,lk,iinj are the parameters for my simulation.
> CREATE TABLE alljobs (
> id SERIAL,
> ca int,
> na int,
> lk int,
> ks int,
> kf int,
> iinj int,
> PRIMARY KEY(id)
> );
>
>
> CREATE TABLE assignedjobs (
> jobid int,
> nodeid varchar(100),
> assignedtime timestamp,
> PRIMARY KEY(jobid)
> );
>
>
> CREATE TABLE completedjobs (
> jobid int,
> PRIMARY KEY(jobid)
> );
>
> CREATE TABLE unassignedjobs(
> jobid int,
> PRIMARY KEY(jobid)
> );
>
>
>
> alljobs is initially populated, and contains all the simulations that
> will ever be run
> unassignedjobs contains the ids in alljobs that havent been run yet
> assignedjobs contains the ids in alljobs that have been dispatched to
> some cpu on the cluster and are currently simulating
> completedjobs contains all the completed jobs.
>
> So fairly simply, I have a daemon running on a machine, which accesses
> this DB. Clients connect and request the details for say 1000
> simulations, at which point the daemon takes 1000 entries from the
> unassigned table and moves them to the assigned table. The once the
> client is finished with those jobs, it signals this to the daemon,
> which then move those jobs from 'assigned' to 'complete'.
>
> So this is fairly simple to implement, but my problem is that it is very slow.
>
>
> In particular, I have a 'select' waiting for network connections,
> which then calls this function:
>
> typedef vector<long> VectorLong;
> VectorLong assignJobs(PGconn* pDB, int number, string nodename)
> {
> char buffer[1000];
> sprintf(buffer,"SELECT jobid from unassignedjobs LIMIT %d",number);
> PGresult* pRes = PQexec(pDB, buffer);
>
> printf("assigning jobs");
>
> //PGresult* pRes = PQexec(pDB, "SELECT * from alljobs LIMIT 100");
> //PGresult* pRes = PQexec(pDB, "SELECT * from alljobs");
>
> int nRes = PQntuples(pRes);
> printf("Results found: %d",nRes);
>
> VectorLong jobs;
> for(int i=0;i<nRes;i++)
> {
> long id = atol( PQgetvalue(pRes,i,0) );
> cout << id << " ";
> jobs.push_back(id);
>
> sprintf(buffer, "DELETE FROM unassignedjobs WHERE jobid = %ld", id);
> PQexec(pDB, buffer);
>
> sprintf(buffer, "INSERT INTO assignedjobs (jobid,nodeid) VALUES
> (%ld, %s)", id, nodename.c_str() );
> PQexec(pDB, buffer);
> }
>
>
> return jobs;
> }
>
> but it is painfully slow. I was wondering if there is a way to improve
> this? I feel there should be since I already have a 'pointer' to the
> rows I want to delete.
>
>
> Any help would be greatly appreciated.
>
> Many thanks
>
>
> Mike Hull
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

In response to

Search then Delete Performance at 2010-09-15 00:55:12 from Michael Hull

Browse pgsql-general by date

	From	Date	Subject
Next Message	Tom Lane	2010-09-15 03:15:04	Re: Search then Delete Performance
Previous Message	Scott Marlowe	2010-09-15 02:05:58	Re: workaround steps for autovaccum problem