Re: Receiving many more rows than expected

From: Vincent de Phily <vincent(dot)dephily(at)mobile-devices(dot)fr>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Receiving many more rows than expected
Date: 2014-05-08 22:11:51
Message-ID: 2153573.NEZ8etDJMq@moltowork
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thursday 08 May 2014 06:30:39 Adrian Klaver wrote:
> On 05/08/2014 04:09 AM, Vincent de Phily wrote:
> > Hello,
> >
> > I'm processing records in batches using PG 9.1.12, python 2.7, and psycopg
>
> > 2.5.2 :
> Comments in the code below:
> > def enqueue_loop(q):
> > curs = DB_HANDLER.cursor()
> > query = """UPDATE foo SET processing = 't' WHERE id IN
> >
> > (SELECT id FROM foo WHERE processing = 'f' ORDER BY id ASC
> > LIMIT
> >
> > 5000 FOR UPDATE) RETURNING *"""
>
> # Where is this query actually run?

Sorry, I edited my code too much, it's actually executed at the first line of
the loop, I added it below.

> > while DO_LOOP: #the whole program eventually stops if this is false
>
> # What cause DO_LOOP to go false?

Either when receiving a signal from the OS (registered with
"signal.signal(signal.SIGINT, stop_main)") or when the topmost try-catch-
reinitialize-retry loop has caught a quick sucession of exceptions.

DO_LOOP is tested in a few places where we can make a clean exit. A cronjob
will restart the process if it is not or badly running.

> > curs.execute(query)
> > results = curs.fetchall()
> > rlen = len(results)
> >
> > if rlen > 0:
> > LOG.debug("Fetched %d rows", rlen)
>
> # What do you see in LOG for rlen values?

The histogram shows a large amount of small values, progressively becoming
rarer for bigger values, up to value 5000 which is very frequent again
(depending on the day, between 0.5 and 5% of queries return the maximum number
of rows), then a handfull freak values of ~9k, 11k, 15k, 33k, 46k, 107k, etc
which cause my problems.

> > if rlen == 0:
> > # [...] wait for notification...
> > continue
> >
> > # [...] Enqueue batch and let other threads process it.
> > # [...] Those threads will eventually delete the processed rows
> > from
> > # the foo table.
> >
> > The problem is that sometimes (once every few days at about 2-300K queries
> > per day) I get many more rows than the max 5000 I asked for (I've seen up
> > to 25k). And I'm getting timeouts and other problems as a result.
> >
> > The id column is your typical primary key integer with a unique index.
> > I've
> > checked the problematic cases and there are no id gaps or duplicate rows.
> > There are multiple threads in the program, but only the main thread is
> > running enqueue_loop(). I'm not sure if this is a server or a driver
> > issue.
> >
> >
> > Any idea ? Thanks.

--
Vincent de Phily

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2014-05-08 23:56:25 Re: Receiving many more rows than expected
Previous Message Anand Kumar, Karthik 2014-05-08 20:11:39 Re: Oracle to PostgreSQL replication