plpython bug

From: joel(dot)traf(at)magwerks(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: plpython bug
Date: 2017-02-05 21:51:58
Message-ID: 20170205215158.AD118E06FA@smtp.hushmail.com
Lists: pgsql-bugs

Hello Developers

I found an interesting bug that causes a complete crash and a reset of
all connections on PG 9.5.0, compiled with Visual C++ 1800, 64-bit.

The log states: server process (PID 2720) was terminated by exception
0xC0000005
failed process was running: select upsert_items2(false)
Hint: see C include file ntstatus.h for a description of the
hexadecimal value (this is a memory access error)
log: terminating any other active server processes

upsert_items2() is a function that copies/updates data from an ERP
database to a website database. It just iterates over all the items,
figures out how to group items together based on class, and decides
whether an item should be visible on the website based on rules.
Nothing complicated.

As these are two different PG databases, the plpython function
connects to the ERP database via psycopg2, copying the data into a
local Python result set. I then executed plpy.execute("insert ... on
conflict do update") for each item. Processing time was around 5
minutes to go through 12,000 items, updating 5 tables on the website
database.
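The per-item pattern described above can be sketched in plain Python (hypothetical table and column names; `execute` stands in for plpy.execute with a prepared plan, and is not the poster's actual code):

```python
# Sketch of the per-item upsert loop: one round trip per item.
def upsert_items(items, execute):
    """Run one parameterized upsert per item via the given executor."""
    sql = ("INSERT INTO web_items (sku, price) VALUES (%s, %s) "
           "ON CONFLICT (sku) DO UPDATE SET price = EXCLUDED.price")
    for item in items:
        execute(sql, (item["sku"], item["price"]))

# Example: capture what would be executed instead of hitting a database.
calls = []
upsert_items([{"sku": "A1", "price": 9.5}],
             lambda q, p: calls.append((q, p)))
```

Each call crosses the SQL/Python boundary once, which is where the per-item overhead comes from.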

The users want this to be faster. The only solution I could think of
was to cut down on all the context switching and data-type mapping
that happens when calling plpy.execute() for each item. So I decided
to build one really big SQL update command and call plpy.execute()
once at the end of the plpython function.
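The batching approach can be sketched like this (hypothetical names again; inside PL/Python, plpy.quote_literal() would do the quoting, so the minimal stand-in here is an assumption):

```python
# Sketch of collapsing many upserts into one multi-statement command.
def quote_literal(value):
    # Minimal stand-in for SQL literal quoting: double embedded quotes.
    return "'" + str(value).replace("'", "''") + "'"

def build_batch(items):
    """Build one big command string from per-item upsert statements."""
    stmts = []
    for item in items:
        stmts.append(
            "INSERT INTO web_items (sku, price) VALUES ({}, {}) "
            "ON CONFLICT (sku) DO UPDATE SET price = EXCLUDED.price"
            .format(quote_literal(item["sku"]),
                    quote_literal(item["price"])))
    return ";\n".join(stmts)

batch = build_batch([{"sku": "A1", "price": 1},
                     {"sku": "O'Brien", "price": 2}])
```

The whole batch then goes through plpy.execute() in a single call, so the boundary is crossed once instead of 12,000 times.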

That worked: processing time was reduced from 5 minutes to 8 seconds.
As this is a work in progress, I started adding more rules that change
what gets updated, and I had a couple of typos in the SQL statements.

That's when I crashed the PG server hard. It took me a few hours to
figure out where my mistake was, as this crash only occurs if a SQL
error is very deep in the command sent via plpy.execute(). If there
is an error in the first few lines, it returns a plpython SPI error.
How deep into the SQL command an error has to appear to cause this
access error, I do not know.

The command sent via plpy.execute() is 1,454,561 characters long.

Any suggestions on how to narrow down at what point a bad command
causes this error?
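One way to narrow it down, since the statements are built individually before being joined, is to bisect over the statement list: execute growing prefixes and binary-search for the first one that fails. A sketch (`try_execute` is a hypothetical callable returning True on success, e.g. a wrapper that runs the chunk and catches the error):

```python
# Bisect a list of statements to find the index of the first bad one.
# Assumes exactly the prefixes containing the bad statement fail.
def first_failure(statements, try_execute):
    """Return the index of the first failing statement, or None."""
    lo, hi = 0, len(statements)
    while lo < hi:
        mid = (lo + hi) // 2
        if try_execute(";".join(statements[:mid + 1])):
            lo = mid + 1   # prefix ran fine; failure is later
        else:
            hi = mid       # prefix failed; failure is at or before mid
    return lo if lo < len(statements) else None
```

With ~12,000 statements this needs about 14 probe runs instead of thousands, and it localizes the typo without reproducing the full 1.4 MB command each time.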
FYI only:
The first version of this function was written in plpgsql using
dblink, which took 15 minutes to process the same data set. The second
version used an FDW, and processing time was still 15 minutes; in both
cases it spent all the time fetching data from the remote PG database,
executing FETCHes.
Thanks
