BUG #16685: The ecpg thread/descriptor test fails sometimes on Windows

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: exclusion(at)gmail(dot)com
Subject: BUG #16685: The ecpg thread/descriptor test fails sometimes on Windows
Date: 2020-10-24 04:05:07
Message-ID: 16685-d6cd241872c101d3@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 16685
Logged by: Alexander Lakhin
Email address: exclusion(at)gmail(dot)com
PostgreSQL version: 13.0
Operating system: Ubuntu 20.04
Description:

When running `vcregress ecpgcheck`, sometimes I get:
test thread/descriptor ... stderr FAILED 99 ms

regression.diffs contains:
--- .../src/interfaces/ecpg/test/expected/thread-descriptor.stderr 2019-12-04 16:05:46 +0300
+++ .../src/interfaces/ecpg/test/results/thread-descriptor.stderr 2020-10-20 10:00:34 +0300
@@ -0,0 +1 @@
+SQL error: descriptor "mydesc" not found on line 31

See also:
https://www.postgresql.org/message-id/flat/230799.1603045446%40sss.pgh.pa.us

In descriptor.pgc we have:
30: EXEC SQL ALLOCATE DESCRIPTOR mydesc;
31: EXEC SQL DEALLOCATE DESCRIPTOR mydesc;
So the mydesc descriptor somehow disappeared right after it was allocated.

`EXEC SQL ALLOCATE DESCRIPTOR` and `EXEC SQL DEALLOCATE DESCRIPTOR` are
implemented by ECPGallocate_desc() and ECPGdeallocate_desc() in
ecpglib\descriptor.c, respectively, so I looked into that code.

I found that the get_descriptors() function, called in ECPGdeallocate_desc(),
can sometimes return NULL:
static struct descriptor *
get_descriptors(void)
{
    pthread_once(&descriptor_once, descriptor_key_init);
    return (struct descriptor *) pthread_getspecific(descriptor_key);
}
On Windows, pthread_getspecific(key) is implemented as TlsGetValue(key).
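For context, the per-thread descriptor list that ECPGallocate_desc() and ECPGdeallocate_desc() work on can be modelled with plain POSIX threads. This is a simplified, hypothetical sketch, not the ecpglib source: struct descriptor is reduced to a name and a next pointer, allocate_desc() and deallocate_desc() are illustrative stand-ins, and no key destructor frees the list at thread exit.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for ecpglib's struct descriptor. */
struct descriptor
{
    char       *name;
    struct descriptor *next;
};

static pthread_key_t descriptor_key;
static pthread_once_t descriptor_once = PTHREAD_ONCE_INIT;

/* Creates the key; pthread_once() guarantees this runs once per process. */
static void
descriptor_key_init(void)
{
    pthread_key_create(&descriptor_key, NULL);   /* sketch: no destructor */
}

/* Head of the calling thread's descriptor list (NULL if none stored yet). */
static struct descriptor *
get_descriptors(void)
{
    pthread_once(&descriptor_once, descriptor_key_init);
    return (struct descriptor *) pthread_getspecific(descriptor_key);
}

/* Replace the head of the calling thread's descriptor list. */
static void
set_descriptors(struct descriptor *value)
{
    pthread_once(&descriptor_once, descriptor_key_init);
    pthread_setspecific(descriptor_key, value);
}

/* Hypothetical helper: push a named descriptor; returns 1 on success. */
static int
allocate_desc(const char *name)
{
    struct descriptor *new = malloc(sizeof(*new));

    if (new == NULL)
        return 0;
    new->name = malloc(strlen(name) + 1);
    if (new->name == NULL)
    {
        free(new);
        return 0;
    }
    strcpy(new->name, name);
    new->next = get_descriptors();
    set_descriptors(new);
    return 1;
}

/* Hypothetical helper: unlink and free a named descriptor; 0 if not found. */
static int
deallocate_desc(const char *name)
{
    struct descriptor *prev = NULL;
    struct descriptor *desc;

    for (desc = get_descriptors(); desc != NULL; prev = desc, desc = desc->next)
    {
        if (strcmp(desc->name, name) == 0)
        {
            if (prev != NULL)
                prev->next = desc->next;
            else
                set_descriptors(desc->next);
            free(desc->name);
            free(desc);
            return 1;
        }
    }
    return 0;                   /* the "descriptor ... not found" case */
}
```

Because the list head lives in thread-specific storage, a lookup only sees descriptors stored by the same thread under the same key. In this model, if descriptor_key held a different value at set time than at lookup time, get_descriptors() would return NULL and deallocate_desc() would report the descriptor as not found.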

To make the bug easier to reproduce, I replaced the contents of ecpg_schedule
with 100 "test: thread/descriptor" lines and ran `vcregress ecpgcheck` in a
loop of 100 iterations. With this setup, a failure shows up within a few
minutes.
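The schedule change used for the reproduction is easy to script. A minimal sketch, assuming the one-"test: <name>"-line-per-entry schedule format quoted above; write_stress_schedule() is a hypothetical helper, and looping over `vcregress ecpgcheck` itself is left to the shell.

```c
#include <stdio.h>

/*
 * Write a schedule that repeats one test `copies` times, one
 * "test: <name>" line per entry. Returns 0 on success, -1 on a write error.
 */
static int
write_stress_schedule(FILE *out, const char *test_name, int copies)
{
    for (int i = 0; i < copies; i++)
    {
        if (fprintf(out, "test: %s\n", test_name) < 0)
            return -1;
    }
    return fflush(out) == 0 ? 0 : -1;
}
```

Calling write_stress_schedule(f, "thread/descriptor", 100) on a file opened over ecpg_schedule produces the 100-line schedule described above.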

The following debugging code, inserted into ECPGallocate_desc():
+++ b/src/interfaces/ecpg/ecpglib/descriptor.c
@@ -829,6 +829,17 @@ ECPGallocate_desc(int line, const char *name)
}
strcpy(new->name, name);
set_descriptors(new);
+
+ long initialdk = descriptor_key;
+ for (int n = 0; n < 1000; n++) {
+ void *new1 = TlsGetValue(descriptor_key);
+ if (!new1) {
+ DWORD lasterr = GetLastError();
+ fprintf(stdout, "TlsGetValue() returned null on iteration %d, error: %d, descriptor_key: %d, initial descriptor_key: %d.\n",
+ n, lasterr, descriptor_key, initialdk);
+ exit(2);
+ }
+ }
return true;
}
shows the following on failure:
TlsGetValue() returned null on iteration 209, error: 0, descriptor_key: 28, initial descriptor_key: 0.
or
TlsGetValue() returned null on iteration: 369, error: 0, descriptor_key: 28, initial descriptor_key: 0

So descriptor_key changed after set_descriptors(new), and a subsequent
get_descriptors() call would return NULL, as seen in the test failure.
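The effect of a key that changes after a value was stored can be reproduced deterministically on a POSIX system. This sketch rests on an assumption the report does not establish: that the key variable gets reassigned by a later pthread_key_create() call (for example, because a one-time initializer ran twice). demo_key, demo_key_init() and lookup_after_reinit() are hypothetical names. POSIX specifies that a newly created key is associated with NULL in every thread, so the value stored through the old key becomes unreachable through the variable.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for descriptor_key. */
static pthread_key_t demo_key;

/* Hypothetical initializer; each call overwrites demo_key with a new key. */
static void
demo_key_init(void)
{
    pthread_key_create(&demo_key, NULL);
}

/*
 * Store `stored` under the current key, re-run the initializer, then look
 * the value up again. Returns what a get_descriptors()-style lookup sees.
 */
static void *
lookup_after_reinit(void *stored)
{
    demo_key_init();
    pthread_setspecific(demo_key, stored);  /* like set_descriptors(new) */
    demo_key_init();                        /* simulated second initialization */
    return pthread_getspecific(demo_key);   /* like get_descriptors() */
}
```

Calling lookup_after_reinit() with any non-NULL pointer returns NULL, mirroring the `descriptor "mydesc" not found` failure. The first key is deliberately leaked to keep the sketch short.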
