From: | Sait Talha Nisanci <Sait(dot)Nisanci(at)microsoft(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Metin Doslu <Metin(dot)Doslu(at)microsoft(dot)com> |
Subject: | Crash in record_type_typmod_compare |
Date: | 2021-03-31 08:50:00 |
Message-ID: | AM5PR8303MB00812D04B7184A3377CB7AD3917C9@AM5PR8303MB0081.EURPRD83.prod.outlook.com |
Lists: | pgsql-hackers |
Hello,
In Citus, we have repeatedly seen the following crash, caused by a NULL tupledesc, and we weren't sure whether it originated in Citus or in PostgreSQL:
#0 equalTupleDescs (tupdesc1=0x0, tupdesc2=0x1b9f3f0) at tupdesc.c:417
#1 0x000000000085b51f in record_type_typmod_compare (a=<optimized out>, b=<optimized out>, size=<optimized out>) at typcache.c:1761
#2 0x0000000000869c73 in hash_search_with_hash_value (hashp=0x1c10530, keyPtr=keyPtr@entry=0x7ffcfd3117b8, hashvalue=3194332168, action=action@entry=HASH_ENTER, foundPtr=foundPtr@entry=0x7ffcfd3117c0) at dynahash.c:987
#3 0x000000000086a3fd in hash_search (hashp=<optimized out>, keyPtr=keyPtr@entry=0x7ffcfd3117b8, action=action@entry=HASH_ENTER, foundPtr=foundPtr@entry=0x7ffcfd3117c0) at dynahash.c:911
#4 0x000000000085d0e1 in assign_record_type_typmod (tupDesc=<optimized out>, tupDesc@entry=0x1b9f3f0) at typcache.c:1801
#5 0x000000000061832b in BlessTupleDesc (tupdesc=0x1b9f3f0) at execTuples.c:2056
#6 TupleDescGetAttInMetadata (tupdesc=0x1b9f3f0) at execTuples.c:2081
#7 0x00007f2701878dee in CreateDistributedExecution (modLevel=ROW_MODIFY_READONLY, taskList=0x1c82398, hasReturning=<optimized out>, paramListInfo=0x1c3e5a0, tupleDescriptor=0x1b9f3f0, tupleStore=<optimized out>, targetPoolSize=16, xactProperties=0x7ffcfd311960, jobIdList=0x0) at executor/adaptive_executor.c:951
#8 0x00007f270187ba09 in AdaptiveExecutor (scanState=0x1b9eff0) at executor/adaptive_executor.c:676
#9 0x00007f270187c582 in CitusExecScan (node=0x1b9eff0) at executor/citus_custom_scan.c:182
#10 0x000000000060c9e2 in ExecProcNode (node=0x1b9eff0) at ../../../src/include/executor/executor.h:239
#11 ExecutePlan (execute_once=<optimized out>, dest=0x1abfc90, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x1b9eff0, estate=0x1b9ed50) at execMain.c:1646
#12 standard_ExecutorRun (queryDesc=0x1c3e660, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364
#13 0x00007f27018819bd in CitusExecutorRun (queryDesc=0x1c3e660, direction=ForwardScanDirection, count=0, execute_once=true) at executor/multi_executor.c:177
#14 0x00007f27000adfee in pgss_ExecutorRun (queryDesc=0x1c3e660, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at pg_stat_statements.c:891
#15 0x000000000074f97d in PortalRunSelect (portal=portal@entry=0x1b8ed00, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x1abfc90) at pquery.c:929
#16 0x0000000000750df0 in PortalRun (portal=portal@entry=0x1b8ed00, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=<optimized out>, dest=dest@entry=0x1abfc90, altdest=altdest@entry=0x1abfc90, completionTag=0x7ffcfd312090 "") at pquery.c:770
#17 0x000000000074e745 in exec_execute_message (max_rows=9223372036854775807, portal_name=0x1abf880 "") at postgres.c:2090
#18 PostgresMain (argc=<optimized out>, argv=argv@entry=0x1b4a0e8, dbname=<optimized out>, username=<optimized out>) at postgres.c:4308
#19 0x00000000006de9d8 in BackendRun (port=0x1b37230, port=0x1b37230) at postmaster.c:4437
#20 BackendStartup (port=0x1b37230) at postmaster.c:4128
#21 ServerLoop () at postmaster.c:1704
#22 0x00000000006df955 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1aba280) at postmaster.c:1377
#23 0x0000000000487a4e in main (argc=3, argv=0x1aba280) at main.c:228
This is the issue: https://github.com/citusdata/citus/issues/3825
I think this is a PostgreSQL problem, based on the following sequence of events:
* In assign_record_type_typmod<https://github.com/postgres/postgres/blob/1509c6fc29c07d13c9a590fbd6f37c7576f58ba6/src/backend/utils/cache/typcache.c#L1984>, the tupledesc of a new cache entry is set to NULL when the entry is not already in the cache, and it is set to an actual value only later, in this line<https://github.com/postgres/postgres/blob/1509c6fc29c07d13c9a590fbd6f37c7576f58ba6/src/backend/utils/cache/typcache.c#L1998>.
* PostgreSQL can raise an error between these two lines, for example in find_or_make_matching_shared_tupledesc<https://github.com/postgres/postgres/blob/1509c6fc29c07d13c9a590fbd6f37c7576f58ba6/src/backend/utils/cache/typcache.c#L1988> (possibly because of OOM), which leaves a NULL tupledesc in the cache.
* With a NULL tupledesc in the hash table, a later comparison in record_type_typmod_compare<https://github.com/postgres/postgres/blob/1509c6fc29c07d13c9a590fbd6f37c7576f58ba6/src/backend/utils/cache/typcache.c#L1935> crashes.
I manually added a line that raises an error in "find_or_make_matching_shared_tupledesc" and was able to reproduce a similar crash with two consecutive simple SELECT queries. You can see the backtrace in the issue<https://github.com/citusdata/citus/issues/3825#issuecomment-805627864>.
We should probably do HASH_ENTER<https://github.com/postgres/postgres/blob/1509c6fc29c07d13c9a590fbd6f37c7576f58ba6/src/backend/utils/cache/typcache.c#L1974> only after we have a valid entry, so that we don't end up with a NULL entry in the cache even if an error happens in between. I will share a fix in this thread soon.
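To illustrate the ordering problem, here is a minimal standalone sketch. It is not PostgreSQL code: the table, `enter_slot`, `make_tupdesc`, and `null_entries_after_failure` are hypothetical stand-ins, and `setjmp`/`longjmp` only approximates an error thrown in the middle of the function. The eventual fix may well look different.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 8

/* Hypothetical stand-in for a record cache entry. */
typedef struct { int used; const int *tupdesc; } Entry;

static Entry table[TABLE_SIZE];
static jmp_buf error_ctx;          /* stand-in for the error longjmp target */

/* Stand-in for hash_search(..., HASH_ENTER, ...): claims a slot whose
 * tupdesc is still NULL until the caller fills it in. */
static Entry *enter_slot(void)
{
    for (int i = 0; i < TABLE_SIZE; i++)
        if (!table[i].used) {
            table[i].used = 1;
            table[i].tupdesc = NULL;
            return &table[i];
        }
    return NULL;
}

/* Stand-in for a step that may fail (for example on OOM). */
static const int *make_tupdesc(int fail)
{
    static const int desc = 42;
    if (fail)
        longjmp(error_ctx, 1);     /* simulated error */
    return &desc;
}

/* Problematic order: enter the slot first, fill it in afterwards. */
static void assign_enter_first(int fail)
{
    Entry *e = enter_slot();
    assert(e != NULL);
    e->tupdesc = make_tupdesc(fail);   /* an error here leaves NULL behind */
}

/* Proposed order: build the value first, then enter the slot. */
static void assign_build_first(int fail)
{
    const int *d = make_tupdesc(fail); /* an error here touches no slot */
    Entry *e = enter_slot();
    assert(e != NULL);
    e->tupdesc = d;
}

static int count_null_entries(void)
{
    int n = 0;
    for (int i = 0; i < TABLE_SIZE; i++)
        if (table[i].used && table[i].tupdesc == NULL)
            n++;
    return n;
}

/* Resets the table, runs one failing assignment, and reports how many
 * entries were left with a NULL tupdesc. */
int null_entries_after_failure(int enter_first)
{
    memset(table, 0, sizeof(table));
    if (setjmp(error_ctx) == 0) {
        if (enter_first)
            assign_enter_first(1);
        else
            assign_build_first(1);
    }
    return count_null_entries();
}
```

With the enter-first order, `null_entries_after_failure(1)` reports one leftover NULL entry, which is what a later comparison would dereference. With the build-first order, `null_entries_after_failure(0)` reports none.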
Best,
Talha.