BUG #18380: BUG: notify stops working after LEFT JOIN if enable_mergejoin is ON

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: grothoff(at)gnu(dot)org
Subject: BUG #18380: BUG: notify stops working after LEFT JOIN if enable_mergejoin is ON
Date: 2024-03-06 21:18:24
Message-ID: 18380-c795d2b4033dfca0@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 18380
Logged by: Christian Grothoff
Email address: grothoff(at)gnu(dot)org
PostgreSQL version: 16.2
Operating system: Debian GNU/Linux
Description:

We tested with Postgres 14, 15, 16.1 and 16.2; same results.

Overview:
========

We use LISTEN + NOTIFY to implement some long polling. This usually works.
However, we stop getting NOTIFY messages after a LISTENing client issued a
particular SELECT query with a LEFT JOIN in the LISTENing client. Disabling
"enable_mergejoin" fixes the issue. We believe that the "enable_mergejoin"
logic somehow corrupts the notify state (likely an obscure rare memory
corruption issue). The bug is 100% reproduceable on several of our
systems.

How to reproduce:
===============
* Install GNUnet 0.21.0 [or: git.gnunet.org/gnunet.git master]
* Install GNU Taler exchange, checkout tag 'postgres-bug-mergejoin' [from
git.taler.net/exchange.git]
* Install taler-wallet-core [from git.taler.net/taler-wallet-core]
* run $ taler-harness run-integrationtests deposit

Buggy outcome
=============

If enable_mergejoin is ON, the test reports this at the end:
parent: got result
{"name":"deposit","timeSec":35.168,"status":"pass"}
Note that the 35s is due to all kinds of things in the test, none related to
actual postgres query performance.
This is NOT a performance issue. We can set the long-poll threshold higher,
and the time will go up accordingly.

Expected behavior with work-around
==============================

If enable_mergejoin is OFF, long-polling works and the test reports
something like this at the end:
parent: got result
{"name":"deposit","timeSec":16.094,"status":"pass"}

This is basically the fastest the integration test will run (~15-17
seconds), the NOTIFY happens as expected.

Relevant SQL query break-down
==========================

The query that causes things to break is in
exchange/src/exchangedb/pg_lookup_transfer_by_deposit.c. The buggy
sub-clause is marked with
#if BUG below:

PREPARE (pg,
"get_deposit_without_wtid",
"SELECT"
" bdep.wire_salt"
",wt.payto_uri"
",cdep.amount_with_fee"
",denom.fee_deposit"
",bdep.wire_deadline"
#if BUG
",agt.legitimization_requirement_serial_id"
#endif
",aml.status"
",aml.kyc_requirement"
" FROM coin_deposits cdep"
" JOIN batch_deposits bdep"
" USING (batch_deposit_serial_id)"
" JOIN wire_targets wt"
" USING (wire_target_h_payto)"
" JOIN known_coins kc"
" ON (kc.coin_pub = cdep.coin_pub)"
" JOIN denominations denom"
" USING (denominations_serial)"
#if BUG
" LEFT JOIN aggregation_transient agt "
" ON ( (bdep.wire_target_h_payto = agt.wire_target_h_payto)
AND"
" (bdep.merchant_pub = agt.merchant_pub) )"
#endif
" LEFT JOIN aml_status aml"
" ON (wt.wire_target_h_payto = aml.h_payto)"
" WHERE cdep.coin_pub=$1"
" AND bdep.merchant_pub=$3"
" AND bdep.h_contract_terms=$2"
" LIMIT 1;");

If we #define BUG to 0 for the code above, we do NOT have to disable the
merge_join to get the desired behavior.

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2024-03-06 21:21:23 Re: BUG #18374: Printing memory contexts on OOM condition might lead to segmentation fault
Previous Message Alexander Korotkov 2024-03-06 20:24:03 Re: BUG #18374: Printing memory contexts on OOM condition might lead to segmentation fault