Re: elog/ereport VS misleading backtrace_function function address

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: elog/ereport VS misleading backtrace_function function address
Date: 2024-08-26 11:09:58
Message-ID: CAKZiRmwjDubAm2LUs-oPye-YRwBr6ALVe-bx5RkpRzQJARi7ug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Robert,

On Tue, May 14, 2024 at 5:36 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Tue, May 14, 2024 at 6:13 AM Jakub Wartak
> <jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
> > OK I'll try to explain using assembly, but I'm not an expert on this.
> > Let's go to the 1st post, assume we run with backtrace_function set:
>
> I feel like this explanation doesn't really explain very much. I mean,
> the question is not "how is it that adding a nop instruction fixes
> anything?" but "is adding a nop instruction a principled way of fixing
> things, and if so, for what reason?".

Short: IMHO the backtrace() is reliable and cannot be blamed here,
it's just that the __attribute__((cold)) with __builtin_unreachable
prevents backtrace() from retrieving proper data from the stack.

Long: using compiler cold branches with __builtin_unreachable() causes
the compiler to generate specific code that is misleading GNU libc's
backtrace() and tools like addr2line/gdb/objdump. The exact
instruction that causes it is that CALL assembly stores the saved RIP
onto the stack frame. So, if you have something like this (that's from
attached trimmed C semi-reproducer that tries to mimic what PG is
doing with ereport() macro and cold path -- NOTE: not exact match for
instructions, but good enough to see problems related with it):

$ cat bt.c
[..]
int main() {
int r;
printf("starting f(), NOT yet in somemisleading_another_f() !\n");
r = f();
printf("%d\n", r*2);
printf("starting somemisleading_another_f()\n");
r = somemisleading_another_f();
[..]

$ gcc-12 -Wall -rdynamic -g -O2 bt.c -o bt
$ ./bt
starting f(), NOT yet in somemisleading_another_f() !
err_cold(): 3
errstatic(): 3
./bt(dump_current_backtrace+0x35) [0x564fbb25d2d5]
./bt(err_cold+0x2f) [0x564fbb25d11f]
./bt(+0x1133) [0x564fbb25d133]
./bt(+0x1144) [0x564fbb25d144]
./bt(main+0x23) [0x564fbb25d173]
/lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7fba4028624a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7fba40286305]
./bt(_start+0x21) [0x564fbb25d1d1]

finishing demo() macro

$ addr2line -e ./bt -f 0x1133
f
/root/bt.c:84

$ addr2line -e ./bt -f 0x1144
somemisleading_another_f
/root/bt.c:84

NOTE here that the code has NOT called somemisleading_another_f() at
all, but backtrace() is going to fetch 0x1144, because such RIP value
was stored by the CALL. In asm, it looks like this, in the brackets
I've put sequence of instructions:

0000000000001125 <f.part.0>:
1125: 55 push %rbp [4]
1126: bf 03 00 00 00 mov $0x3,%edi [5]
112b: 48 89 e5 mov %rsp,%rbp [6]
112e: e8 bd ff ff ff call 10f0 <err_cold> [7 -->
.. backtrace()]
1133: bf 03 00 00 00 mov $0x3,%edi
1138: e8 73 02 00 00 call 13b0 <err_finish>

000000000000113d <f.cold>:
113d: 31 c0 xor %eax,%eax [2]
113f: e8 e1 ff ff ff call 1125 <f.part.0> [3 -->
this is going to store 0x113f+4b = 0x1144 as saved RIP onto stack
frame]

0000000000001144 <somemisleading_another_f.cold>:
1144: 31 c0 xor %eax,%eax
1146: e8 da ff ff ff call 1125 <f.part.0>
114b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)

[..]
00000000000013e0 <f>:
[..]
13ed: e8 7e fc ff ff call 1070 <getpid(at)plt>
13f2: a8 01 test $0x1,%al
13f4: 0f 85 43 fd ff ff jne 113d <f.cold> [1]

So here, just like in PostgreSQL, in the f.cold the "CALL"@0x113f is
going to save RIP of the following instruction (0x1144), but that's
from a different (next) function in address layout. So when
backtrace() is going to retrieve that stack frame, it's going to see
something belonging to a function that could not even be called
physically. The thing is that path is never going to return (and
that's also caused by usage of __builtin_unreachable()), so for the
compiler there's no need to generate any other instruction after such
CALL. There's no such need, of course unless you want to have valid
backtrace() output. Storing literally any instruction after such CALL,
causes the savedRIP in the stack frame to point back to the proper
origin function.

So again, with on fresh PG18 today (compiled with: ./configure
--prefix=/usr/pgsql18 ), reproducer gives shows location error as
get_collation_isdeterministic() in psql:

psql:ri-collation-bug-example.sql:42: ERROR: cache lookup failed for
collation 0
psql:ri-collation-bug-example.sql:44: error: ERROR: XX000: cache
lookup failed for collation 0
LOCATION: get_collation_isdeterministic, lsyscache.c:1062 // \errverbose

it's evident that backtrace_functions produces garbage:

2024-08-26 10:00:22.317 CEST [39311] STATEMENT: delete from revisions
where id='5c617ce7-688d-4bea-9d66-c0f0ebc635da';
2024-08-26 10:00:45.216 CEST [39325] ERROR: cache lookup failed for collation 0
2024-08-26 10:00:45.216 CEST [39325] BACKTRACE:
postgres: postgres test1 [local] DELETE(+0x155898) [0x55e9161be898]
postgres: postgres test1 [local]
DELETE(RI_FKey_cascade_del+0x2a9) [0x55e9165c2479]
postgres: postgres test1 [local] DELETE(+0x2d7e1f) [0x55e916340e1f]
[..]

so what's the 0x155898 ? We get:
$ addr2line -e /usr/pgsql18/bin/postgres -a -f 0x155898
0x0000000000155898
get_constraint_type.cold
lsyscache.c:?

That's wrong, we should get "get_collation_isdeterministic". Now with
the patch backtrace_functions produces:

2024-08-26 10:10:26.151 CEST [49283] ERROR: cache lookup failed for collation 0
2024-08-26 10:10:26.151 CEST [49283] BACKTRACE:
postgres: postgres test2 [local] DELETE(+0x16964f) [0x55dc64f4064f]
postgres: postgres test2 [local]
DELETE(RI_FKey_cascade_del+0x2a9) [0x55dc65343559]
[..]
2024-08-26 10:10:26.151 CEST [49283] STATEMENT: delete from revisions
where id='5c617ce7-688d-4bea-9d66-c0f0ebc635da';
$ addr2line -e /usr/pgsql18/bin/postgres -a -f 0x16964f
0x000000000016964f
get_collation_isdeterministic.cold
lsyscache.c:?

> I think the right answer here is probably what Alvaro said, namely, that people have to have the debug symbols installed if they want to get backtraces.

Yes, the binary cannot be strip(1)-ed (ELF symbols table '.symtab'
cannot be emptied) for it to work OR the binary can be striped, but
then the debug symbols need to be installed as add-on in order to
decode the addresses, but that's standard thing today and not such
installation is not a problem in general.

Fun fact: one thing worth mentioning here is that I was plain wrong:
it is NOT --enable-debug that is causing - or not - generating .cold
functions. Those seem to be almost always mostly generated on modern
gcc compilers. E.g. Debian official PGDG packages come without ".cold"
function optimizations VISIBLE within the binary by default when using
normal objdump(1). objdump(1) was unable to resolve anything to have
".cold", but following assembly, it looked like those stubs were
present. Now using e.g. objdump --dwarf
--disassemble=get_collation_oid.cold (the key here is to use --dwarf),
I was able to get those function names with .cold . The --dwarf switch
actually starts reading the external ("/usr/lib/debug/.build-id/")
symbols and confirms they are there.

At first I thought that clang does not seem to be affected by this
when dealing with my toy bt.c (actually clang fixes it!), it looks
like it is not generating those ".cold" stubs at all. However the
issue is still the wrong output in PG even with clang ! (I've got the
wrong function output in log too, and again there was CALL as last
instruction, so it saved the wrong RIP of the follow-up instruction
too!). My org. patch does not fix it there, as sadly clang optimized
out my volatile char. To my disgust, what helped there was the below
thing:

--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -145,8 +145,11 @@ struct Node;
- if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \
+ if (__builtin_constant_p(elevel) && (elevel) >= ERROR) { \
+ volatile char fix_backtrace_addr = 0x42; \
+ fix_backtrace_addr = fix_backtrace_addr + 1; \
pg_unreachable(); \

It's kind of ugly hack, maybe some other hackers have better ideas.
Also I have not checked different archs that the x86_64 (and part of
the problem is that it needs a CPU-agnostic operand).

-J.

Attachment Content-Type Size
bt.c text/x-c-code 2.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dean Rasheed 2024-08-26 11:24:03 Re: Adding OLD/NEW support to RETURNING
Previous Message Amit Kapila 2024-08-26 11:05:02 Re: Introduce XID age and inactive timeout based replication slot invalidation