Re: BRIN index creation on geometry column causes crash

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Tobias Wendorff <tobias(dot)wendorff(at)tu-dortmund(dot)de>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BRIN index creation on geometry column causes crash
Date: 2025-01-03 00:25:54
Message-ID: 9b346570-675c-47a5-ab2b-131bcc80988b@vondra.me
Lists: pgsql-bugs

On 1/2/25 23:02, Tobias Wendorff wrote:
> I'm reporting a server crash that occurs when creating a BRIN index on a
> geometry column in PostgreSQL 17.2.
>
> ### Steps to reproduce
> 1. Set up PostgreSQL 17.2 with PostGIS 3.5.1 extension
> 2. Execute the following SQL commands:
>
> ```sql
> DROP TABLE IF EXISTS random_points;
> CREATE TABLE random_points AS
> SELECT ST_MakePoint(0, 0) AS geom FROM generate_series(1, 130_561);
> CREATE INDEX ON random_points USING brin(geom);
> ```
>

Reproduced, but I believe this is actually a long-standing PostGIS bug.
The exact place where it crashes is here:

/* Finally, merge B to A. */
finfo = inclusion_get_procinfo(bdesc, attno, PROCNUM_MERGE);
Assert(finfo != NULL);
result = FunctionCall2Coll(finfo, colloid, ...);

With asserts, it fails on the assert, i.e. finfo is NULL. This means the
opfamily is missing the "MERGE" support procedure, which is however
required (from the very beginning of BRIN in 9.5).

This is supported by amvalidate():

test=# select * from pg_opclass where opcname = 'brin_geometry_inclusion_ops_2d';
  oid  | opcmethod |            opcname             | opcnamespace | opcowner | opcfamily | opcintype | opcdefault | opckeytype
-------+-----------+--------------------------------+--------------+----------+-----------+-----------+------------+------------
 17327 |      3580 | brin_geometry_inclusion_ops_2d |         2200 |       10 |     17326 |     16395 | t          |      16430
(1 row)

test=# select amvalidate(17327);
INFO: operator family "brin_geometry_inclusion_ops_2d" of access method brin is missing support function(s) for types box2df and box2df
 amvalidate
------------
 f
(1 row)

The reason why serial builds work is that the opclass does not add
values through the MERGE procedure, because it defines the procedures
like this:

test=# select amprocnum, amproc from pg_amproc where amprocfamily = 17326;
 amprocnum |             amproc
-----------+---------------------------------
         1 | brin_inclusion_opcinfo
         2 | geom2d_brin_inclusion_add_value
         3 | brin_inclusion_consistent
         4 | brin_inclusion_union
(4 rows)

So it uses a custom geom2d_brin_inclusion_add_value instead of the
usual brin_inclusion_add_value, and simply does the "merge" ad hoc. Which is
a bit unorthodox (the "traditional way" would be to keep the usual
add_value proc, and add the type-specific logic in the "merge").

The problem however is that it keeps the brin_inclusion_union, which
also uses the MERGE procedure. And so it crashes.
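
The same mismatch can be looked for directly in the catalogs. The query below is only a rough sketch: it assumes PROCNUM_MERGE is support number 11, which is how brin_inclusion.c defines it in current sources (worth verifying against your tree), and it only looks at families that register brin_inclusion_union as support procedure 4.

```sql
-- BRIN opfamilies that keep the stock brin_inclusion_union (procedure 4)
-- but register no MERGE procedure for the same type pair.
-- Assumption: PROCNUM_MERGE = 11, per brin_inclusion.c.
SELECT f.opfname,
       p.amproclefttype::regtype  AS lefttype,
       p.amprocrighttype::regtype AS righttype
FROM pg_amproc p
JOIN pg_opfamily f ON f.oid = p.amprocfamily
JOIN pg_am a       ON a.oid = f.opfmethod
WHERE a.amname = 'brin'
  AND p.amprocnum = 4
  AND p.amproc = 'brin_inclusion_union'::regproc
  AND NOT EXISTS (
        SELECT 1
        FROM pg_amproc m
        WHERE m.amprocfamily    = p.amprocfamily
          AND m.amproclefttype  = p.amproclefttype
          AND m.amprocrighttype = p.amprocrighttype
          AND m.amprocnum       = 11)
ORDER BY f.opfname;
```

Any family this returns would be expected to hit the same NULL finfo as soon as the _union is invoked.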

This means even without the parallel builds this opclass is likely
broken, because as soon as it invokes the _union, it crashes in exactly
the same way. And it's not that difficult to hit, really. All it takes
is roughly this with two sessions:

S1: desummarize range 0

SELECT brin_desummarize_range('random_points_geom_idx', 0);

S1: summarize range 0, but break on brin_can_do_samepage_update (in
summarize_range)

SELECT brin_summarize_range('random_points_geom_idx', 0);

S2: insert a tuple into range 0

INSERT INTO random_points SELECT ST_MakePoint(1000, 1000);

S1: continue execution

KABOOOM!

So I think this is a bug in the opclass, which pretends to be an
inclusion opclass, but also tries not to be - and it ends up with an
inconsistent set of procedures. It needs to make up its mind and either
override (at least) the _union too, or add the MERGE required by
inclusion ops.

FWIW this is not the only opclass with this issue - I get a whole bunch
of similar failures (attached).
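
For anyone who wants to check their own installation, something along these lines should list the affected opclasses. This is a sketch, not necessarily the query used to produce the attachment; it relies only on amvalidate() and the pg_opclass / pg_am catalogs.

```sql
-- List BRIN operator classes that fail amvalidate().
-- amvalidate() also emits an INFO line for each problem it finds.
SELECT v.opcname, v.opcintype
FROM (
    SELECT c.opcname,
           c.opcintype::regtype AS opcintype,
           amvalidate(c.oid)    AS valid
    FROM pg_opclass c
    JOIN pg_am a ON a.oid = c.opcmethod
    WHERE a.amname = 'brin'
) AS v
WHERE NOT v.valid
ORDER BY v.opcname;
```

The INFO messages printed along the way say which support functions are missing for each family.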

I believe this needs to be reported to PostGIS. It's a bit too late for
me, so I'll do that tomorrow, unless someone beats me to it.

regards

--
Tomas Vondra

Attachment Content-Type Size
amvalidate-failures.txt text/plain 3.5 KB
