SQLreduce: Reduce verbose SQL queries to minimal examples

Posted on 2022-03-09 by credativ GmbH
Related Open Source

SQLsmith has proven to be an effective tool for finding bugs in different areas in the PostgreSQL server and other products, including security bugs, ranging from executor bugs to segfaults in type and index method implementations.

However, the random queries generated by SQLsmith that trigger some error are most often very large and contain a lot of noise that does not contribute to the error. So far, manual inspection of the query and tedious editing was required to reduce the example to a minimal reproducer that developers can use to fix the problem.

This issue is solved by SQLreduce. SQLreduce takes as input an arbitrary SQL query which is then run against a PostgreSQL server. Various simplification steps are applied, checking after each step that the simplified query still triggers the same error from PostgreSQL. The end result is a SQL query with minimal complexity.

SQLreduce is effective at reducing the queries from original error reports from SQLsmith to queries that match manually-reduced queries.

More details on how it works in a blog post.

Example

In 2018, SQLsmith found a segfault in PostgreSQL running Git revision 039eb6e92f. The reproducer back then was a huge 40-line, 2.2kB query:

select case when pg_catalog.lastval() < pg_catalog.pg_stat_get_bgwriter_maxwritten_clean() then case when pg_catalog.circle_sub_pt( cast(cast(null as circle) as circle), cast((select location from public.emp limit 1 offset 13) as point)) ~ cast(nullif(case when cast(null as box) &> (select boxcol from public.brintest limit 1 offset 2) then (select f1 from public.circle_tbl limit 1 offset 4) else (select f1 from public.circle_tbl limit 1 offset 4) end, case when (select pg_catalog.max(class) from public.f_star) ~~ ref_0.c then cast(null as circle) else cast(null as circle) end ) as circle) then ref_0.a else ref_0.a end else case when pg_catalog.circle_sub_pt( cast(cast(null as circle) as circle), cast((select location from public.emp limit 1 offset 13) as point)) ~ cast(nullif(case when cast(null as box) &> (select boxcol from public.brintest limit 1 offset 2) then (select f1 from public.circle_tbl limit 1 offset 4) else (select f1 from public.circle_tbl limit 1 offset 4) end, case when (select pg_catalog.max(class) from public.f_star) ~~ ref_0.c then cast(null as circle) else cast(null as circle) end ) as circle) then ref_0.a else ref_0.a end end as c0, case when (select intervalcol from public.brintest limit 1 offset 1) >= cast(null as "interval") then case when ((select pg_catalog.max(roomno) from public.room) !~~ ref_0.c) and (cast(null as xid) <> 100) then ref_0.b else ref_0.b end else case when ((select pg_catalog.max(roomno) from public.room) !~~ ref_0.c) and (cast(null as xid) <> 100) then ref_0.b else ref_0.b end end as c1, ref_0.a as c2, (select a from public.idxpart1 limit 1 offset 5) as c3, ref_0.b as c4, pg_catalog.stddev( cast((select pg_catalog.sum(float4col) from public.brintest) as float4)) over (partition by ref_0.a,ref_0.b,ref_0.c order by ref_0.b) as c5, cast(nullif(ref_0.b, ref_0.a) as int4) as c6, ref_0.b as c7, ref_0.c as c8 from public.mlparted3 as ref_0 where true;

SQLreduce can effectively reduce that monster to just this:

SELECT pg_catalog.stddev(NULL) OVER () AS c5 FROM public.mlparted3 AS ref_0

Availability

SQLreduce is open source licensed under the MIT license. The source code is on GitHub: https://github.com/credativ/sqlreduce

Debian/Ubuntu packages for sqlreduce are shipped on apt.postgresql.org.

SQLreduce is an open source product by credativ GmbH.