Inaccurate description of UNION/CASE/etc type selection

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-docs(at)lists(dot)postgresql(dot)org
Subject: Inaccurate description of UNION/CASE/etc type selection
Date: 2020-08-16 21:26:40
Message-ID: 1019930.1597613200@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs

We had a question about why an ARRAY[] construct was resolving the
array's type strangely [1]. The documentation about this [2] says
that the relevant resolution rules are:

5. Choose the first non-unknown input type which is a preferred type in
that category, if there is one.

6. Otherwise, choose the last non-unknown input type that allows all the
preceding non-unknown inputs to be implicitly converted to it. (There
always is such a type, since at least the first type in the list must
satisfy this condition.)

But what select_common_type() actually does is:

else if (!pispreferred &&
can_coerce_type(1, &ptype, &ntype, COERCION_IMPLICIT) &&
!can_coerce_type(1, &ntype, &ptype, COERCION_IMPLICIT))
{
/*
* take new type if can coerce to it implicitly but not the
* other way; but if we have a preferred type, stay on it.
*/
pexpr = nexpr;
ptype = ntype;
pcategory = ncategory;
pispreferred = nispreferred;
}

(ptype is the currently selected common type, ntype is the next
input type to consider, and we've already eliminated cases involving
UNKNOWN.)

In the reported case, we have ptype = "name", ntype = "text", and there
are implicit coercions in both directions so we stay with "name" even
though it's not preferred.

So, the step-5 claim that we always choose a preferred type over other
types is just wrong. Step 6 is much short of truthful as well, since
it fails to describe the check about coercion in the other direction.
(Also, we're not really checking that *every* earlier argument can be
promoted to ntype, only the currently best one. Typically, if there's
an implicit coercion from A to B and also one from B to C, there'd be
one from A to C too; but there are lots of counterexamples.)

Now, this code is old enough to vote, so I think changing its behavior
is probably a really bad idea. I did experiment with giving preferred
types fractionally more preference, like this:

else if (can_coerce_type(1, &ptype, &ntype, COERCION_IMPLICIT) &&
(nispreferred > pispreferred ||
(!pispreferred &&
!can_coerce_type(1, &ntype, &ptype, COERCION_IMPLICIT))))

but this broke a couple of regression test cases, so I'm sure it'd
break real-world queries too. So I think we need to leave the code
alone and fix the docs to describe it more accurately.

However, I'm having a hard time coming up with wording that describes
this accurately without being a verbatim statement of the algorithm.
(I see that I already made one attempt at improving the description,
back in 07daff63c, but it's clearly still not good enough.)

Any ideas?

regards, tom lane

[1] https://www.postgresql.org/message-id/CAOwYNKYfKPfAL4rgP0AO_w0Mn7h8yiXd_Qi9swPdAc4CAUXeAQ%40mail.gmail.com
[2] https://www.postgresql.org/docs/current/typeconv-union-case.html

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message David G. Johnston 2020-08-17 00:05:19 Re: Inaccurate description of UNION/CASE/etc type selection
Previous Message Tom Lane 2020-08-15 16:38:47 Re: PostgreSQL 12: Cryptic documentation