UNION causes horrible plan on JOIN

From: Craig James <cjames(at)emolecules(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: UNION causes horrible plan on JOIN
Date: 2019-10-28 22:40:58
Message-ID: CAFwQ8reoBtrjfBM3rJuFg2LndWu-LHbXsU4knDyrVK+wo8PAug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Postgres 9.6 (config below), I have a case I don't understand: three
tables that can be separately queried in milliseconds, but when put
together into one view using UNION, take 150 seconds to query. Here's the
rough idea (actual details below):

create view thesaurus as
(select id, name from A)
union (select id, name from B)
union (select id, name from C);

create table h(i integer);
insert into h values(12345);
select * from thesaurus join h on (thesaurus.id = h.id);

On the other hand, if you do this, it's a millisecond plan:

select * from thesaurus where id in (12345);

Notice that it's effectively the same query since h above contains just
this one value.

Here are the actual details. The view being queried:

create view thesaurus2 as

select
rt.thesaurus_id,
rt.version_id,
rt.normalized,
rt.identifier,
rt.typecode
from local_sample s
join thesaurus_master rt using (sample_id)
union
select c.id as thesaurus_id,
c.id as version_id,
c.cas_number as normalized,
c.cas_number as identifier,
3 as typecode
from cas_number c
join sample s on c.id = s.version_id
union
select m.id as thesaurus_id,
m.id as version_id,
lower(m.mfcd) as normalized,
m.mfcd as identifier,
4 as typecode
from mfcd m
join sample s on m.id = s.version_id;

The bad sort (147 seconds to execute). Note that the "hitlist" table
contains exactly one row.

explain analyze select c.version_id
from thesaurus2 c
join hitlist_rows_103710241 h on (c.thesaurus_id = h.objectid);

https://explain.depesz.com/s/5oRC

If I instead just query directly for that value, the answer is almost
instant (1.2 msec):

explain analyze select c.version_id
from thesaurus2 c
where c.version_id in (1324511991);
https://explain.depesz.com/s/EktF

Now if I take any one of the three tables in the UNION view, the query is
really fast on each one. For example:

select distinct c.version_id

from (
select distinct c.id as thesaurus_id,
c.id as version_id,
c.cas_number as normalized,
c.cas_number as identifier,
3 as typecode
from cas_number c
join sample s on c.id = s.version_id
) c
join hitlist_rows_103710241 h on (c.thesaurus_id = h.objectid);

https://explain.depesz.com/s/KJUZ

The other two subqueries are similarly fast.

This is Postgres9.6 running on Ubuntu 16.04, 64GB memory 16 CPUs.
Non-default config values:

max_connections = 2000
shared_buffers = 12073MB
work_mem = 256MB
maintenance_work_mem = 512MB
synchronous_commit = off
effective_cache_size = 32GB
wal_level = logical
wal_keep_segments = 1000
max_wal_senders = 10
hot_standby = on
archive_mode = on
archive_command = '/bin/true'

Thanks!
Craig

--
---------------------------------
Craig A. James
Chief Technology Officer
eMolecules, Inc.
3430 Carmel Mountain Road, Suite 250
San Diego, CA 92121
---------------------------------

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Justin Pryzby 2019-10-28 22:45:53 Re: UNION causes horrible plan on JOIN
Previous Message Grégoire de Turckheim 2019-10-28 16:35:22 Re: Notifications within triggers seem to compromise performance