Help with row estimate problem

From: Jon Zeppieri <zeppieri(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Help with row estimate problem
Date: 2024-07-29 20:51:40
Message-ID: CAKfDxxxBiNK_EtZ=d1sqSBf4y0nKHwtx7ooN6sYT1k=n+4FC8A@mail.gmail.com
Lists: pgsql-performance

Hi all,

I'm trying to come up with an efficient query, ideally without having
to coerce the planner into doing what I want, but I'm running up
against a problem with row estimates. I'm curious whether there's a
better way to address it than what I've come up with.

Here's a straightforward version of the query:

SELECT
    v.patient_class,
    count(DISTINCT v.id) AS num_visits
FROM
    patient_visits AS v
    JOIN patients AS p ON p.id = v.patient_id
    JOIN members AS m ON m.patient_id = p.id
    JOIN user_populations AS up ON up.population_id = m.population_id
    JOIN users AS u ON u.id = up.user_id
    JOIN sponsor_populations AS sp ON sp.population_id = m.population_id
                                  AND sp.sponsor_id = u.sponsor_id
WHERE
    u.id = 3962 AND
    v.start_on < (m.last_activity_on + sp.authz_expiration_interval) AND
    v.end_on IS NULL
GROUP BY
    v.patient_class;

The plan looks like this:

                                  QUERY PLAN
------------------------------------------------------------------------------
 GroupAggregate  (cost=5666.32..5666.42 rows=5 width=12) (actual time=22239.478..22244.339 rows=5 loops=1)
   Group Key: v.patient_class
   Buffers: shared hit=20466857
   ->  Sort  (cost=5666.32..5666.34 rows=7 width=12) (actual time=22236.049..22239.030 rows=50988 loops=1)
         Sort Key: v.patient_class, v.id
         Sort Method: quicksort  Memory: 3528kB
         Buffers: shared hit=20466857
         ->  Nested Loop  (cost=2.94..5666.22 rows=7 width=12) (actual time=1.759..22203.441 rows=50988 loops=1)
               Join Filter: (v.patient_id = p.id)
               Buffers: shared hit=20466857
               ->  Nested Loop  (cost=2.37..5659.62 rows=11 width=28) (actual time=1.743..21892.354 rows=50988 loops=1)
                     Join Filter: (v.start_on < (m.last_activity_on + sp.authz_expiration_interval))
                     Rows Removed by Join Filter: 19368
                     Buffers: shared hit=20204287
                     ->  Nested Loop  (cost=1.80..3616.14 rows=2285 width=28) (actual time=0.065..3322.440 rows=2968729 loops=1)
                           Join Filter: (m.population_id = up.population_id)
                           Buffers: shared hit=2053968
                           ->  Nested Loop  (cost=1.11..11.73 rows=1 width=73) (actual time=0.041..0.286 rows=9 loops=1)
                                 Join Filter: (u.sponsor_id = sp.sponsor_id)
                                 Buffers: shared hit=67
                                 ->  Nested Loop  (cost=0.70..9.09 rows=1 width=95) (actual time=0.028..0.164 rows=9 loops=1)
                                       Buffers: shared hit=31
                                       ->  Index Only Scan using index_user_populations_on_user_id_and_population_id on user_populations up  (cost=0.42..1.57 rows=3 width=36) (actual time=0.015..0.025 rows=9 loops=1)
                                             Index Cond: (user_id = 3962)
                                             Heap Fetches: 0
                                             Buffers: shared hit=4
                                       ->  Index Scan using index_sponsor_populations_on_population_id_and_sponsor_id on sponsor_populations sp  (cost=0.28..2.50 rows=1 width=59) (actual time=0.011..0.012 rows=1 loops=9)
                                             Index Cond: (population_id = up.population_id)
                                             Buffers: shared hit=27
                                 ->  Index Scan using users_pkey on users u  (cost=0.41..2.63 rows=1 width=24) (actual time=0.011..0.011 rows=1 loops=9)
                                       Index Cond: (id = 3962)
                                       Buffers: shared hit=36
                           ->  Index Only Scan using index_members_on_population_id on members m  (cost=0.70..3173.12 rows=34503 width=40) (actual time=0.023..324.991 rows=329859 loops=9)
                                 Index Cond: (population_id = sp.population_id)
                                 Heap Fetches: 2729516
                                 Buffers: shared hit=2053901
                     ->  Index Scan using idx_new_patient_visits_unique on patient_visits v  (cost=0.57..0.88 rows=1 width=24) (actual time=0.006..0.006 rows=0 loops=2968729)
                           Index Cond: (patient_id = m.patient_id)
                           Filter: (end_on IS NULL)
                           Rows Removed by Filter: 4
                           Buffers: shared hit=18150319
               ->  Index Only Scan using patients_pkey on patients p  (cost=0.57..0.59 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=50988)
                     Index Cond: (id = m.patient_id)
                     Heap Fetches: 12947
                     Buffers: shared hit=262570
 Settings: effective_cache_size = '372GB', max_parallel_workers = '12', max_parallel_workers_per_gather = '4', work_mem = '164MB', random_page_cost = '1.1', effective_io_concurrency = '200'
 Planning:
   Buffers: shared hit=76
 Planning Time: 68.881 ms
 Execution Time: 22244.426 ms
(50 rows)

=====

Now, this plan isn't bad, despite the order-of-magnitude
underestimation of how many rows it will need to look at in
index_members_on_population_id (34,503 rows estimated vs. 329,859
actual per loop). But I'm still hoping to get a faster query.

There are, broadly speaking, two constraints on which patient_visits
rows will be counted:
- end_on must be NULL, indicating that the visit is still in progress.
- The visit must be accessible by the user, meaning the visit's
patient must have a member in at least one of the user's populations,
such that the visit starts before the user's access to that member
expires.

Unfortunately, neither of these conditions is very selective on its
own. There are 1,212,480 visits where end_on is NULL (many of which
the user cannot access), and even more, 4,167,864, that the user can
access (many of which have already ended). However, only 25,334 rows
meet both conditions. I don't see any way of taking advantage of this
fact without denormalizing some of the authorization-related data into
the patient_visits table.
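
For reference, these counts can be reproduced with queries along these
lines (a sketch; the second one reuses the authz join from the query
above):

-- Open visits, regardless of access:
SELECT count(*) FROM patient_visits WHERE end_on IS NULL;  -- 1,212,480

-- Visits the user can access, regardless of end_on:
SELECT count(DISTINCT v.id)
FROM patient_visits AS v
JOIN members AS m ON m.patient_id = v.patient_id
JOIN user_populations AS up ON up.population_id = m.population_id
JOIN users AS u ON u.id = up.user_id
JOIN sponsor_populations AS sp ON sp.population_id = m.population_id
                              AND sp.sponsor_id = u.sponsor_id
WHERE u.id = 3962
  AND v.start_on < (m.last_activity_on + sp.authz_expiration_interval);
-- 4,167,864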

So the approach I'm considering is to add a column,
patient_visits.cached_population_ids, holding an array of the
population_ids of the visit's patient's members. For a user to be able
to access a patient_visits row, it is a necessary (though not
sufficient) condition that this array have a non-empty intersection
with the array of the user's own population_ids (drawn from the
user_populations table). The array column has a partial GIN index:

"index_new_patient_visits_on_open_cached_population_ids" gin (cached_population_ids) WHERE end_on IS NULL
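
For concreteness, the denormalization amounts to roughly the following
(the backfill is only a sketch; in practice the array has to be kept
in sync with the members table, e.g. via triggers):

ALTER TABLE new_patient_visits ADD COLUMN cached_population_ids text[];

-- Backfill (sketch): each visit caches the population_ids of its
-- patient's members.
UPDATE new_patient_visits v
SET cached_population_ids = sub.population_ids
FROM (
    SELECT m.patient_id,
           array_agg(DISTINCT m.population_id) AS population_ids
    FROM members m
    GROUP BY m.patient_id
) sub
WHERE sub.patient_id = v.patient_id;

CREATE INDEX index_new_patient_visits_on_open_cached_population_ids
    ON new_patient_visits USING gin (cached_population_ids)
    WHERE end_on IS NULL;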

So the new query looks like this:

with user_member_expirations as (
    select
        u.id user_id,
        m.id member_id,
        m.patient_id,
        (m.last_activity_on + sp.authz_expiration_interval) expires_on
    from members m
    join sponsor_populations sp on sp.population_id = m.population_id
    join user_populations up on up.population_id = m.population_id
    join users u on u.id = up.user_id
    where sp.sponsor_id = u.sponsor_id
)
select v.patient_class, count(distinct v.id)
from new_patient_visits v
join user_member_expirations u on u.patient_id = v.patient_id
where u.user_id = 3962
  and v.end_on is null
  and v.cached_population_ids &&
      ARRAY['foo-population','bar-population','baz-population','quux-population']
  and v.start_on < u.expires_on
group by v.patient_class;

[Note: I've replaced the actual population ids with fake ones, since
they are textual and name actual facilities.]

The plan, however, is disappointing:

                                  QUERY PLAN
------------------------------------------------------------------------------
 GroupAggregate  (cost=5220.41..5220.43 rows=1 width=12) (actual time=21341.629..21346.488 rows=5 loops=1)
   Group Key: v.patient_class
   Buffers: shared hit=17585790
   ->  Sort  (cost=5220.41..5220.41 rows=1 width=12) (actual time=21338.148..21341.128 rows=50988 loops=1)
         Sort Key: v.patient_class, v.id
         Sort Method: quicksort  Memory: 3528kB
         Buffers: shared hit=17585790
         ->  Nested Loop  (cost=2.37..5220.40 rows=1 width=12) (actual time=1.385..21299.416 rows=50988 loops=1)
               Join Filter: (v.start_on < (m.last_activity_on + sp.authz_expiration_interval))
               Rows Removed by Join Filter: 19368
               Buffers: shared hit=17585790
               ->  Nested Loop  (cost=1.80..3529.88 rows=395 width=28) (actual time=0.048..2876.926 rows=2968729 loops=1)
                     Buffers: shared hit=2053968
                     ->  Nested Loop  (cost=1.11..11.73 rows=1 width=73) (actual time=0.028..0.426 rows=9 loops=1)
                           Join Filter: (u.sponsor_id = sp.sponsor_id)
                           Buffers: shared hit=67
                           ->  Nested Loop  (cost=0.70..9.09 rows=1 width=95) (actual time=0.019..0.262 rows=9 loops=1)
                                 Buffers: shared hit=31
                                 ->  Index Only Scan using index_user_populations_on_user_id_and_population_id on user_populations up  (cost=0.42..1.57 rows=3 width=36) (actual time=0.010..0.047 rows=9 loops=1)
                                       Index Cond: (user_id = 3962)
                                       Heap Fetches: 0
                                       Buffers: shared hit=4
                                 ->  Index Scan using index_sponsor_populations_on_population_id_and_sponsor_id on sponsor_populations sp  (cost=0.28..2.50 rows=1 width=59) (actual time=0.011..0.016 rows=1 loops=9)
                                       Index Cond: (population_id = up.population_id)
                                       Buffers: shared hit=27
                           ->  Index Scan using users_pkey on users u  (cost=0.41..2.63 rows=1 width=24) (actual time=0.013..0.013 rows=1 loops=9)
                                 Index Cond: (id = 3962)
                                 Buffers: shared hit=36
                     ->  Index Only Scan using index_members_on_population_id on members m  (cost=0.70..3173.12 rows=34503 width=40) (actual time=0.021..288.933 rows=329859 loops=9)
                           Index Cond: (population_id = sp.population_id)
                           Heap Fetches: 2729516
                           Buffers: shared hit=2053901
               ->  Index Scan using index_new_patient_visits_on_patient_id on new_patient_visits v  (cost=0.57..4.26 rows=1 width=24) (actual time=0.006..0.006 rows=0 loops=2968729)
                     Index Cond: (patient_id = m.patient_id)
                     Filter: ((end_on IS NULL) AND (cached_population_ids && '{foo-population,bar-population,baz-population,quux-population}'::text[]))
                     Rows Removed by Filter: 4
                     Buffers: shared hit=15531822
 Settings: effective_cache_size = '372GB', max_parallel_workers = '12', max_parallel_workers_per_gather = '4', work_mem = '164MB', random_page_cost = '1.1', effective_io_concurrency = '200'
 Planning:
   Buffers: shared hit=44
 Planning Time: 68.692 ms
 Execution Time: 21346.567 ms
(42 rows)

===
Although there are only 29,751 patient_visits rows where end_on is
null and the array intersection is non-empty, versus (as mentioned
previously) over 4M rows that meet the authz condition alone, the
query planner is starting with the larger set, and I think this is
because of the misestimation of how many members rows are involved.
(BTW, the tables are all freshly analyzed.) If I explicitly
materialize the relevant visits, I get a much better plan:

db=# explain (analyze, buffers, settings)
with user_member_expirations as (
    select
        u.id user_id,
        m.id member_id,
        m.patient_id,
        (m.last_activity_on + sp.authz_expiration_interval) expires_on
    from members m
    join sponsor_populations sp on sp.population_id = m.population_id
    join user_populations up on up.population_id = m.population_id
    join users u on u.id = up.user_id
    where sp.sponsor_id = u.sponsor_id
), visits as materialized (
    select *
    from new_patient_visits v
    where end_on is null
      and v.cached_population_ids &&
          ARRAY['foo-population','bar-population','baz-population','quux-population']
)
select v.patient_class, count(distinct v.id)
from visits v
join user_member_expirations u on u.patient_id = v.patient_id
where u.user_id = 3962
  and v.start_on < u.expires_on
group by v.patient_class;

                                  QUERY PLAN
------------------------------------------------------------------------------
 GroupAggregate  (cost=58665.16..58665.18 rows=1 width=12) (actual time=4310.702..4315.069 rows=5 loops=1)
   Group Key: v.patient_class
   Buffers: shared hit=2599817
   CTE visits
     ->  Bitmap Heap Scan on new_patient_visits v_1  (cost=55.66..54698.37 rows=49407 width=722) (actual time=14.364..72.664 rows=29751 loops=1)
           Recheck Cond: ((cached_population_ids && '{foo-population,bar-population,baz-population,quux-population}'::text[]) AND (end_on IS NULL))
           Heap Blocks: exact=28442
           Buffers: shared hit=28504
           ->  Bitmap Index Scan on index_new_patient_visits_on_open_cached_population_ids  (cost=0.00..43.31 rows=49407 width=0) (actual time=9.569..9.570 rows=29751 loops=1)
                 Index Cond: (cached_population_ids && '{foo-population,bar-population,baz-population,quux-population}'::text[])
                 Buffers: shared hit=62
   ->  Sort  (cost=3966.79..3966.80 rows=1 width=12) (actual time=4307.634..4309.925 rows=50988 loops=1)
         Sort Key: v.patient_class, v.id
         Sort Method: quicksort  Memory: 3528kB
         Buffers: shared hit=2599817
         ->  Nested Loop  (cost=1.11..3966.78 rows=1 width=12) (actual time=14.550..4280.320 rows=50988 loops=1)
               Join Filter: ((m.population_id = sp.population_id) AND (v.start_on < (m.last_activity_on + sp.authz_expiration_interval)))
               Rows Removed by Join Filter: 2124510
               Buffers: shared hit=2599817
               ->  Nested Loop  (cost=1.11..1493.94 rows=558 width=97) (actual time=14.405..250.756 rows=267759 loops=1)
                     Buffers: shared hit=28571
                     ->  Nested Loop  (cost=1.11..11.73 rows=1 width=73) (actual time=0.035..0.327 rows=9 loops=1)
                           Join Filter: (u.sponsor_id = sp.sponsor_id)
                           Buffers: shared hit=67
                           ->  Nested Loop  (cost=0.70..9.09 rows=1 width=95) (actual time=0.023..0.211 rows=9 loops=1)
                                 Buffers: shared hit=31
                                 ->  Index Only Scan using index_user_populations_on_user_id_and_population_id on user_populations up  (cost=0.42..1.57 rows=3 width=36) (actual time=0.013..0.036 rows=9 loops=1)
                                       Index Cond: (user_id = 3962)
                                       Heap Fetches: 0
                                       Buffers: shared hit=4
                                 ->  Index Scan using index_sponsor_populations_on_population_id_and_sponsor_id on sponsor_populations sp  (cost=0.28..2.50 rows=1 width=59) (actual time=0.014..0.016 rows=1 loops=9)
                                       Index Cond: (population_id = up.population_id)
                                       Buffers: shared hit=27
                           ->  Index Scan using users_pkey on users u  (cost=0.41..2.63 rows=1 width=24) (actual time=0.011..0.011 rows=1 loops=9)
                                 Index Cond: (id = 3962)
                                 Buffers: shared hit=36
                     ->  CTE Scan on visits v  (cost=0.00..988.14 rows=49407 width=24) (actual time=1.597..21.285 rows=29751 loops=9)
                           Buffers: shared hit=28504
               ->  Index Scan using index_members_on_patient_id on members m  (cost=0.00..4.38 rows=3 width=40) (actual time=0.003..0.014 rows=8 loops=267759)
                     Index Cond: (patient_id = v.patient_id)
                     Rows Removed by Index Recheck: 0
                     Buffers: shared hit=2571246
 Settings: effective_cache_size = '372GB', max_parallel_workers = '12', max_parallel_workers_per_gather = '4', work_mem = '164MB', random_page_cost = '1.1', effective_io_concurrency = '200'
 Planning:
   Buffers: shared hit=32
 Planning Time: 67.924 ms
 Execution Time: 4320.252 ms
(47 rows)

===

Of course, I'd prefer not to have to materialize this relation
explicitly. This particular query, for this particular user, benefits
from it, but similar queries or queries for different users may not.

I think the root of the problem is that population size (i.e., the
number of members in a given population) has high variance, and the
planner is basing its estimates on the average population size (and
maybe the average number of populations to which a user has access?),
which is not especially useful. Is there anything I can do about this?
Would any extended statistics be useful here?
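
For example (assuming extended statistics are even applicable to this
kind of join estimate), I have in mind something like:

CREATE STATISTICS members_population_patient_stats (ndistinct, dependencies)
    ON population_id, patient_id FROM members;
ANALYZE members;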

Thanks,
Jon
