Different results from identical matviews

From: Anders Steinlein <anders(at)e5r(dot)no>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Different results from identical matviews
Date: 2020-07-01 23:39:48
Message-ID: CAC35HNnNGavaZ=P=rUcwTwYEhfoyXDg32REXCRDgxBmC3No3nA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi folks,

We have a materialized view from which a customer reported some
confusing/invalid results, leading us to inspect the query and not finding
anything wrong. Running the query defining the matview manually, or
creating a new (identical) materialized view returns the correct result.
Obviously, we've done REFRESH MATERIALIZED VIEW just before doing the
comparison, and all runs are in the same schema.

It's a pretty big query, but let's describe the two matviews to see that
they are identical. The first is the original returning invalid results,
the one with _2 name postfix is the re-created one.

mm_prod=> \d+ segments_with_contacts
Materialized view "aakpnews.segments_with_contacts"
Column | Type | Collation | Nullable | Default | Storage | Stats
target | Description
--------+---------------+-----------+----------+---------+----------+--------------+-------------
lid | integer | | | | plain |
|
sid | integer | | | | plain |
|
email | public.citext | | | | extended |
|
Indexes:
"segments_with_contacts_sid_lid_email_idx" UNIQUE, btree (sid, lid,
email)
View definition:
WITH tagged_contacts AS (
SELECT cl.lid,
cl.email,
cl.skip_preexisting_campaigns AS skip_subscribed,
ct.skip_preexisting_campaigns AS skip_tags,
cl.ladded,
ct.tagname,
ct.created
FROM contacts_lists cl
LEFT JOIN contacts_tags ct USING (email)
WHERE cl.lstatus::bpchar = 'a'::bpchar
), tagged_segments AS (
SELECT s.lid,
s.cid,
s.sid,
sp.type,
sp.mid,
sp.matchdelay,
sp.tagname,
sp.event,
count(*) OVER (PARTITION BY s.sid) AS requirements,
campaigns.activated_at
FROM segments s
LEFT JOIN campaigns USING (cid)
JOIN segments_predicates sp USING (sid)
WHERE s.archived_at IS NULL AND (s.cid IS NULL OR
campaigns.activated_at IS NOT NULL)
), segments_contacts AS (
SELECT s.lid,
s.sid,
s.requirements,
CASE
WHEN s.type = 'subscribed'::public.predicate THEN (
SELECT array_agg(DISTINCT tagged_contacts.email::public.citext) AS array_agg
FROM tagged_contacts
WHERE tagged_contacts.lid = s.lid AND
tagged_contacts.ladded >= s.activated_at AND (s.matchdelay IS NULL OR
(tagged_contacts.ladded + s.matchdelay) < now()) AND NOT
COALESCE(tagged_contacts.skip_subscribed, false))
WHEN s.type = 'has_tag'::public.predicate THEN ( SELECT
array_agg(DISTINCT tagged_contacts.email::public.citext) AS array_agg
FROM tagged_contacts
WHERE tagged_contacts.lid = s.lid AND
tagged_contacts.tagname OPERATOR(public.=) s.tagname AND (s.matchdelay IS
NULL OR (tagged_contacts.created + s.matchdelay) < now()) AND (s.cid IS
NULL OR tagged_contacts.created >= s.activated_at AND NOT
COALESCE(tagged_contacts.skip_tags, false)))
WHEN s.type = 'not_has_tag'::public.predicate THEN (
SELECT array_agg(aggregated_tags.email::public.citext) AS array_agg
FROM ( SELECT tagged_contacts.email,
array_agg(tagged_contacts.tagname) AS tags
FROM tagged_contacts
WHERE tagged_contacts.lid = s.lid AND (s.cid
IS NULL OR tagged_contacts.ladded >= s.activated_at AND NOT
COALESCE(tagged_contacts.skip_subscribed, false))
GROUP BY tagged_contacts.email)
aggregated_tags
WHERE NOT aggregated_tags.tags @> ARRAY[s.tagname])
WHEN s.type = 'received'::public.predicate THEN (
SELECT array_agg(DISTINCT mails_contacts_sent.email::public.citext) AS
array_agg
FROM mails_contacts_sent
WHERE mails_contacts_sent.mid = s.mid AND
(s.matchdelay IS NULL OR (mails_contacts_sent.senttime + s.matchdelay) <
now()))
WHEN s.type = 'not_received'::public.predicate THEN (
SELECT array_agg(x.email::public.citext) AS array_agg
FROM ( SELECT tagged_contacts.email
FROM tagged_contacts
WHERE tagged_contacts.lid = s.lid
EXCEPT
SELECT DISTINCT mails_contacts_sent.email
FROM mails_contacts_sent
WHERE mails_contacts_sent.mid = s.mid AND
(s.matchdelay IS NULL OR (mails_contacts_sent.senttime + s.matchdelay) <
now())) x)
WHEN s.type = 'opened'::public.predicate THEN ( SELECT
array_agg(DISTINCT mails_contacts_opens.email::public.citext) AS array_agg
FROM mails_contacts_opens
WHERE mails_contacts_opens.mid = s.mid AND
(s.matchdelay IS NULL OR (mails_contacts_opens.opentime + s.matchdelay) <
now()))
WHEN s.type = 'not_opened'::public.predicate THEN (
SELECT array_agg(x.email::public.citext) AS array_agg
FROM ( SELECT tagged_contacts.email
FROM tagged_contacts
JOIN mails_contacts_sent USING (email)
WHERE tagged_contacts.lid = s.lid AND
mails_contacts_sent.mid = s.mid AND (s.matchdelay IS NULL OR
(mails_contacts_sent.senttime + s.matchdelay) < now())
EXCEPT
SELECT DISTINCT mails_contacts_opens.email
FROM mails_contacts_opens
WHERE mails_contacts_opens.mid = s.mid) x)
WHEN s.type = 'clicked'::public.predicate THEN ( SELECT
array_agg(DISTINCT mails_contacts_clicks.email::public.citext) AS array_agg
FROM mails_contacts_clicks
WHERE mails_contacts_clicks.mid = s.mid AND
(s.matchdelay IS NULL OR (mails_contacts_clicks.clicktime + s.matchdelay) <
now()))
WHEN s.type = 'not_clicked'::public.predicate THEN (
SELECT array_agg(x.email::public.citext) AS array_agg
FROM ( SELECT tagged_contacts.email
FROM tagged_contacts
JOIN mails_contacts_sent USING (email)
WHERE tagged_contacts.lid = s.lid AND
mails_contacts_sent.mid = s.mid AND (s.matchdelay IS NULL OR
(mails_contacts_sent.senttime + s.matchdelay) < now())
EXCEPT
SELECT DISTINCT mails_contacts_clicks.email
FROM mails_contacts_clicks
WHERE mails_contacts_clicks.mid = s.mid) x)
WHEN s.type = 'event_triggered_first'::public.predicate
THEN ( SELECT array_agg(DISTINCT x.email::public.citext) AS array_agg
FROM ( SELECT e.email
FROM events e
JOIN tagged_contacts USING (email)
WHERE tagged_contacts.lid = s.lid AND e.event
OPERATOR(public.=) s.event
GROUP BY e.email
HAVING (s.matchdelay IS NULL OR
(min(e.triggered_at) + s.matchdelay) < now()) AND (s.cid IS NULL OR
min(e.triggered_at) >= s.activated_at)) x)
WHEN s.type = 'event_triggered_last'::public.predicate
THEN ( SELECT array_agg(DISTINCT x.email::public.citext) AS array_agg
FROM ( SELECT e.email
FROM events e
JOIN tagged_contacts USING (email)
WHERE tagged_contacts.lid = s.lid AND e.event
OPERATOR(public.=) s.event
GROUP BY e.email
HAVING (s.matchdelay IS NULL OR
(max(e.triggered_at) + s.matchdelay) < now()) AND (s.cid IS NULL OR
max(e.triggered_at) >= s.activated_at)) x)
ELSE NULL::public.citext[]
END AS emails
FROM tagged_segments s
), unnested AS (
SELECT segments_contacts.lid,
segments_contacts.sid,
segments_contacts.requirements,
unnest(segments_contacts.emails) AS email
FROM segments_contacts
)
SELECT unnested.lid,
unnested.sid,
unnested.email
FROM unnested
GROUP BY unnested.lid, unnested.sid, unnested.email, unnested.requirements
HAVING count(unnested.email) = unnested.requirements;
Access method: heap

mm_prod=> \d+ segments_with_contacts_2
Materialized view "aakpnews.segments_with_contacts_2"
Column | Type | Collation | Nullable | Default | Storage | Stats
target | Description
--------+---------------+-----------+----------+---------+----------+--------------+-------------
lid | integer | | | | plain |
|
sid | integer | | | | plain |
|
email | public.citext | | | | extended |
|
View definition:
WITH tagged_contacts AS (
SELECT cl.lid,
cl.email,
cl.skip_preexisting_campaigns AS skip_subscribed,
ct.skip_preexisting_campaigns AS skip_tags,
cl.ladded,
ct.tagname,
ct.created
FROM contacts_lists cl
LEFT JOIN contacts_tags ct USING (email)
WHERE cl.lstatus::bpchar = 'a'::bpchar
), tagged_segments AS (
SELECT s.lid,
s.cid,
s.sid,
sp.type,
sp.mid,
sp.matchdelay,
sp.tagname,
sp.event,
count(*) OVER (PARTITION BY s.sid) AS requirements,
campaigns.activated_at
FROM segments s
LEFT JOIN campaigns USING (cid)
JOIN segments_predicates sp USING (sid)
WHERE s.archived_at IS NULL AND (s.cid IS NULL OR
campaigns.activated_at IS NOT NULL)
), segments_contacts AS (
SELECT s.lid,
s.sid,
s.requirements,
CASE
WHEN s.type = 'subscribed'::public.predicate THEN (
SELECT array_agg(DISTINCT tagged_contacts.email::public.citext) AS array_agg
FROM tagged_contacts
WHERE tagged_contacts.lid = s.lid AND
tagged_contacts.ladded >= s.activated_at AND (s.matchdelay IS NULL OR
(tagged_contacts.ladded + s.matchdelay) < now()) AND NOT
COALESCE(tagged_contacts.skip_subscribed, false))
WHEN s.type = 'has_tag'::public.predicate THEN ( SELECT
array_agg(DISTINCT tagged_contacts.email::public.citext) AS array_agg
FROM tagged_contacts
WHERE tagged_contacts.lid = s.lid AND
tagged_contacts.tagname OPERATOR(public.=) s.tagname AND (s.matchdelay IS
NULL OR (tagged_contacts.created + s.matchdelay) < now()) AND (s.cid IS
NULL OR tagged_contacts.created >= s.activated_at AND NOT
COALESCE(tagged_contacts.skip_tags, false)))
WHEN s.type = 'not_has_tag'::public.predicate THEN (
SELECT array_agg(aggregated_tags.email::public.citext) AS array_agg
FROM ( SELECT tagged_contacts.email,
array_agg(tagged_contacts.tagname) AS tags
FROM tagged_contacts
WHERE tagged_contacts.lid = s.lid AND (s.cid
IS NULL OR tagged_contacts.ladded >= s.activated_at AND NOT
COALESCE(tagged_contacts.skip_subscribed, false))
GROUP BY tagged_contacts.email)
aggregated_tags
WHERE NOT aggregated_tags.tags @> ARRAY[s.tagname])
WHEN s.type = 'received'::public.predicate THEN (
SELECT array_agg(DISTINCT mails_contacts_sent.email::public.citext) AS
array_agg
FROM mails_contacts_sent
WHERE mails_contacts_sent.mid = s.mid AND
(s.matchdelay IS NULL OR (mails_contacts_sent.senttime + s.matchdelay) <
now()))
WHEN s.type = 'not_received'::public.predicate THEN (
SELECT array_agg(x.email::public.citext) AS array_agg
FROM ( SELECT tagged_contacts.email
FROM tagged_contacts
WHERE tagged_contacts.lid = s.lid
EXCEPT
SELECT DISTINCT mails_contacts_sent.email
FROM mails_contacts_sent
WHERE mails_contacts_sent.mid = s.mid AND
(s.matchdelay IS NULL OR (mails_contacts_sent.senttime + s.matchdelay) <
now())) x)
WHEN s.type = 'opened'::public.predicate THEN ( SELECT
array_agg(DISTINCT mails_contacts_opens.email::public.citext) AS array_agg
FROM mails_contacts_opens
WHERE mails_contacts_opens.mid = s.mid AND
(s.matchdelay IS NULL OR (mails_contacts_opens.opentime + s.matchdelay) <
now()))
WHEN s.type = 'not_opened'::public.predicate THEN (
SELECT array_agg(x.email::public.citext) AS array_agg
FROM ( SELECT tagged_contacts.email
FROM tagged_contacts
JOIN mails_contacts_sent USING (email)
WHERE tagged_contacts.lid = s.lid AND
mails_contacts_sent.mid = s.mid AND (s.matchdelay IS NULL OR
(mails_contacts_sent.senttime + s.matchdelay) < now())
EXCEPT
SELECT DISTINCT mails_contacts_opens.email
FROM mails_contacts_opens
WHERE mails_contacts_opens.mid = s.mid) x)
WHEN s.type = 'clicked'::public.predicate THEN ( SELECT
array_agg(DISTINCT mails_contacts_clicks.email::public.citext) AS array_agg
FROM mails_contacts_clicks
WHERE mails_contacts_clicks.mid = s.mid AND
(s.matchdelay IS NULL OR (mails_contacts_clicks.clicktime + s.matchdelay) <
now()))
WHEN s.type = 'not_clicked'::public.predicate THEN (
SELECT array_agg(x.email::public.citext) AS array_agg
FROM ( SELECT tagged_contacts.email
FROM tagged_contacts
JOIN mails_contacts_sent USING (email)
WHERE tagged_contacts.lid = s.lid AND
mails_contacts_sent.mid = s.mid AND (s.matchdelay IS NULL OR
(mails_contacts_sent.senttime + s.matchdelay) < now())
EXCEPT
SELECT DISTINCT mails_contacts_clicks.email
FROM mails_contacts_clicks
WHERE mails_contacts_clicks.mid = s.mid) x)
WHEN s.type = 'event_triggered_first'::public.predicate
THEN ( SELECT array_agg(DISTINCT x.email::public.citext) AS array_agg
FROM ( SELECT e.email
FROM events e
JOIN tagged_contacts USING (email)
WHERE tagged_contacts.lid = s.lid AND e.event
OPERATOR(public.=) s.event
GROUP BY e.email
HAVING (s.matchdelay IS NULL OR
(min(e.triggered_at) + s.matchdelay) < now()) AND (s.cid IS NULL OR
min(e.triggered_at) >= s.activated_at)) x)
WHEN s.type = 'event_triggered_last'::public.predicate
THEN ( SELECT array_agg(DISTINCT x.email::public.citext) AS array_agg
FROM ( SELECT e.email
FROM events e
JOIN tagged_contacts USING (email)
WHERE tagged_contacts.lid = s.lid AND e.event
OPERATOR(public.=) s.event
GROUP BY e.email
HAVING (s.matchdelay IS NULL OR
(max(e.triggered_at) + s.matchdelay) < now()) AND (s.cid IS NULL OR
max(e.triggered_at) >= s.activated_at)) x)
ELSE NULL::public.citext[]
END AS emails
FROM tagged_segments s
), unnested AS (
SELECT segments_contacts.lid,
segments_contacts.sid,
segments_contacts.requirements,
unnest(segments_contacts.emails) AS email
FROM segments_contacts
)
SELECT unnested.lid,
unnested.sid,
unnested.email
FROM unnested
GROUP BY unnested.lid, unnested.sid, unnested.email, unnested.requirements
HAVING count(unnested.email) = unnested.requirements;
Access method: heap

Here you can see the results are different:

mm_prod=> begin;
BEGIN
mm_prod=> refresh materialized view segments_with_contacts;
REFRESH MATERIALIZED VIEW
mm_prod=> select count(*) from segments_with_contacts where sid = 42259;
count
-------
91
(1 row)

mm_prod=> refresh materialized view segments_with_contacts_2;
REFRESH MATERIALIZED VIEW
mm_prod=> select count(*) from segments_with_contacts_2 where sid = 42259;
count
-------
98
(1 row)

How can we figure out what's wrong here? Since it's a materialized view
using EXPLAIN doesn't give me much to go on:

mm_prod=> explain select count(*) from segments_with_contacts where sid =
42259;
QUERY PLAN

------------------------------------------------------------------------------
Aggregate (cost=83.52..83.53 rows=1 width=8)
-> Seq Scan on segments_with_contacts (cost=0.00..83.29 rows=91
width=0)
Filter: (sid = 42259)
(3 rows)

mm_prod=> explain select count(*) from segments_with_contacts_2 where sid =
42259;
QUERY PLAN

---------------------------------------------------------------------------------
Aggregate (cost=87.46..87.47 rows=1 width=8)
-> Seq Scan on segments_with_contacts_2 (cost=0.00..87.20 rows=105
width=0)
Filter: (sid = 42259)
(3 rows)

mm_prod=> select count(*) from segments_with_contacts_2 where sid = 42259;
count
-------
98
(1 row)

Any insights into this, and how to debug this further, is much appreciated.

Best,
-- a.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2020-07-01 23:59:45 Re: PostgreSQL 12 - ERROR: could not rename file "pg_logical/snapshots/1A-7C00D890.snap.13594.tmp" to "pg_logical/snapshots/1A-7C00D890.snap": No space left on device
Previous Message FOUTE K. Jaurès 2020-07-01 21:21:23 Re: PostgreSQL 12 - ERROR: could not rename file "pg_logical/snapshots/1A-7C00D890.snap.13594.tmp" to "pg_logical/snapshots/1A-7C00D890.snap": No space left on device