From: | Loïc Revest <l(dot)revest(at)apc(dot)fr> |
---|---|
To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | lag() default value ignored for some window partition depending on table records count? |
Date: | 2022-04-26 22:25:36 |
Message-ID: | CABkOrLve_F5PyQNL+BhnUCT1FRVF5QV6mfVAuhrharF_Ka+9eQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Hello,
Long story short:
- We track history data in a partitioned table which boils down to "key
fields + a JSONB data column", the latter being a list of date/numeric
values pair ;
- For every of this date/numeric couple within their "window partitioning",
we need to determine the preceding value, thus the use of lag(<value>, 1,
0::numeric), since it's relevant for us here to get the very first record
of the partition having "0.00" as its preceding value;
- Except that the "first" row of some partitions gets NULL instead of
0::numeric as the result of lag(<value>, 1, 0::numeric), while other get
0::numeric as expected;
- Things get stranger when the table containing the data get "purged" from
every record except those corresponding to the "window partition" whose
first record gets NULL as lag() output: now it also gets 0::numeric...
More details:
- Behavior first observed on 12.10 (Debian, through pgdg repo), then
reproduced on 14.2 (OpenBSD, through packages);
- "Odd" behavior may be seen for particular window partition when row count
∼ 600k, but not when only relevant rows (58 records) are kept;
- EXPLAIN ANALYZE when getting the "odd" behavior:
https://explain.depesz.com/s/4kEt
- Output (first rows) to illustrate (\pset null 'NULL'):
lag_return=> SELECT * FROM points_timeserie_full WHERE peer_id = 10043;
peer_id | date | points_count |
previous_points_count
---------+-------------------------------+--------------+-----------------------
10043 | 2019-11-13 13:59:50+01 | 770.00 |
NULL
10043 | 2019-11-14 15:17:15+01 | 480.00 |
770.00
10043 | 2019-12-17 13:23:20+01 | 770.00 |
480.00
While for another "peer_id":
peer_id | date | points_count |
previous_points_count
---------+-------------------------------+--------------+-----------------------
10015 | 2019-11-15 12:39:34+01 | 840.00 |
0.00
10015 | 2019-11-19 14:12:26+01 | 1165.00 |
840.00
10015 | 2019-11-21 15:51:52+01 | 1165.00 |
1165.00
- EXPLAIN ANALYZE when keeping relevant records only:
https://explain.depesz.com/s/pZKFL
- Output (first rows) to illustrate (\pset null 'NULL'):
lag_return=> SELECT * FROM points_timeserie_10043 WHERE peer_id = 10043;
peer_id | date | points_count |
previous_points_count
---------+-------------------------------+--------------+-----------------------
10043 | 2019-11-13 13:59:50+01 | 770.00 |
0
10043 | 2019-11-14 15:17:15+01 | 480.00 |
770.00
10043 | 2019-12-17 13:23:20+01 | 770.00 |
480.00
(Note: points_timeserie_full and points_timeserie_10043 are strictly
identical views structure-wise, the only difference being in the dataset of
their respective tables (themselves identical structure-wise), as explained
above).
Attached is the test case database script we've been able to reproduce the
"issue" one hundred percent of the time on, but without the data - even
after compression this weights ∼8.5Mb. Data that I'd happily share would
anyone points me towards the customary way to do it here with such "big" a
file.
Regards,
L. Revest
Attachment | Content-Type | Size |
---|---|---|
test_case_db.sql | application/sql | 10.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2022-04-26 22:35:54 | Re: lag() default value ignored for some window partition depending on table records count? |
Previous Message | Tom Lane | 2022-04-26 19:47:13 | Re: Fix primary crash continually with invalid checkpoint after promote |