Re: BUG #13908: Query returns too few rows

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>, Seth P <seth-p(at)outlook(dot)com>
Subject: Re: BUG #13908: Query returns too few rows
Date: 2016-02-05 04:47:32
Message-ID: CAKFQuwaj9Q+UEEtLmyA032=CMKUFAhtWU4tuQ4dY3wUJ_5JDuQ@mail.gmail.com
Lists: pgsql-bugs

On Thu, Feb 4, 2016 at 8:57 PM, David G. Johnston <
david(dot)g(dot)johnston(at)gmail(dot)com> wrote:

> On Thu, Feb 4, 2016 at 4:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> "David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> writes:
>> > *The second-level Hash Join combines/sees, in both cases, 7,993 and
>> > 1,275,138 records but depending on the LEFT/RIGHT order of the sub-nodes
>> > appears to provide a different result.*
>>
>> Hmm, that's pretty fishy; and given all the hacking that's gone on lately
>> on hash join, 9.5 introducing a bug there wouldn't be all that
>> astonishing.
>> But I decline to try to find it without a self-contained test case.
>>
>>
> So, one of the tables involved has 70M-ish rows, of which only maybe
> 2M-ish are necessary to fulfill the query in terms of getting the right
> answer. Realizing this, I deleted the other 68M records and then
> immediately (in the same transaction) ran the counting query, and found
> that I still got an incorrect result. I ran a count instead of an explain
> analyze, so I'm not sure exactly what it did behind the scenes; regardless,
> I got the under-counting behavior. I then committed the transaction, ran
> vacuum analyze, and re-ran the counting query, and this time I got the
> correct count.
>
> Do you expect that some portion of the 68M records that were deleted would
> be required to diagnose the problem, or can knobs be twiddled in some way
> to get similar behavior without those records actually being present? The
> fact that the records were deleted, but not vacuumed, and a wrong result
> was still returned makes me think it should be possible, but I don't know
> enough to say for sure, or to make informed decisions about which knobs to
> turn (if indeed such knobs are exposed, since I do not have source-editing
> or debugging capabilities).
>
>
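(For concreteness, a rough sketch of the delete-then-count sequence described
above; the actual DELETE predicate and counting query aren't spelled out here,
so the versions below are only guesses reconstructed from the join conditions
and filters visible in the plans.)

BEGIN;
-- guess: keep only the rrr rows that can match the filtered fff rows
DELETE FROM rrr
WHERE fff_idx NOT IN (
    SELECT idx
    FROM fff
    WHERE file_name_model_ver = '100'
      AND file_name_date BETWEEN '2005-03-01' AND '2005-07-30');
-- still inside the same transaction: this count came back wrong
SELECT count(*)
FROM rrr
JOIN fff ON fff.idx = rrr.fff_idx
JOIN uuu USING (bababa)
WHERE fff.file_name_model_ver = '100'
  AND fff.file_name_date BETWEEN '2005-03-01' AND '2005-07-30'
  AND uuu.universe_hash = 5188205190738336870;
COMMIT;
VACUUM ANALYZE rrr;
-- after the vacuum analyze, re-running the same count returned the correct result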
Some more observations...

On this node's detail (the row count appears to be consistent regardless of
node ordering in the plan...):

Hash (cost=1440572.59..1440572.59 rows=872393 width=24) (actual time=151981.919..151981.919 rows=*1275138* loops=1)

Bad Plan:  Buckets: 131072 (originally 131072) Batches: 32 (originally 16) Memory Usage: 3073kB
Bad Plan:  Buckets: 65536 (originally 65536) Batches: 32 (originally 16) Memory Usage: 3585kB
Good Plan: Buckets: 1024 Batches: 1 Memory Usage: 42kB
Good Plan: Buckets: 8192 Batches: 1 Memory Usage: 439kB (see below)
9.3 Plan:  Buckets: 4096 Batches: 64 Memory Usage: 889kB

9.5 Made Up Good Plan (see below): Buckets: 131072 (originally 1024) Batches: 16 (originally 1) Memory Usage: 3227kB

*The difference on this last one is that the original bucket count is
considerably smaller than the seemingly important 2^16-1 boundary even
though the final count is greater.*
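
Along the "knob twiddling" line: one thing that might be worth trying (my
assumption only; I haven't verified it changes the result) is varying
work_mem, since that drives how many buckets/batches the Hash node ends up
with, and so might move the hash across the bucket-count boundary noted above.
Roughly, using the same guessed query shape as above:

BEGIN;
SET LOCAL work_mem = '64MB';        -- assumption: a larger setting lets the hash use fewer batches
SET LOCAL enable_mergejoin = off;   -- keep the planner on the hash join path
EXPLAIN ANALYZE
SELECT count(*)
FROM rrr
JOIN fff ON fff.idx = rrr.fff_idx
JOIN uuu USING (bababa)
WHERE fff.file_name_model_ver = '100'
  AND fff.file_name_date BETWEEN '2005-03-01' AND '2005-07-30'
  AND uuu.universe_hash = 5188205190738336870;
ROLLBACK;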

I was able to force a Hash Join between the 1,275,138-row node and the
7,993-row node using CTEs:
BEGIN;
SET LOCAL enable_mergejoin = off;
EXPLAIN ANALYZE
WITH first_join AS (
    SELECT *
    FROM rrr JOIN fff ON fff.idx = rrr.fff_idx
    WHERE fff.file_name_model_ver = '100' AND
          fff.file_name_date BETWEEN '2005-03-01' AND '2005-07-30'
),
second_join AS (
    SELECT * FROM uuu WHERE universe_hash = 5188205190738336870
),
combined AS (
    SELECT * FROM first_join JOIN second_join USING (bababa)
)
SELECT count(*) FROM combined;

The result was the following EXPLAIN ANALYZE output and a correct count.

QUERY PLAN
Aggregate (cost=320276.69..320276.70 rows=1 width=0) (actual time=23126.200..23126.201 rows=1 loops=1)
  CTE first_join
    ->  Hash Join (cost=820.88..228772.40 rows=38783 width=68) (actual time=8573.804..15931.271 rows=1275138 loops=1)
          Hash Cond: (rrr.fff_idx = fff.idx)
          ->  Seq Scan on rrr (cost=0.00..216302.86 rows=3002886 width=24) (actual time=7499.001..10979.666 rows=3007454 loops=1)
          ->  Hash (cost=808.37..808.37 rows=1001 width=44) (actual time=2.616..2.616 rows=964 loops=1)
                Buckets: 1024 Batches: 1 Memory Usage: 87kB
                ->  Bitmap Heap Scan on fff (cost=35.75..808.37 rows=1001 width=44) (actual time=0.122..1.438 rows=964 loops=1)
                      Recheck Cond: ((file_name_date >= '2005-03-01'::date) AND (file_name_date <= '2005-07-30'::date))
                      Filter: ((file_name_model_ver)::text = '100'::text)
                      Rows Removed by Filter: 540
                      Heap Blocks: exact=30
                      ->  Bitmap Index Scan on ix_fff_file_name_date (cost=0.00..35.50 rows=1521 width=0) (actual time=0.100..0.100 rows=1504 loops=1)
                            Index Cond: ((file_name_date >= '2005-03-01'::date) AND (file_name_date <= '2005-07-30'::date))
  CTE second_join
    ->  Index Only Scan using pk_uuu on uuu (cost=0.42..269.84 rows=8081 width=16) (actual time=0.019..9.372 rows=7993 loops=1)
          Index Cond: (universe_hash = '5188205190738336870'::bigint)
          Heap Fetches: 0
  CTE combined
    ->  Hash Join (cost=262.63..55976.35 rows=1567027 width=228) (actual time=8612.493..21535.488 rows=416075 loops=1)
          Hash Cond: ((first_join.bababa)::text = (second_join.bababa)::text)
          ->  CTE Scan on first_join (cost=0.00..775.66 rows=38783 width=220) (actual time=8573.810..19343.262 rows=1275138 loops=1)
          ->  Hash (cost=161.62..161.62 rows=8081 width=42) (actual time=37.346..37.346 rows=7993 loops=1)
                *Buckets: 8192 Batches: 1 Memory Usage: 439kB*
                ->  CTE Scan on second_join (cost=0.00..161.62 rows=8081 width=42) (actual time=0.023..27.642 rows=7993 loops=1)
  ->  CTE Scan on combined (cost=0.00..31340.54 rows=1567027 width=0) (actual time=8612.498..22664.354 rows=416075 loops=1)
Planning time: 0.137 ms
Execution time: 23145.303 ms

And below is the 9.3 non-Distinct plan:

QUERY PLAN
Aggregate (cost=1467123.37..1467123.38 rows=1 width=0) (actual time=165954.407..165954.409 rows=1 loops=1)
  ->  Hash Join (cost=1456877.26..1466434.87 rows=275399 width=0) (actual time=164885.552..165504.698 rows=416075 loops=1)
        Hash Cond: ((uuu.bababa)::text = (rrr.bababa)::text)
        ->  Index Only Scan using pk_uuu on uuu (cost=0.42..260.48 rows=7775 width=8) (actual time=0.011..9.630 rows=7993 loops=1)
              Index Cond: (universe_hash = 5188205190738336870::bigint)
              Heap Fetches: 0
        ->  Hash (cost=1441329.64..1441329.64 rows=947616 width=8) (actual time=164884.922..164884.922 rows=1275138 loops=1)
              *Buckets: 4096 Batches: 64 Memory Usage: 889kB*
              ->  Hash Join (cost=822.75..1441329.64 rows=947616 width=8) (actual time=60770.116..163253.376 rows=1275138 loops=1)
                    Hash Cond: (rrr.fff_idx = fff.idx)
                    ->  Seq Scan on rrr (cost=0.00..1164410.44 rows=71098744 width=12) (actual time=0.036..79440.276 rows=71098547 loops=1)
                    ->  Hash (cost=809.84..809.84 rows=1033 width=4) (actual time=2.577..2.577 rows=964 loops=1)
                          Buckets: 1024 Batches: 1 Memory Usage: 34kB
                          ->  Bitmap Heap Scan on fff (cost=36.29..809.84 rows=1033 width=4) (actual time=0.110..1.418 rows=964 loops=1)
                                Recheck Cond: ((file_name_date >= '2005-03-01'::date) AND (file_name_date <= '2005-07-30'::date))
                                Filter: ((file_name_model_ver)::text = '100'::text)
                                Rows Removed by Filter: 540
                                ->  Bitmap Index Scan on ix_fff_file_name_date (cost=0.00..36.03 rows=1574 width=0) (actual time=0.098..0.098 rows=1504 loops=1)
                                      Index Cond: ((file_name_date >= '2005-03-01'::date) AND (file_name_date <= '2005-07-30'::date))
Total runtime: 165954.463 ms

Though it doesn't seem to be simply a function of the bucket count, since:

EXPLAIN ANALYZE
WITH first_join AS (
    SELECT * FROM generate_series(1, 1000000) gs (i)
),
second_join AS (
    SELECT * FROM generate_series(1, 1000000) gs (i)
),
combined AS (
    SELECT * FROM first_join JOIN second_join USING (i)
)
SELECT count(*) FROM combined;

QUERY PLAN
Aggregate (cost=362.50..362.51 rows=1 width=0) (actual time=13842.652..13842.653 rows=1 loops=1)
  CTE first_join
    ->  Function Scan on generate_series gs (cost=0.00..10.00 rows=1000 width=4) (actual time=88.464..1186.923 rows=1000000 loops=1)
  CTE second_join
    ->  Function Scan on generate_series gs_1 (cost=0.00..10.00 rows=1000 width=4) (actual time=88.287..1187.215 rows=1000000 loops=1)
  CTE combined
    ->  Hash Join (cost=32.50..230.00 rows=5000 width=4) (actual time=4668.961..10530.299 *rows=1000000* loops=1)
          Hash Cond: (first_join.i = second_join.i)
          ->  CTE Scan on first_join (cost=0.00..20.00 rows=1000 width=4) (actual time=88.468..3333.798 rows=1000000 loops=1)
          ->  Hash (cost=20.00..20.00 rows=1000 width=4) (actual time=4580.247..4580.247 rows=1000000 loops=1)
                *Buckets: 131072 (originally 1024) Batches: 16 (originally 1) Memory Usage: 3227kB*
                ->  CTE Scan on second_join (cost=0.00..20.00 rows=1000 width=4) (actual time=88.294..3320.363 rows=1000000 loops=1)
  ->  CTE Scan on combined (cost=0.00..100.00 rows=5000 width=0) (actual time=4668.966..12787.652 rows=1000000 loops=1)
Planning time: 0.071 ms
Execution time: 13853.714 ms

Though maybe that's because the original bucket count here was only 1024,
while the Bad plans above had original sizes greater than 2^16-1...
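
If the "originally more than 2^16 buckets" case is what matters, a variant of
the synthetic test that might hit it (an untested guess on my part; t1/t2 are
just placeholder names) is to materialize the series into analyzed tables
first, so the planner's row estimate, and hence the initial bucket count, is
already large:

CREATE TEMP TABLE t1 AS SELECT i FROM generate_series(1, 1000000) AS gs(i);
CREATE TEMP TABLE t2 AS SELECT i FROM generate_series(1, 1000000) AS gs(i);
ANALYZE t1;
ANALYZE t2;
SET enable_mergejoin = off;   -- force the hash join path, as in the CTE test above
EXPLAIN ANALYZE
SELECT count(*) FROM t1 JOIN t2 USING (i);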

David J.
