BUG #13908: Query returns too few rows

From: seth-p(at)outlook(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #13908: Query returns too few rows
Date: 2016-02-02 19:47:29
Message-ID: 20160202194729.2776.92152@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 13908
Logged by: Seth
Email address: seth-p(at)outlook(dot)com
PostgreSQL version: 9.5.0
Operating system: Windows 10
Description:

Below is a copy of a long-ish email I wrote detailing the problem. If it
doesn't come through in a reasonable format, please let me know where I can
email it. Thanks --Seth

I've only recently started using Postgresql, and have encountered what I
believe to be a bug in Postgresql 9.5 on Windows 10.

Below are 6 queries that are identical except for (a) the date range in the
WHERE clause, and (b) whether or not they include DISTINCT. This is as
simple as I could get the queries while still reproducing the bug. The
following table summarizes the number of rows returned by each query:

-----------------------------------------------------------------------------------------------------------------------------------
-- Date Range | # rows without DISTINCT | # rows with DISTINCT
| Notes |
------------------------------------------------------------------------------|----------------------------------------------------
-- 2005-03-01 - 2005-07-30 | (A) 415,983 | (A-D)
416,075 | DISTINCT *increases* the number of rows returned! |
-- 2005-03-01 - 2005-04-30 | (B) 168,886 | (B-D)
168,886 | DISTINCT has no effect |
-- 2005-05-01 - 2005-07-30 | (C) 247,189 | (C-D)
247,189 | DISTINCT has no effect |
-- sum of sub-range queries | (B) + (C) 416,075 | (B-D) + (C-D)
416,075 | In both cases, sum of 2 queries equals (A-D) |
-----------------------------------------------------------------------------------------------------------------------------------

Query (A-D) (with DISTINCT) should not return more rows than query (A) (the
identical query without DISTINCT), so clearly something is wrong there.

Looking at the results of the queries over the two sub-ranges strongly
suggests that it is query (A) that is returning too few rows.

Beneath the queries below I also show the EXPLAIN results for queries (A),
(A-D), (B), and (B-D). Notice that the plan for query (A) differs from the
other three -- it appears to have an extra "Hash" (highlighted with
asterisks) that the others do not.

I'm afraid I am unable to share the data, but can provide table statistics,
index descriptinos, or anything else that might be useful. Any suggestions
for how to proceed?

Seth

-- (A) 2005-03-01 - 2005-07-30 returns 415,983 rows
select count(*) from
(
SELECT
rrr.*
FROM files.models_direct_row_asset_data rrr
INNER JOIN files.models_direct_file fff
ON fff.idx = rrr.barra_file_idx
INNER JOIN files.temp_universe_instruments uuu
ON uuu.universe_hash = 5188205190738336870 AND
uuu.barrid = rrr.barrid
WHERE
fff.file_name_model_ver = '100' AND
fff.file_name_date BETWEEN '2005-03-01' AND '2005-07-30'
) a

-- (A-D) 2005-03-01 - 2005-07-30 DISTNICT returns 416,075 rows
select count(*) from
(
SELECT DISTINCT
rrr.*
FROM files.models_direct_row_asset_data rrr
INNER JOIN files.models_direct_file fff
ON fff.idx = rrr.barra_file_idx
INNER JOIN files.temp_universe_instruments uuu
ON uuu.universe_hash = 5188205190738336870 AND
uuu.barrid = rrr.barrid
WHERE
fff.file_name_model_ver = '100' AND
fff.file_name_date BETWEEN '2005-03-01' AND '2005-07-30'
) ad

-- (B) 2005-03-01 - 2005-04-30 returns 168,886 rows
select count(*) from
(
SELECT
rrr.*
FROM files.models_direct_row_asset_data rrr
INNER JOIN files.models_direct_file fff
ON fff.idx = rrr.barra_file_idx
INNER JOIN files.temp_universe_instruments uuu
ON uuu.universe_hash = 5188205190738336870 AND
uuu.barrid = rrr.barrid
WHERE
fff.file_name_model_ver = '100' AND
fff.file_name_date BETWEEN '2005-03-01' AND '2005-04-30'
) b

-- (B-D) 2005-03-01 - 2005-04-30 DISTINCT returns 168,886 rows
select count(*) from
(
SELECT DISTINCT
rrr.*
FROM files.models_direct_row_asset_data rrr
INNER JOIN files.models_direct_file fff
ON fff.idx = rrr.barra_file_idx
INNER JOIN files.temp_universe_instruments uuu
ON uuu.universe_hash = 5188205190738336870 AND
uuu.barrid = rrr.barrid
WHERE
fff.file_name_model_ver = '100' AND
fff.file_name_date BETWEEN '2005-03-01' AND '2005-04-30'
) bd

-- (C) 2005-05-01 - 2005-07-30 returns 247,189 rows
select count(*) from
(
SELECT
rrr.*
FROM files.models_direct_row_asset_data rrr
INNER JOIN files.models_direct_file fff
ON fff.idx = rrr.barra_file_idx
INNER JOIN files.temp_universe_instruments uuu
ON uuu.universe_hash = 5188205190738336870 AND
uuu.barrid = rrr.barrid
WHERE
fff.file_name_model_ver = '100' AND
fff.file_name_date BETWEEN '2005-05-01' AND '2005-07-30'
) c

-- (C-D) 2005-05-01 - 2005-07-30 DISTINCT returns 247,189 rows
select count(*) from
(
SELECT DISTINCT
rrr.*
FROM files.models_direct_row_asset_data rrr
INNER JOIN files.models_direct_file fff
ON fff.idx = rrr.barra_file_idx
INNER JOIN files.temp_universe_instruments uuu
ON uuu.universe_hash = 5188205190738336870 AND
uuu.barrid = rrr.barrid
WHERE
fff.file_name_model_ver = '100' AND
fff.file_name_date BETWEEN '2005-05-01' AND '2005-07-30'
) cd

EPXLAIN of queries

(A)
"Aggregate (cost=1821781.18..1821781.19 rows=1 width=0)"
" -> Hash Join (cost=1811365.12..1821096.53 rows=273861 width=0)"
" Hash Cond: ((uuu.barrid)::text = (rrr.barrid)::text)"
" -> Bitmap Heap Scan on temp_universe_instruments uuu
(cost=231.25..932.60 rows=8108 width=8)"
" Recheck Cond: (universe_hash =
'5188205190738336870'::bigint)"
" -> Bitmap Index Scan on pk_temp_universe_instruments
(cost=0.00..229.23 rows=8108 width=0)"
" Index Cond: (universe_hash =
'5188205190738336870'::bigint)"
" -> Hash (cost=1796474.21..1796474.21 rows=893492 width=8)"
**********************************************************
" -> Hash Join (cost=2341.15..1796474.21 rows=893492
width=8)"
" Hash Cond: (rrr.barra_file_idx = fff.idx)"
" -> Seq Scan on models_direct_row_asset_data rrr
(cost=0.00..1518763.74 rows=71049174 width=12)"
" -> Hash (cost=2329.00..2329.00 rows=972 width=4)"
" -> Bitmap Heap Scan on models_direct_file fff
(cost=35.33..2329.00 rows=972 width=4)"
" Recheck Cond: ((file_name_date >=
'2005-03-01'::date) AND (file_name_date <= '2005-07-30'::date))"
" Filter: ((file_name_model_ver)::text =
'100'::text)"
" -> Bitmap Index Scan on
ix_models_direct_file_file_name_date (cost=0.00..35.08 rows=1479 width=0)"
" Index Cond: ((file_name_date >=
'2005-03-01'::date) AND (file_name_date <= '2005-07-30'::date))"

(A-D)
"Aggregate (cost=1886439.23..1886439.24 rows=1 width=0)"
" -> Unique (cost=1875484.79..1883015.97 rows=273861 width=64)"
" -> Sort (cost=1875484.79..1876169.44 rows=273861 width=64)"
" Sort Key: rrr.idx, rrr.row_number, rrr.barrid, rrr.yield_pct,
rrr.total_risk_pct, rrr.spec_risk_pct, rrr.hist_beta, rrr.pred_beta,
rrr.data_date, rrr.barra_file_idx"
" -> Hash Join (cost=3375.10..1840453.92 rows=273861
width=64)"
" Hash Cond: ((rrr.barrid)::text = (uuu.barrid)::text)"
" -> Hash Join (cost=2341.15..1796474.21 rows=893492
width=64)"
" Hash Cond: (rrr.barra_file_idx = fff.idx)"
" -> Seq Scan on models_direct_row_asset_data rrr
(cost=0.00..1518763.74 rows=71049174 width=64)"
" -> Hash (cost=2329.00..2329.00 rows=972
width=4)"
" -> Bitmap Heap Scan on models_direct_file
fff (cost=35.33..2329.00 rows=972 width=4)"
" Recheck Cond: ((file_name_date >=
'2005-03-01'::date) AND (file_name_date <= '2005-07-30'::date))"
" Filter: ((file_name_model_ver)::text
= '100'::text)"
" -> Bitmap Index Scan on
ix_models_direct_file_file_name_date (cost=0.00..35.08 rows=1479 width=0)"
" Index Cond: ((file_name_date >=
'2005-03-01'::date) AND (file_name_date <= '2005-07-30'::date))"
" -> Hash (cost=932.60..932.60 rows=8108 width=8)"
" -> Bitmap Heap Scan on temp_universe_instruments
uuu (cost=231.25..932.60 rows=8108 width=8)"
" Recheck Cond: (universe_hash =
'5188205190738336870'::bigint)"
" -> Bitmap Index Scan on
pk_temp_universe_instruments (cost=0.00..229.23 rows=8108 width=0)"
" Index Cond: (universe_hash =
'5188205190738336870'::bigint)"

(B)
"Aggregate (cost=1809368.84..1809368.85 rows=1 width=0)"
" -> Hash Join (cost=2484.83..1809086.39 rows=112981 width=0)"
" Hash Cond: ((rrr.barrid)::text = (uuu.barrid)::text)"
" -> Hash Join (cost=1450.88..1790335.13 rows=368611 width=8)"
" Hash Cond: (rrr.barra_file_idx = fff.idx)"
" -> Seq Scan on models_direct_row_asset_data rrr
(cost=0.00..1518763.74 rows=71049174 width=12)"
" -> Hash (cost=1445.87..1445.87 rows=401 width=4)"
" -> Bitmap Heap Scan on models_direct_file fff
(cost=14.49..1445.87 rows=401 width=4)"
" Recheck Cond: ((file_name_date >=
'2005-03-01'::date) AND (file_name_date <= '2005-04-30'::date))"
" Filter: ((file_name_model_ver)::text =
'100'::text)"
" -> Bitmap Index Scan on
ix_models_direct_file_file_name_date (cost=0.00..14.39 rows=610 width=0)"
" Index Cond: ((file_name_date >=
'2005-03-01'::date) AND (file_name_date <= '2005-04-30'::date))"
" -> Hash (cost=932.60..932.60 rows=8108 width=8)"
" -> Bitmap Heap Scan on temp_universe_instruments uuu
(cost=231.25..932.60 rows=8108 width=8)"
" Recheck Cond: (universe_hash =
'5188205190738336870'::bigint)"
" -> Bitmap Index Scan on pk_temp_universe_instruments
(cost=0.00..229.23 rows=8108 width=0)"
" Index Cond: (universe_hash =
'5188205190738336870'::bigint)"

(B-D)
"Aggregate (cost=1827336.97..1827336.98 rows=1 width=0)"
" -> Unique (cost=1822817.73..1825924.71 rows=112981 width=64)"
" -> Sort (cost=1822817.73..1823100.18 rows=112981 width=64)"
" Sort Key: rrr.idx, rrr.row_number, rrr.barrid, rrr.yield_pct,
rrr.total_risk_pct, rrr.spec_risk_pct, rrr.hist_beta, rrr.pred_beta,
rrr.data_date, rrr.barra_file_idx"
" -> Hash Join (cost=2484.83..1809086.39 rows=112981
width=64)"
" Hash Cond: ((rrr.barrid)::text = (uuu.barrid)::text)"
" -> Hash Join (cost=1450.88..1790335.13 rows=368611
width=64)"
" Hash Cond: (rrr.barra_file_idx = fff.idx)"
" -> Seq Scan on models_direct_row_asset_data rrr
(cost=0.00..1518763.74 rows=71049174 width=64)"
" -> Hash (cost=1445.87..1445.87 rows=401
width=4)"
" -> Bitmap Heap Scan on models_direct_file
fff (cost=14.49..1445.87 rows=401 width=4)"
" Recheck Cond: ((file_name_date >=
'2005-03-01'::date) AND (file_name_date <= '2005-04-30'::date))"
" Filter: ((file_name_model_ver)::text
= '100'::text)"
" -> Bitmap Index Scan on
ix_models_direct_file_file_name_date (cost=0.00..14.39 rows=610 width=0)"
" Index Cond: ((file_name_date >=
'2005-03-01'::date) AND (file_name_date <= '2005-04-30'::date))"
" -> Hash (cost=932.60..932.60 rows=8108 width=8)"
" -> Bitmap Heap Scan on temp_universe_instruments
uuu (cost=231.25..932.60 rows=8108 width=8)"
" Recheck Cond: (universe_hash =
'5188205190738336870'::bigint)"
" -> Bitmap Index Scan on
pk_temp_universe_instruments (cost=0.00..229.23 rows=8108 width=0)"
" Index Cond: (universe_hash =
'5188205190738336870'::bigint)"

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2016-02-02 21:12:54 Re: BUG #13888: pg_dump write error
Previous Message Tom Lane 2016-02-02 16:18:38 Re: BUG #13905: Inconsistent code modification