Fwd: BUG #13908: Query returns too few rows

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Seth P <seth-p(at)outlook(dot)com>
Subject: Fwd: BUG #13908: Query returns too few rows
Date: 2016-02-04 21:58:18
Message-ID: CAKFQuwY8MGr1kKz1DzfdERnVxL-aQo=L3hHY5ytjZFj3TNAxoA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Adding back the list so that everyone can see the latest finding. Adding
in the original bug block too...

The following bug has been logged on the website:

Bug reference: 13908
Logged by: Seth
Email address: seth-p(at)outlook(dot)com
PostgreSQL version: 9.5.0
Operating system: Windows 10
Description:

*​Next request, Seth*: please restore your dump into a clean database and
see what results you get. Also, please run:
SELECT version(); and provide the results.​

​My findings
​- there is​
a smaller count for
​the ​
DISTINCT than without.​


I successfully restored your pg_dump file​, vacuum analyzed it, then ran
the two counting queries. Here are my results.

The DISTINCT plan has an actual count of 415,874 while the non-DISTINCT
plan resulted in 415,967; *thus DISTINCT removed 93 duplicates.*

Looking back at your most recent email your numbers are 415,983 and
416,009; an increase of 26 by adding DISTINCT...and not matching either of
these numbers.

*version*
PostgreSQL 9.5.0 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
4.8.2-19ubuntu1) 4.8.2, 64-bit

​Installed into the default postgres database which has an encoding of
en_US.UTF-8

(NB: had to use pg_database as I couldn't located a psql meta-command to
list databases...am I missing something here?)

The follow are the results of my EXPLAIN ANALYZE confirmed by running the
actual count queries.

QUERY PLAN
Aggregate (cost=1464939.22..1464939.23 rows=1 width=0) (actual
time=151609.418..151609.419 rows=1 loops=1)
* -> Hash Join (cost=1454885.92..1464269.54 rows=267870 width=0) (actual
time=150567.030..151188.740 rows=415967 loops=1)*
Hash Cond: ((uuu.bababa)::text = (rrr.bababa)::text)
-> Index Only Scan using pk_uuu on uuu (cost=0.42..270.01
rows=8091 width=8) (actual time=0.019..9.337 rows=7993 loops=1)
Index Cond: (universe_hash = '5188205190738336870'::bigint)
Heap Fetches: 0
-> Hash (cost=1440572.59..1440572.59 rows=872393 width=8) (actual
time=150566.397..150566.397 rows=1275138 loops=1)
Buckets: 131072 (originally 131072) Batches: 32 (originally
16) Memory Usage: 3073kB
-> Hash Join (cost=819.47..1440572.59 rows=872393 width=8)
(actual time=11865.529..149038.533 rows=1275138 loops=1)
Hash Cond: (rrr.fff_idx = fff.idx)
-> Seq Scan on rrr (cost=0.00..1164409.32
rows=71098632 width=12) (actual time=0.006..73136.366 rows=71098547 loops=1)
-> Hash (cost=807.58..807.58 rows=951 width=4)
(actual time=3.458..3.458 rows=964 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 42kB
-> Bitmap Heap Scan on fff (cost=30.98..807.58
rows=951 width=4) (actual time=0.133..1.843 rows=964 loops=1)
Recheck Cond: ((file_name_date >=
'2005-03-01'::date) AND (file_name_date <= '2005-07-30'::date))
Filter: ((file_name_model_ver)::text =
'100'::text)
Rows Removed by Filter: 540
Heap Blocks: exact=30
-> Bitmap Index Scan on
ix_fff_file_name_date (cost=0.00..30.74 rows=1445 width=0) (actual
time=0.117..0.117 rows=1504 loops=1)
Index Cond: ((file_name_date >=
'2005-03-01'::date) AND (file_name_date <= '2005-07-30'::date))
Planning time: 1.727 ms
Execution time: 151609.494 ms

QUERY PLAN
Aggregate (cost=1504689.02..1504689.03 rows=1 width=0) (actual
time=155253.758..155253.759 rows=1 loops=1)
* -> Unique (cost=1497322.60..1501340.65 rows=267870 width=24) (actual
time=153375.583..154829.461 rows=415874 loops=1)*
-> Sort (cost=1497322.60..1497992.27 rows=267870 width=24)
(actual time=153375.578..153919.245 rows=415874 loops=1)
Sort Key: rrr.idx, rrr.row_number, rrr.bababa, rrr.data_date,
rrr.fff_idx
Sort Method: external merge Disk: 13784kB
-> Hash Join (cost=1456589.92..1467677.54 rows=267870
width=24) (actual time=151982.289..152665.099 rows=415874 loops=1)
Hash Cond: ((uuu.bababa)::text = (rrr.bababa)::text)
-> Index Only Scan using pk_uuu on uuu
(cost=0.42..270.01 rows=8091 width=8) (actual time=0.012..9.359 rows=7993
loops=1)
Index Cond: (universe_hash =
'5188205190738336870'::bigint)
Heap Fetches: 0
-> Hash (cost=1440572.59..1440572.59 rows=872393
width=24) (actual time=151981.919..151981.919 rows=1275138 loops=1)
Buckets: 65536 (originally 65536) Batches: 32
(originally 16) Memory Usage: 3585kB
-> Hash Join (cost=819.47..1440572.59
rows=872393 width=24) (actual time=12037.549..150408.756 rows=1275138
loops=1)
Hash Cond: (rrr.fff_idx = fff.idx)
-> Seq Scan on rrr (cost=0.00..1164409.32
rows=71098632 width=24) (actual time=0.004..73673.708 rows=71098547 loops=1)
-> Hash (cost=807.58..807.58 rows=951
width=4) (actual time=2.360..2.360 rows=964 loops=1)
Buckets: 1024 Batches: 1 Memory
Usage: 42kB
-> Bitmap Heap Scan on fff
(cost=30.98..807.58 rows=951 width=4) (actual time=0.123..1.293 rows=964
loops=1)
Recheck Cond: ((file_name_date
>= '2005-03-01'::date) AND (file_name_date <= '2005-07-30'::date))
Filter:
((file_name_model_ver)::text = '100'::text)
Rows Removed by Filter: 540
Heap Blocks: exact=30
-> Bitmap Index Scan on
ix_fff_file_name_date (cost=0.00..30.74 rows=1445 width=0) (actual
time=0.111..0.111 rows=1504 loops=1)
Index Cond:
((file_name_date >= '2005-03-01'::date) AND (file_name_date <=
'2005-07-30'::date))
Planning time: 0.668 ms
Execution time: 155255.933 ms

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message David G. Johnston 2016-02-04 22:13:00 Re: BUG #13908: Query returns too few rows
Previous Message Andres Freund 2016-02-04 21:15:34 Re: BUG #13891: Deparsed arbiter WHERE clauses cannot be parsed by Postgres