RE: Inefficient full seq scan on pg_largeobject instead of index scan

From: Jean-Marc Lessard <Jean-Marc(dot)Lessard(at)ultra-ft(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: RE: Inefficient full seq scan on pg_largeobject instead of index scan
Date: 2018-01-23 14:08:17
Message-ID: 8FC5F25FF3EC4744ADFCF20CBA3F44BE01246C503E@SRV-CAMTL-EXCH2.Forensictech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

I have to provide a summary of how much spaces is used in the large objects table based on a group by condition.
I would expect an index only scan on the large object table, but a full seq scan that last for hours is performed.

BigSql distribution
PostgreSQL 9.6.5 on x86_64-pc-mingw64, compiled by gcc.exe (Rev5, Built by MSYS2 project) 4.9.2, 64-bit
Win Server 2012 R2, 8GB RAM
pg server mem settings:
effective_cache_size | 6GB
maintenance_work_mem | 819MB
random_page_cost | 2
shared_buffers | 2GB
work_mem | 32MB

Testcase 1: Here is a simplified query, timing and the explain plan:
SELECT ima.sit_cod, COUNT(*)*2048*4/3
FROM images ima JOIN pg_largeobject ON (loid=ima.val)
GROUP BY ima.sit_cod;
Time: 343997.661 ms (about 6 min) ran on a small DB, took 4hrs on a ~1TB table

HashAggregate (cost=2452378.86..2452379.01 rows=15 width=14)
Group Key: ima.sit_cod
-> Hash Join (cost=1460.40..2418245.74 rows=6826625 width=6)
Hash Cond: (pg_largeobject.loid = ima.val)
---------> Seq Scan on pg_largeobject (cost=0.00..2322919.25 rows=6826625 width=4)
-> Hash (cost=1114.62..1114.62 rows=27662 width=10)
-> Seq Scan on images ima (cost=0.00..1114.62 rows=27662 width=10)

Testcase 2: A simple count(*) for a specific group (small group) perform an Index Only Scan and last few secs.
SELECT COUNT(*)
FROM images ima JOIN pg_largeobject ON (loid=ima.val)
WHERE sit_cod='W8213';
count
-------
8599
Time: 12.090 ms

Aggregate (cost=11930.30..11930.31 rows=1 width=8)
-> Nested Loop (cost=2.87..11918.58 rows=4689 width=0)
-> Bitmap Heap Scan on images ima (cost=2.43..37.81 rows=19 width=4)
Recheck Cond: ((sit_cod)::text = 'W8213'::text)
-> Bitmap Index Scan on ima_pk (cost=0.00..2.43 rows=19 width=0)
Index Cond: ((sit_cod)::text = 'W8213'::text)
---------> Index Only Scan using pg_largeobject_loid_pn_index on pg_largeobject (cost=0.43..621.22 rows=408 width=4)
Index Cond: (loid = ima.val)

Testcase 3: However, larger group still perform full seq scan
SELECT COUNT(*)
FROM images ima JOIN pg_largeobject ON (loid=ima.val)
WHERE sit_cod='W8317';
count
---------
2209704
Time: 345638.118 ms (about 6 min)

Aggregate (cost=2369363.01..2369363.02 rows=1 width=8)
-> Hash Join (cost=1125.63..2365419.35 rows=1577463 width=0)
Hash Cond: (pg_largeobject.loid = ima.val)
---------> Seq Scan on pg_largeobject (cost=0.00..2322919.25 rows=6826625 width=4)
-> Hash (cost=1045.73..1045.73 rows=6392 width=4)
-> Bitmap Heap Scan on images ima (cost=127.83..1045.73 rows=6392 width=4)
Recheck Cond: ((sit_cod)::text = 'W8317'::text)
-> Bitmap Index Scan on ima_pk (cost=0.00..126.23 rows=6392 width=0)
Index Cond: ((sit_cod)::text = 'W8317'::text)

Pretty sure that using the index would lead to much better perf.
Any idea of what can be done?

Jean-Marc Lessard
Administrateur de base de données / Database Administrator
Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com<http://www.ultra-forensictechnology.com>

Jean-Marc Lessard
Administrateur de base de données / Database Administrator
Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com<http://www.ultra-forensictechnology.com>

Browse pgsql-performance by date

  From Date Subject
Next Message Andreas Kretschmer 2018-01-23 14:17:27 Re: 8.2 Autovacuum BUG ?
Previous Message Andreas Kretschmer 2018-01-23 14:08:07 Re: need help on memory allocation