slower merge join on sorted data chosen over nested loop

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: slower merge join on sorted data chosen over nested loop
Date: 2005-10-07 00:02:41
Message-ID: s34574e9.084@gwmta.wicourts.gov
Lists: pgsql-hackers

With both 8.1beta2 and a build from this morning's dev
snapshot, this query ran slower than expected:

select count(*)
  from "DbTranRepository" AS "dtr"
  inner join "DbTranLogRecord" AS "dtlr"
    on ( "dtlr"."countyNo" = "dtr"."countyNo"
         and "dtlr"."tranImageSeqNo" = "dtr"."tranImageSeqNo"
         and "dtlr"."tableName" = 'Cal'
       )
  where "dtr"."countyNo" = 40
    and "dtr"."timestampValue" between '2005-09-30 00:00:00' and '2005-10-01 00:00:00';

Fresh restore from dump. ANALYZE run. No other activity.
Autovacuum was disabled prior to the restore and test.
8GB dual Xeon SuSE 9.3 system hitting RAID 5.
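
(The only post-restore prep was refreshing the statistics before any
timed runs; a plain database-wide ANALYZE covers that, e.g.:)

dtr=> analyze;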

With the OS cache flushed by dd:

Aggregate (cost=176218.64..176218.65 rows=1 width=0) (actual time=32081.623..32081.626 rows=1 loops=1)
  -> Merge Join (cost=175627.39..176218.09 rows=218 width=0) (actual time=31774.573..32071.776 rows=3007 loops=1)
     Merge Cond: ("outer"."?column3?" = "inner"."?column3?")
     -> Sort (cost=48913.06..48975.72 rows=25067 width=17) (actual time=5828.984..5954.706 rows=39690 loops=1)
        Sort Key: (dtr."tranImageSeqNo")::numeric
        -> Bitmap Heap Scan on "DbTranRepository" dtr (cost=297.07..47081.47 rows=25067 width=17) (actual time=69.056..5560.895 rows=39690 loops=1)
           Recheck Cond: ((("countyNo")::smallint = 40) AND (("timestampValue")::timestamp with time zone >= '2005-09-30 00:00:00-05'::timestamp with time zone) AND (("timestampValue")::timestamp with time zone <= '2005-10-01 00:00:00-05'::timestamp with time zone))
           -> Bitmap Index Scan on "DbTranRepository_timestamp" (cost=0.00..297.07 rows=25067 width=0) (actual time=52.675..52.675 rows=39690 loops=1)
              Index Cond: ((("countyNo")::smallint = 40) AND (("timestampValue")::timestamp with time zone >= '2005-09-30 00:00:00-05'::timestamp with time zone) AND (("timestampValue")::timestamp with time zone <= '2005-10-01 00:00:00-05'::timestamp with time zone))
     -> Sort (cost=126714.34..126945.94 rows=92640 width=17) (actual time=25130.457..25551.291 rows=109605 loops=1)
        Sort Key: (dtlr."tranImageSeqNo")::numeric
        -> Bitmap Heap Scan on "DbTranLogRecord" dtlr (cost=1037.84..119071.84 rows=92640 width=17) (actual time=203.528..23588.122 rows=114113 loops=1)
           Recheck Cond: ((40 = ("countyNo")::smallint) AND (("tableName")::text = 'Cal'::text))
           -> Bitmap Index Scan on "DbTranLogRecord_tableName" (cost=0.00..1037.84 rows=92640 width=0) (actual time=146.412..146.412 rows=114113 loops=1)
              Index Cond: ((40 = ("countyNo")::smallint) AND (("tableName")::text = 'Cal'::text))
Total runtime: 32091.211 ms

With the OS cache populated by a previous run of this plan:

Aggregate (cost=176218.64..176218.65 rows=1 width=0) (actual time=4510.590..4510.593 rows=1 loops=1)
  -> Merge Join (cost=175627.39..176218.09 rows=218 width=0) (actual time=4203.451..4500.785 rows=3007 loops=1)
     Merge Cond: ("outer"."?column3?" = "inner"."?column3?")
     -> Sort (cost=48913.06..48975.72 rows=25067 width=17) (actual time=686.388..812.160 rows=39690 loops=1)
        Sort Key: (dtr."tranImageSeqNo")::numeric
        -> Bitmap Heap Scan on "DbTranRepository" dtr (cost=297.07..47081.47 rows=25067 width=17) (actual time=30.432..427.463 rows=39690 loops=1)
           Recheck Cond: ((("countyNo")::smallint = 40) AND (("timestampValue")::timestamp with time zone >= '2005-09-30 00:00:00-05'::timestamp with time zone) AND (("timestampValue")::timestamp with time zone <= '2005-10-01 00:00:00-05'::timestamp with time zone))
           -> Bitmap Index Scan on "DbTranRepository_timestamp" (cost=0.00..297.07 rows=25067 width=0) (actual time=19.089..19.089 rows=39690 loops=1)
              Index Cond: ((("countyNo")::smallint = 40) AND (("timestampValue")::timestamp with time zone >= '2005-09-30 00:00:00-05'::timestamp with time zone) AND (("timestampValue")::timestamp with time zone <= '2005-10-01 00:00:00-05'::timestamp with time zone))
     -> Sort (cost=126714.34..126945.94 rows=92640 width=17) (actual time=2705.007..3124.966 rows=109605 loops=1)
        Sort Key: (dtlr."tranImageSeqNo")::numeric
        -> Bitmap Heap Scan on "DbTranLogRecord" dtlr (cost=1037.84..119071.84 rows=92640 width=17) (actual time=96.046..1195.302 rows=114113 loops=1)
           Recheck Cond: ((40 = ("countyNo")::smallint) AND (("tableName")::text = 'Cal'::text))
           -> Bitmap Index Scan on "DbTranLogRecord_tableName" (cost=0.00..1037.84 rows=92640 width=0) (actual time=54.917..54.917 rows=114113 loops=1)
              Index Cond: ((40 = ("countyNo")::smallint) AND (("tableName")::text = 'Cal'::text))
Total runtime: 4520.130 ms

Without the option to use a merge join, the planner picked a plan
which ran in one-third to one-half the time, depending on caching --
even though it estimated that plan as more expensive (total cost
298741.49 versus 176218.65 for the merge join plan).

dtr=> set enable_mergejoin = off;

With the OS cache flushed by dd:

Aggregate (cost=298741.48..298741.49 rows=1 width=0) (actual time=14289.207..14289.210 rows=1 loops=1)
  -> Nested Loop (cost=0.00..298740.94 rows=218 width=0) (actual time=143.436..14277.124 rows=3007 loops=1)
     -> Index Scan using "DbTranRepository_timestamp" on "DbTranRepository" dtr (cost=0.00..49419.45 rows=25067 width=17) (actual time=33.625..11510.723 rows=39690 loops=1)
        Index Cond: ((("countyNo")::smallint = 40) AND (("timestampValue")::timestamp with time zone >= '2005-09-30 00:00:00-05'::timestamp with time zone) AND (("timestampValue")::timestamp with time zone <= '2005-10-01 00:00:00-05'::timestamp with time zone))
     -> Index Scan using "DbTranLogRecordPK" on "DbTranLogRecord" dtlr (cost=0.00..9.93 rows=1 width=17) (actual time=0.062..0.062 rows=0 loops=39690)
        Index Cond: ((40 = (dtlr."countyNo")::smallint) AND ((dtlr."tranImageSeqNo")::numeric = ("outer"."tranImageSeqNo")::numeric))
        Filter: (("tableName")::text = 'Cal'::text)
Total runtime: 14289.305 ms

With the OS cache populated by a previous run of this plan:

Aggregate (cost=298741.48..298741.49 rows=1 width=0) (actual time=1476.593..1476.596 rows=1 loops=1)
  -> Nested Loop (cost=0.00..298740.94 rows=218 width=0) (actual time=1.373..1464.720 rows=3007 loops=1)
     -> Index Scan using "DbTranRepository_timestamp" on "DbTranRepository" dtr (cost=0.00..49419.45 rows=25067 width=17) (actual time=0.083..412.268 rows=39690 loops=1)
        Index Cond: ((("countyNo")::smallint = 40) AND (("timestampValue")::timestamp with time zone >= '2005-09-30 00:00:00-05'::timestamp with time zone) AND (("timestampValue")::timestamp with time zone <= '2005-10-01 00:00:00-05'::timestamp with time zone))
     -> Index Scan using "DbTranLogRecordPK" on "DbTranLogRecord" dtlr (cost=0.00..9.93 rows=1 width=17) (actual time=0.019..0.019 rows=0 loops=39690)
        Index Cond: ((40 = (dtlr."countyNo")::smallint) AND ((dtlr."tranImageSeqNo")::numeric = ("outer"."tranImageSeqNo")::numeric))
        Filter: (("tableName")::text = 'Cal'::text)
Total runtime: 1476.681 ms

An excerpt of the postgresql.conf settings in effect for these tests:

listen_addresses = '*' # what IP interface(s) to listen on;
max_connections = 100 # note: increasing max_connections costs
shared_buffers = 20000 # min 16 or max_connections*2, 8KB each
work_mem = 10240 # min 64, size in KB
max_fsm_pages = 100000 # min max_fsm_relations*16, 6 bytes each
wal_buffers = 20 # min 4, 8KB each
effective_cache_size = 917504 # typically 8KB each
random_page_cost = 2 # units are one sequential page fetch
log_line_prefix = '[%m] %p %q<%u %d %r> ' # Special values:
stats_start_collector = on
stats_row_level = on
autovacuum = false # enable autovacuum subprocess?
autovacuum_naptime = 10 # time between autovacuum runs, in secs
autovacuum_vacuum_threshold = 1 # min # of tuple updates before
autovacuum_analyze_threshold = 1 # min # of tuple updates before
autovacuum_vacuum_scale_factor = 0.2 # fraction of rel size before
autovacuum_analyze_scale_factor = 0.1 # fraction of rel size before
lc_messages = 'C' # locale for system error message
lc_monetary = 'C' # locale for monetary formatting
lc_numeric = 'C' # locale for number formatting
lc_time = 'C' # locale for time formatting
sql_inheritance = off
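
(Any of these can be double-checked from a session with SHOW, e.g.:)

dtr=> show work_mem;
dtr=> show effective_cache_size;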

If random_page_cost is reduced to 1.2, plain index scans are used
instead of bitmap index scans. That makes little difference to the
run time of this query -- at least when the data is cached.
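
(A per-session override is enough to see that, e.g.:)

dtr=> set random_page_cost = 1.2;
-- then re-run the EXPLAIN ANALYZE above and compare the plans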

Table "public.DbTranRepository"
Column | Type | Modifiers
------------------+------------------------+-----------
countyNo | "CountyNoT" | not null
tranImageSeqNo | "TranImageSeqNoT" | not null
timestampValue | "TimestampT" | not null
transactionImage | "ImageT" |
status | character(1) | not null
queryName | "QueryNameT" |
runDuration | numeric(15,0) |
userId | "UserIdT" |
functionalArea | character varying(50) |
sourceRef | character varying(255) |
url | "URLT" |
tranImageSize | numeric(15,0) |
Indexes:
"DbTranRepositoryPK" PRIMARY KEY, btree ("countyNo", "tranImageSeqNo")
"DbTranRepository_timestamp" btree ("countyNo", "timestampValue")
"DbTranRepository_userId" btree ("countyNo", "userId", "timestampValue")

Table "public.DbTranLogRecord"
Column | Type | Modifiers
----------------+-------------------+-----------
countyNo | "CountyNoT" | not null
tranImageSeqNo | "TranImageSeqNoT" | not null
logRecordSeqNo | "LogRecordSeqNoT" | not null
operation | "OperationT" | not null
tableName | "TableNameT" | not null
Indexes:
"DbTranLogRecordPK" PRIMARY KEY, btree ("countyNo", "tranImageSeqNo", "logRecordSeqNo")
"DbTranLogRecord_tableName" btree ("countyNo", "tableName", operation)

Need any other info?

-Kevin
