From: | Vitaliy Garnashevich <vgarnashevich(at)gmail(dot)com> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Re: Bitmap scan is undercosted? |
Date: | 2017-12-07 00:17:13 |
Message-ID: | d06c555e-6710-2dd7-d019-4d7bc40750b4@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-performance |
> What seems odd to me is that in different kinds of tests (with
> different frequency of column values):
>
> i1 Rows Removed by Filter = 900156, 179792, 89762 (decreased a lot)
> i1 buffers = 46983, 44373, 39928 (decreased, but not a lot)
> i1 best case time = 756.045, 127.814, 79.492 (decreased a lot, as well
> as probably average case too)
> i1 cost estimates = 67358.15, 48809.34, 46544.80 (did not decrease a lot)
>
> i2 Rows Removed by Filter = 900835, 980350, 991389
> i2 buffers = 46985, 46985, 46987
> i2 best case time = 377.554, 346.481, 387.874
> i2 cost estimates = 39166.34, 39247.83, 40167.34
>
> It's odd that increase in actual execution time for "i1" was not
> reflected enough in cost estimates. The cost even didn't go below "i2"
> costs.
I've added some logging, in order to get the actual numbers which were
used for estimation.
--drop table if exists aaa;
--create table aaa as select floor(random()*100)::int num, (random()*10
< 1)::int flag from generate_series(1, 10000000) id;
--analyze aaa;
--set enable_bitmapscan = off; set enable_indexscan = on; set
enable_seqscan = off;
--set seq_page_cost = 1.0; set random_page_cost = 1.0; set
cpu_tuple_cost = 0.01; set cpu_index_tuple_cost = 0.005; set
cpu_operator_cost = 0.0025;
--create index i1 on aaa (num);
--drop index if exists i2;
--explain (analyze,verbose,costs,buffers) select * from aaa where num =
1 and flag = 1;
Index Scan using i1 on public.aaa (cost=0.43..46697.59 rows=10641
width=8) (actual time=0.047..153.521 rows=9826 loops=1)
Rows Removed by Filter: 89948
--drop index if exists i1;
--create index i2 on aaa (flag);
--explain (analyze,verbose,costs,buffers) select * from aaa where num =
1 and flag = 1;
Index Scan using i2 on public.aaa (cost=0.43..39583.11 rows=10641
width=8) (actual time=0.098..351.454 rows=9826 loops=1)
Rows Removed by Filter: 990249
LOG: cost_index:
seq_page_cost=1.00, random_page_cost=1.00,
cpu_tuple_cost=0.0100, cpu_index_tuple_cost=0.0050,
cpu_operator_cost=0.0025, effective_cache_size=131072
indexStartupCost=0.43, indexTotalCost=1103.94,
indexSelectivity=0.01076667, indexCorrelation=0.00208220
baserel->tuples=10000033.00, baserel->pages=44248.00,
baserel->allvisfrac=0.00000000
tuples_fetched=107667.00, pages_fetched=477.00
max_IO_cost=44248.0000, min_IO_cost=477.0000, csquared=0.0000
qpqual_cost.startup=0.0000, qpqual_cost.per_tuple=0.0025,
cpu_per_tuple=0.0125
spc_seq_page_cost=1.00, spc_random_page_cost=1.00
startup_cost=0.43, total_cost=46697.59
LOG: cost_index:
seq_page_cost=1.00, random_page_cost=1.00,
cpu_tuple_cost=0.0100, cpu_index_tuple_cost=0.0050,
cpu_operator_cost=0.0025, effective_cache_size=131072
indexStartupCost=0.43, indexTotalCost=10123.93,
indexSelectivity=0.09883333, indexCorrelation=0.82505685
baserel->tuples=10000000.00, baserel->pages=44248.00,
baserel->allvisfrac=0.00000000
tuples_fetched=988333.00, pages_fetched=4374.00
max_IO_cost=44248.0000, min_IO_cost=4374.0000, csquared=0.6807
qpqual_cost.startup=0.0000, qpqual_cost.per_tuple=0.0025,
cpu_per_tuple=0.0125
spc_seq_page_cost=1.00, spc_random_page_cost=1.00
startup_cost=0.43, total_cost=39583.11
Here is a break down of the total_cost into components, for i1 query and
for i2 query (some rounding was removed from the formula for brevity):
path->path.total_cost =
(indexTotalCost + qpqual_cost.startup) +
(max_IO_cost + csquared * (min_IO_cost - max_IO_cost)) +
(cpu_tuple_cost + qpqual_cost.per_tuple) * (indexSelectivity *
baserel->tuples);
path->path.total_cost =
1103.94 + 0.0000 + // 1103.94 +
44248.0000 + 0.0000 * (477.0000 - 44248.0000) + // 44248.00 +
(0.0100 + 0.0025) * (0.01076667 * 10000033.00) // 1345.84
= 46697.78; // = 46697.78;
path->path.total_cost =
(indexTotalCost + qpqual_cost.startup) +
(max_IO_cost + csquared * (min_IO_cost - max_IO_cost)) +
(cpu_tuple_cost + qpqual_cost.per_tuple) * (indexSelectivity *
baserel->tuples);
path->path.total_cost =
10123.93 + 0.0000 + // 10123.93 +
44248.0000 + 0.6807 * (4374.0000 - 44248.0000) + // 17105.77 +
(0.0100 + 0.0025) * (0.09883333 * 10000000.00) // 12354.17
= 39583.86; // = 39583.86;
PS.
The code used for logging:
/postgresql-9.3.1/src/backend/optimizer/path/costsize.c : cost_index()
ereport(LOG,
(errmsg("cost_index: \n"
"seq_page_cost=%.2f, random_page_cost=%.2f,
cpu_tuple_cost=%.4f, cpu_index_tuple_cost=%.4f, cpu_operator_cost=%.4f,
effective_cache_size=%.0f\n"
"indexStartupCost=%.2f, indexTotalCost=%.2f,
indexSelectivity=%.8f, indexCorrelation=%.8f\n"
"baserel->tuples=%.2f, baserel->pages=%.2f,
baserel->allvisfrac=%.8f\n"
"tuples_fetched=%.2f, pages_fetched=%.2f\n"
"max_IO_cost=%.4f, min_IO_cost=%.4f, csquared=%.4f\n"
"qpqual_cost.startup=%.4f,
qpqual_cost.per_tuple=%.4f, cpu_per_tuple=%.4f\n"
"spc_seq_page_cost=%.2f, spc_random_page_cost=%.2f\n"
"startup_cost=%.2f, total_cost=%.2f\n",
seq_page_cost, random_page_cost, cpu_tuple_cost,
cpu_index_tuple_cost, cpu_operator_cost, (double)effective_cache_size,
indexStartupCost, indexTotalCost, indexSelectivity,
indexCorrelation,
baserel->tuples, (double) baserel->pages,
baserel->allvisfrac,
tuples_fetched, pages_fetched,
max_IO_cost, min_IO_cost, csquared,
qpqual_cost.startup, qpqual_cost.per_tuple,
cpu_per_tuple,
spc_seq_page_cost, spc_random_page_cost,
startup_cost, startup_cost + run_cost
)));
Regards,
Vitaliy
From | Date | Subject | |
---|---|---|---|
Next Message | Nikita Glukhov | 2017-12-07 00:17:58 | Re: compress method for spgist - 2 |
Previous Message | Joshua D. Drake | 2017-12-07 00:03:14 | Logical replication without a Primary Key |
From | Date | Subject | |
---|---|---|---|
Next Message | Matthew Hall | 2017-12-07 02:04:54 | Re: insert and query performance on big string table with pg_trgm |
Previous Message | Justin Pryzby | 2017-12-06 21:46:52 | Re: Bitmap scan is undercosted? - overestimated correlation and cost_index |