From: | Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Declarative partitioning optimization for large amount of partitions |
Date: | 2017-02-28 14:25:09 |
Message-ID: | 20170228142509.GA19777@e733.localdomain |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello.
I decided to figure out whether current implementation of declarative
partitioning has any bottlenecks when there is a lot of partitions. Here
is what I did [1].
```
-- init schema
\timing on
CREATE TABLE part_test (pk int not null, k int, v varchar(128)) PARTITION BY RANGE(pk);
do $$
declare
i integer;
begin
for i in 1 .. 10000
loop
raise notice 'i = %', i;
execute ('CREATE TABLE part_test_' || i ||
' PARTITION OF part_test FOR VALUES FROM (' ||
(1 + (i-1)*1000) || ') to (' || ( (i * 1000) + 1) || ');'
);
end loop;
end $$;
-- fill tables with some data
do $$
declare
i integer;
begin
for i in 1 .. 100*1000
loop
raise notice 'i = %', i;
execute ('insert into part_test values ( ceil(random()*(10000-1)*1000), ceil(random()*10000*1000), '''' || ceil(random()*10000*1000) );');
end loop;
end $$;
```
Then:
```
# 2580 is some pk that exists
echo 'select * from part_test where pk = 2580;' > t.sql
pgbench -j 7 -c 7 -f t.sql -P 1 -T 300 eax
```
`perf top` showed to bottlenecks [2]. A stacktrace for the first one
looks like this [3]:
```
0x00000000007a42e2 in get_tabstat_entry (rel_id=25696, isshared=0 '\000') at pgstat.c:1689
1689 if (entry->t_id == rel_id)
#0 0x00000000007a42e2 in get_tabstat_entry (rel_id=25696, isshared=0 '\000') at pgstat.c:1689
#1 0x00000000007a4275 in pgstat_initstats (rel=0x7f4af3fd41f8) at pgstat.c:1666
#2 0x00000000004c7090 in relation_open (relationId=25696, lockmode=0) at heapam.c:1137
#3 0x00000000004c72c9 in heap_open (relationId=25696, lockmode=0) at heapam.c:1291
(skipped)
```
And here is a stacktrace for the second bottleneck [4]:
```
0x0000000000584fb1 in find_all_inheritors (parentrelId=16393, lockmode=1, numparents=0x0) at pg_inherits.c:199
199 forboth(lo, rels_list, li, rel_numparents)
#0 0x0000000000584fb1 in find_all_inheritors (parentrelId=16393, lockmode=1, numparents=0x0) at pg_inherits.c:199
#1 0x000000000077fc9f in expand_inherited_rtentry (root=0x1badcb8, rte=0x1b630b8, rti=1) at prepunion.c:1408
#2 0x000000000077fb67 in expand_inherited_tables (root=0x1badcb8) at prepunion.c:1335
#3 0x0000000000767526 in subquery_planner (glob=0x1b63cc0, parse=0x1b62fa0, parent_root=0x0, hasRecursion=0 '\000', tuple_fraction=0) at planner.c:568
(skipped)
```
The first one could be easily fixed by introducing a hash table
(rel_id -> pgStatList entry). Perhaps hash table should be used only
after some threshold. Unless there are any objections I will send a
corresponding patch shortly.
I didn't explored the second bottleneck closely yet but at first glance
it doesn't look much more complicated.
Please don't hesitate to share your thoughts regarding this matter.
[1] http://afiskon.ru/s/e3/5f47af9102_benchmark.txt
[2] http://afiskon.ru/s/00/2008c4ae66_temp.png
[3] http://afiskon.ru/s/23/650f0afc89_stack.txt
[4] http://afiskon.ru/s/03/a7e685a4db_stack2.txt
--
Best regards,
Aleksander Alekseev
From | Date | Subject | |
---|---|---|---|
Next Message | Yugo Nagata | 2017-02-28 14:33:13 | [POC] hash partitioning |
Previous Message | Jorge Solórzano | 2017-02-28 14:24:37 | Re: rename pg_log directory? |