From: | Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alan Li <ali(at)truviso(dot)com> |
Subject: | Re: 8.4 open item: copy performance regression? |
Date: | 2009-06-21 16:53:56 |
Message-ID: | 4A3E65A4.5030101@kaltenbrunner.cc |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> I wonder if using the small ring showed any benefit when the COPY is not
>> WAL-logged? In that scenario block-on-WAL-flush behavior doesn't happen,
>> so the small ring might have some L2 cache benefits.
>
> I think the notion that we might get a cache win from a smaller ring
> is an illusion. We're not expecting to go back and re-read from a
> previously filled page in this scenario. In any case, all of the
> profiling results so far show that the CPU bottlenecks are elsewhere.
> Until we can squeeze an order of magnitude out of COPY's data parsing
> and/or XLogInsert, any possible cache effects will be down in the noise.
we also need to take a serious look at our locking overhead - WAL logged
COPY is already taking a significant performance hit with just a second
process running in parallel(into a seperate table).
I just did some testing using those 16MB buffer, the upthread mentioned
postgresql.conf and a 20GB tmpfs.
The following copying 3M rows(each) into a seperate table of the same
database.
processes total time(s) rows/s rows/s - per core
1 17.5 171428.57 171428.57
2 20.8 288461.54 144230.77
4 25.5 470588.24 117647.06
6 31.1 578778.14 96463.02
8 41.4 579710.14 72463.77
10 63 476190.48 47619.05
12 89 404494.38 33707.87
14 116 362068.97 25862.07
16 151 317880.79 19867.55
the higher the process count the more erratic the box behaves - it will
show a very high context switch rate (between 300000 and 400000/s) a
large amount of idle time (>60%!).
example vmstat 5 output for the 12 process test:
7 0 0 21654500 45436 12932516 0 0 0 3 1079 336941
34 7 59 0 0
6 0 0 21354044 45444 13232444 0 0 0 52 1068 341836
35 7 59 0 0
4 0 0 21053832 45452 13531472 0 0 0 23 1082 341672
35 7 59 0 0
9 0 0 20751136 45460 13833336 0 0 0 41 1063 344117
35 7 59 0 0
6 0 0 20443856 45468 14138116 0 0 0 14 1079 349398
35 7 58 0 0
8 0 0 20136592 45476 14444644 0 0 0 8 1060 351569
35 7 58 0 0
10 0 0 19836600 45484 14743320 0 0 0 144 1086 341533
35 7 58 0 0
7 0 0 19540472 45492 15039616 0 0 0 94 1067 337731
36 7 58 0 0
2 0 0 19258244 45500 15321156 0 0 0 15 1079 311394
34 6 60 0 0
Stefan
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2009-06-21 17:00:09 | Re: 8.4 open item: copy performance regression? |
Previous Message | Tom Lane | 2009-06-21 16:38:34 | Re: 8.4 open item: copy performance regression? |