From: Landreville <landreville(at)deadtreepages(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Calculating 95th percentiles
Date: 2011-03-04 15:18:01
Message-ID: AANLkTi=S-NKeJAwN+NXXJvtpv0vAbO4VByEB4hjUiPu0@mail.gmail.com
Lists: pgsql-performance
This is not a performance bug -- my query already runs in a reasonable
amount of time, but I would like to see if I can make this calculation
any faster in my setup.
I have a table:
volume_id serial primary key
switchport_id integer not null
in_octets bigint not null
out_octets bigint not null
insert_timestamp timestamp default now()
with indexes on volume_id, switchport_id, insert_timestamp.
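Put together as DDL, that description corresponds to roughly this (a
reconstruction; the table name "volume" is taken from the query below,
and the non-PK indexes are elided here):

    create table volume (
        volume_id serial primary key,
        switchport_id integer not null,
        in_octets bigint not null,
        out_octets bigint not null,
        insert_timestamp timestamp default now()
    );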
The table is partitioned into about 3000 child tables by switchport_id
(an FK to a lookup table); each child table currently has about 30 000
rows (a row is inserted into each table every 5 minutes).
I have select queries that filter on switchport_id and timestamp:
constraint exclusion on switchport_id picks the right child table, and
insert_timestamp is indexed on each child table.
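For context, I assume a per-switchport child table looks roughly like
this (my guess at the setup; the plan below shows a child named
volume_114, and the CHECK constraint is what lets constraint exclusion
skip the other partitions):

    -- hypothetical child-table definition for switchport 114
    create table volume_114 (
        check (switchport_id = 114)
    ) inherits (volume);

    create index volume_114_insert_timestamp_idx
        on volume_114 (insert_timestamp);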
Any time the volume tables are queried, it is to calculate the deltas
between each in_octets/out_octets value and the previous row's (ordered
by timestamp). The delta is used because the external system the data
is retrieved from will sometimes roll the counter over. I have a
function to do this calculation:
    create or replace function traffic.get_delta_table(
        p_switchport_id integer,
        p_start_date date,
        p_end_date date)
    returns table(
        volume_id integer,
        insert_timestamp timestamp,
        out_delta bigint,
        out_rate bigint,
        out_rate_order bigint,
        in_delta bigint,
        in_rate bigint,
        in_rate_order bigint)
    as $$
    begin
        -- we need to force pl/pgsql to make a new plan for each query so
        -- it can use constraint exclusion on switchport_id to determine
        -- the partition table to scan
        return query execute 'select
            t.volume_id,
            t.insert_timestamp,
            t.out_delta,
            t.out_delta * 8 / t.time_difference as out_rate,
            row_number() over (order by t.out_delta * 8 / t.time_difference)
                as out_rate_order,
            t.in_delta,
            t.in_delta * 8 / t.time_difference as in_rate,
            row_number() over (order by t.in_delta * 8 / t.time_difference)
                as in_rate_order
        from
            (select
                n.volume_id,
                n.insert_timestamp,
                extract(epoch from (n.insert_timestamp
                    - lag(n.insert_timestamp, 1, n.insert_timestamp)
                      over (order by n.insert_timestamp)))::integer
                    as time_difference,
                case
                    when n.out_octets < lag(out_octets, 1, n.out_octets)
                             over (order by n.insert_timestamp)
                        then n.out_octets  -- counter rolled over
                    else n.out_octets - lag(out_octets, 1, n.out_octets)
                             over (order by n.insert_timestamp)
                end as out_delta,
                case
                    when n.in_octets < lag(in_octets, 1, n.in_octets)
                             over (order by n.insert_timestamp)
                        then n.in_octets  -- counter rolled over
                    else n.in_octets - lag(in_octets, 1, n.in_octets)
                             over (order by n.insert_timestamp)
                end as in_delta
            from volume as n
            where n.insert_timestamp between $1 and $2
                and n.switchport_id = ' || p_switchport_id || '
                and in_octets != 0
                and out_octets != 0
            ) as t
        where time_difference is not null and time_difference != 0'
        using p_start_date, p_end_date;
    end;
    $$ language plpgsql;
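For example, the call that produced the plan below would look like this
(switchport 114 and the date range are taken from the plan's index
conditions):

    select *
    from traffic.get_delta_table(114, date '2011-02-01', date '2011-03-02');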
The plan for the query inside the function:
    WindowAgg (cost=2269.62..2445.35 rows=6390 width=32) (actual time=7526.526..7531.855 rows=6622 loops=1)
      -> Sort (cost=2269.62..2285.60 rows=6390 width=32) (actual time=7526.497..7527.924 rows=6622 loops=1)
            Sort Key: (((t.in_delta * 8) / t.time_difference))
            Sort Method: external sort Disk: 432kB
            -> WindowAgg (cost=1753.90..1865.72 rows=6390 width=32) (actual time=2613.593..2618.727 rows=6622 loops=1)
                  -> Sort (cost=1753.90..1769.87 rows=6390 width=32) (actual time=2613.566..2614.550 rows=6622 loops=1)
                        Sort Key: (((t.out_delta * 8) / t.time_difference))
                        Sort Method: quicksort Memory: 710kB
                        -> Subquery Scan on t (cost=978.89..1350.00 rows=6390 width=32) (actual time=2582.254..2606.708 rows=6622 loops=1)
                              Filter: ((t.time_difference IS NOT NULL) AND (t.time_difference <> 0))
                              -> WindowAgg (cost=978.89..1269.32 rows=6454 width=28) (actual time=2582.243..2596.546 rows=6623 loops=1)
                                    -> Sort (cost=978.89..995.03 rows=6454 width=28) (actual time=2582.120..2583.172 rows=6623 loops=1)
                                          Sort Key: n.insert_timestamp
                                          Sort Method: quicksort Memory: 710kB
                                          -> Result (cost=8.87..570.49 rows=6454 width=28) (actual time=1036.720..2576.755 rows=6623 loops=1)
                                                -> Append (cost=8.87..570.49 rows=6454 width=28) (actual time=1036.718..2574.719 rows=6623 loops=1)
                                                      -> Bitmap Heap Scan on volume n (cost=8.87..12.90 rows=1 width=28) (actual time=0.055..0.055 rows=0 loops=1)
                                                            Recheck Cond: ((switchport_id = 114) AND (insert_timestamp >= '2011-02-01 00:00:00'::timestamp without time zone) AND (insert_timestamp <= '2011-03-02 00:00:00'::timestamp without time zone))
                                                            Filter: ((in_octets <> 0) AND (out_octets <> 0))
                                                            -> BitmapAnd (cost=8.87..8.87 rows=1 width=0) (actual time=0.053..0.053 rows=0 loops=1)
                                                                  -> Bitmap Index Scan on volume_parent_switchport_id_idx (cost=0.00..4.30 rows=7 width=0) (actual time=0.045..0.045 rows=0 loops=1)
                                                                        Index Cond: (switchport_id = 114)
                                                                  -> Bitmap Index Scan on volume_parent_insert_timestamp_idx (cost=0.00..4.32 rows=7 width=0) (never executed)
                                                                        Index Cond: ((insert_timestamp >= '2011-02-01 00:00:00'::timestamp without time zone) AND (insert_timestamp <= '2011-03-02 00:00:00'::timestamp without time zone))
                                                      -> Bitmap Heap Scan on volume_114 n (cost=142.40..557.59 rows=6453 width=28) (actual time=1036.662..2573.116 rows=6623 loops=1)
                                                            Recheck Cond: ((insert_timestamp >= '2011-02-01 00:00:00'::timestamp without time zone) AND (insert_timestamp <= '2011-03-02 00:00:00'::timestamp without time zone))
                                                            Filter: ((in_octets <> 0) AND (out_octets <> 0) AND (switchport_id = 114))
                                                            -> Bitmap Index Scan on volume_114_insert_timestamp_idx (cost=0.00..140.78 rows=6453 width=0) (actual time=981.034..981.034 rows=6623 loops=1)
                                                                  Index Cond: ((insert_timestamp >= '2011-02-01 00:00:00'::timestamp without time zone) AND (insert_timestamp <= '2011-03-02 00:00:00'::timestamp without time zone))
    Total runtime: 7567.261 ms
This ends up taking anywhere from 300ms to 7000ms (usually it's around
300-400ms) and returns about 8000 rows.
To get the 95th percentile it's a couple of simple selects after putting
the result set from the above function into a temporary table:

    create temporary table deltas on commit drop as
        select * from get_delta_table(p_switchport_id, p_start_date,
                                      p_end_date);

    select round(count(volume_id) * 0.95) into v_95th_row from deltas;

    select in_rate into v_record.in_95th from deltas
        where in_rate_order = v_95th_row;

    select out_rate into v_record.out_95th from deltas
        where out_rate_order = v_95th_row;

    select sum(in_delta), sum(out_delta)
        into v_record.in_total, v_record.out_total from deltas;
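For reference, the same lookups can be sketched as a single statement
over the temporary table (column names as above; the CTE computes the
95th-percentile row number once):

    with p as (select round(count(*) * 0.95) as n95 from deltas)
    select
        (select in_rate  from deltas, p where in_rate_order  = p.n95) as in_95th,
        (select out_rate from deltas, p where out_rate_order = p.n95) as out_95th,
        (select sum(in_delta)  from deltas) as in_total,
        (select sum(out_delta) from deltas) as out_total;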
Unfortunately, using a temporary table means that I cannot run this
query on the read-only slave, but I can't see a way around using one.
The master has 3000 inserts running against it every 5 minutes -- it
used to be every minute, but the space and time needed for calculating
5x the current number of rows was not worth it.
My server has 1GB of memory, runs Red Hat, and Postgres is the only
daemon:

effective_cache_size is 768MB
shared_buffers is 256MB
work_mem is 2MB (I raised it when EXPLAIN ANALYZE was showing 1.5MB
used for an on-disk sort)
max_locks_per_transaction is 3000 (I raised it when I started getting
the error that suggests increasing max_locks_per_transaction)
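In postgresql.conf terms, the settings above are simply:

    effective_cache_size = 768MB
    shared_buffers = 256MB
    work_mem = 2MB
    max_locks_per_transaction = 3000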
Any ideas on speeding this up?
Thanks,
Landreville