From: Charles Gomes <charlesrg(at)outlook(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Performance on Bulk Insert to Partitioned Table
Date: 2012-12-21 14:30:07
Message-ID: BLU002-W187422443209B681CE83B5FAB360@phx.gbl
Lists: pgsql-performance
The BBU does combine the writes.
I've benchmarked inserting 1,188,000,000 rows into a single table: it took 1:34:21.549959 with 70 writers all hitting that one table.
I've also benchmarked having the writers target individual partitions directly, and they get the same job done in 1 hour.
I/O is definitely not the bottleneck.
Without changing any hardware, targeting the partitions directly speeds things up almost 4 times, so the slowdown looks like overhead in the way PostgreSQL handles the partitions, or the time the trigger takes to select which partition to insert into.
When targeting, I issue commands that insert directly into the partition ("INSERT INTO quotes_DATE VALUES (...), (...), ..."), 10k rows at a time.
When not targeting, I leave it to the trigger to decide:
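For example, a direct-targeted batch looks like this (the partition name, column list, and values here are made up for illustration; the real statements carry 10k rows each):

```sql
-- Insert straight into one day's partition, bypassing the parent table's trigger.
INSERT INTO quotes_2012_12_21 (received_time, symbol, price)
VALUES
    ('2012-12-21 09:30:00', 'ABC', 10.01),
    ('2012-12-21 09:30:01', 'XYZ', 20.02);
    -- ... up to ~10,000 rows per statement
```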
CREATE OR REPLACE FUNCTION quotes_insert_trigger()
RETURNS trigger AS $$
DECLARE
    tablename varchar(24);
BEGIN
    tablename := 'quotes_' || to_char(NEW.received_time, 'YYYY_MM_DD');
    EXECUTE 'INSERT INTO ' || tablename || ' VALUES (($1).*)'
        USING NEW;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;
Maybe translating this trigger to C could help. But I haven't heard of anyone who has done partitioning with a trigger written in C, and I don't have the know-how to write one without examples.
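Before reaching for C, a commonly suggested plpgsql variant is a static IF/ELSIF dispatch: hard-coded INSERTs let plpgsql cache the statement plans, avoiding the per-row string concatenation and dynamic EXECUTE of the trigger above. A sketch, assuming hypothetical partitions for two days and the same received_time column (the partition names and dates are illustrative only, and the chain must be regenerated as partitions are added):

```sql
CREATE OR REPLACE FUNCTION quotes_insert_trigger()
RETURNS trigger AS $$
BEGIN
    -- Route each row to its day's partition with plain (plan-cached) INSERTs.
    IF NEW.received_time >= DATE '2012-12-21' AND
       NEW.received_time <  DATE '2012-12-22' THEN
        INSERT INTO quotes_2012_12_21 VALUES (NEW.*);
    ELSIF NEW.received_time >= DATE '2012-12-20' AND
          NEW.received_time <  DATE '2012-12-21' THEN
        INSERT INTO quotes_2012_12_20 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for received_time %', NEW.received_time;
    END IF;
    RETURN NULL;  -- row was redirected; suppress insert into the parent
END;
$$ LANGUAGE plpgsql;
```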
________________________________
> Date: Thu, 20 Dec 2012 19:24:09 -0800
> Subject: Re: [PERFORM] Performance on Bulk Insert to Partitioned Table
> From: jeff(dot)janes(at)gmail(dot)com
> To: charlesrg(at)outlook(dot)com
> CC: pgsql-performance(at)postgresql(dot)org
>
>
>
> On Thursday, December 20, 2012, Charles Gomes wrote:
> Jeff,
>
> The 8288 writes are fine, as the array has a BBU, it's fine. You see
> about 4% of the utilization.
>
> BBU is great for latency, but it doesn't do much for throughput, unless
> it is doing write combining behind the scenes. Is it HDD or SSD behind
> the BBU? Have you benchmarked it on randomly scattered 8k writes?
>
> I've seen %util reports that were low while watching a strace showed
> obvious IO freezes. So I don't know how much faith to put into low
> %util.
>
>
>
> To target directly instead of doing :
> INSERT INTO TABLE VALUES ()
> I use:
> INSERT INTO TABLE_PARTITION_01 VALUES()
>
> But how is it deciding what partition to use? Does it have to
> re-decide for every row, or does each thread serve only one partition
> throughout its life and so makes the decision only once?
>
>
>
> By targeting it I see a huge performance increase.
>
> But is that because by targeting you are bypassing the overhead
> of triggers, or is it because you are loading the rows in an order
> which leads to more efficient index maintenance?
>
>
> I haven't tested using 1 billion rows in a single table. The issue is
> that in the future it will grow to more than 1 billion rows; it will
> get to about 4 billion rows, and that's when I believe partitioning
> would be a major improvement.
>
> The way that partitioning gives you performance improvements is by you
> embracing the partitioning, for example by targeting the loading to
> just one partition without any indexes, creating indexes, and then
> atomically attaching it to the table. If you wish to have partitions,
> but want to use triggers to hide that partitioning from you, then I
> don't think you can expect to get much of a speed up through using
> partitions.
>
> Anyway, the way I would approach it would be to load into a single
> un-partitioned table, and also load into a single dummy-partitioned table
> which uses a trigger that looks like the one you want to use for real,
> but directs all rows to a single partition. If these loads take the
> same time, you know it is not the trigger which is limiting.
>
> Cheers,
>
> Jeff