Re: COPY FROM performance improvements

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-08-10 04:39:55
Message-ID: BF1ED32B.C206%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches pgsql-performance

Andrew,

> Arguably this might exaggerate the effect quite significantly. Users
> will want to know the real time effect on a complete COPY. Depending on
> how much the pasing is in the total time your 20% improvement in parsing
> might only be a small fraction of 20% improvement in COPY.

Arguably has already been argued. We knew this because we profiled the
entire COPY process and came up with this approx. breakdown for a specific
case:
Parsing: 25%
Attribute Conversion: 40%
Data Insertion: 35%

Net copy rate: 8 MB/s on a filesystem that does 240 MB/s

So - if we speed up parsing by 500% or 450%, the end result is about a
20-30% speed increase in the overall process.

Note that we're still a *long* way from getting anywhere near the limit of
the I/O subsystem at 12 MB/s. Oracle can probably get 5-8 times this data
rate, if not better.

The attribute conversion logic is also very slow and needs similar
improvements.

The reason we focused first on Parsing is that our MPP version of Bizgres
will reach data loading rates approaching the parsing speed, so we needed to
improve that part to get it out of the way.

We will continue to improve COPY speed in Bizgres so that we can provide
comparable COPY performance to Oracle and MySQL.

> Like you, I'm happy we have seen a 5 times improvement in parsing. Is it
> possible you can factor out something smallish from your patch that
> might make up the balance?

That parsing was 25% of the workload was traceable to a 3 main things:
1) Per character acquisition from source instead of buffering
2) Frequent interruption of the parsing pipeline to handle attribute
conversion
3) Lack of micro-parallelism in the character finding logic

Tom's patch took our contribution from (1) and (2) and his improvements, and
he rejected (3). The net result is that we lost performance from the lack
of (3) but gained performance from his improvements of (1) and (2).

I believe that re-introducing (3) may bring us from 100 MB/s to 150 MB/s
parsing speed.

- Luke

In response to

Browse pgsql-patches by date

  From Date Subject
Next Message Luke Lonergan 2005-08-10 04:48:02 Re: COPY FROM performance improvements
Previous Message Alvaro Herrera 2005-08-10 03:58:02 Re: [HACKERS] Autovacuum loose ends

Browse pgsql-performance by date

  From Date Subject
Next Message Luke Lonergan 2005-08-10 04:48:02 Re: COPY FROM performance improvements
Previous Message Andrew Dunstan 2005-08-10 01:01:38 Re: COPY FROM performance improvements