From: | "Luke Lonergan" <llonergan(at)greenplum(dot)com> |
---|---|
To: | "Oliver Jowett" <oliver(at)opencloud(dot)com>, "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com> |
Cc: | "Steve Atkins" <steve(at)blighty(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: NOLOGGING option, or ? |
Date: | 2005-06-02 14:33:13 |
Message-ID: | BEC466B9.6CAC%llonergan@greenplum.com |
Lists: | pgsql-hackers |
Oliver,
> Haven't you just replaced one preprocessing step with another, then?
Generally not. The most common problem with the current choice of escape
character is that there are *lots* of data load scenarios with backslash in
the text strings. The extra preprocessing to escape them is unnecessary on
other databases and, in effect, causes the load to be even slower because
you have to prepare the data ahead of time.
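To make the extra preprocessing step concrete, here is a minimal sketch of what a loader has to do to each field before handing it to COPY in text format, where backslash is the escape character. The function names `escape_copy_field` and `to_copy_line` are hypothetical helpers for illustration only, not part of PostgreSQL or of the patch under discussion.

```python
def escape_copy_field(value):
    """Escape one field for COPY text format; None is written as \\N (NULL)."""
    if value is None:
        return "\\N"
    # Backslash must be doubled first, so that the escapes added below
    # are not themselves doubled.
    return (value.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_copy_line(fields):
    """Join already-split fields into one tab-delimited COPY text line."""
    return "\t".join(escape_copy_field(f) for f in fields) + "\n"


if __name__ == "__main__":
    # A Windows-style path is a typical field containing backslashes.
    print(to_copy_line(["42", "C:\\data\\load.txt", None]), end="")
```

Every row of the input has to pass through a step like this before the load starts, which is the overhead described above.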
Note that this patch can also do escape processing, and the net result
will still be 5+ times faster than the current code.
In the data warehousing industry, data conversion and manipulation are
normally kept distinct from data loading. Conversion is done by tools
called ETL (Extract, Transform, Load) and the database will have a very fast
path for direct loading of the resulting data. PostgreSQL is definitely a
strange database right now in that there is a default filter applied to the
data on load.
It's even stranger because the load path is so slow, and we have now
found that the slowness comes mostly from non-optimized parsing and
attribute conversion routines. The question of how to do escape processing
is a separate one, but is wrapped up in the question of whether to introduce
a new loading routine or whether to optimize the old one.
- Luke