Re: New Copy Formats - avro/orc/parquet

From: Nicolas Paris <niparisco(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: New Copy Formats - avro/orc/parquet
Date: 2018-02-11 20:41:26
Message-ID: 20180211204126.i3sbze3dor2llxty@gmail.com
Lists: pgsql-general

On 11 Feb 2018 at 21:03, Andres Freund wrote:
>
>
> On February 11, 2018 12:00:12 PM PST, Nicolas Paris <niparisco(at)gmail(dot)com> wrote:
> >> > That is true, but the question is how significant the overhead is. If
> >> > it's 50% then reducing it would make perfect sense. If it's 1% then no
> >> > one is going to be bothered by it.
> >>
> >> I think it's pretty clear that it's going to be way, way more than
> >> 1%.
> >
> >Good news, but I'm not sure I understand why.
>
> I think you might have misunderstood my reply? I'm saying that going through PROGRAM will have significant overhead. I can't quite make sense of the rest of your reply otherwise?

True, I misunderstood. In that case I agree that the computational
overhead would be non-negligible.

I also have the storage and network-transfer overhead in mind: all of
these new formats are compressed, which is not true of the current
PostgreSQL BINARY format, nor, obviously, of the text-based formats. In my
experience, the binary format is 10 to 30% larger than the text one. By
contrast, an ORC file can be up to 10 times smaller than the equivalent
text-based output.
