From: | Nicolas Paris <niparisco(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: New Copy Formats - avro/orc/parquet |
Date: | 2018-02-11 20:41:26 |
Message-ID: | 20180211204126.i3sbze3dor2llxty@gmail.com |
Lists: | pgsql-general |
On 11 Feb 2018 at 21:03, Andres Freund wrote:
>
>
> On February 11, 2018 12:00:12 PM PST, Nicolas Paris <niparisco(at)gmail(dot)com> wrote:
> >> > That is true, but the question is how significant the overhead is. If
> >> > it's 50% then reducing it would make perfect sense. If it's 1% then no
> >> > one is going to be bothered by it.
> >>
> >> I think it's pretty clear that it's going to be way, way more than
> >> 1%.
> >
> >Good news, but I'm not sure I understand why.
>
> I think you might have misunderstood my reply? I'm saying that going through PROGRAM will have significant overhead. I can't quite make sense of the rest of your reply otherwise?

True, I misunderstood. Then I agree the computation overhead should be
non-negligible.

I also have the storage and network transfer overhead in mind:
all those new formats are compressed, which is not true of the current
PostgreSQL BINARY format, nor obviously of the text-based formats. In my
experience, the binary format is 10 to 30% larger than the text one. By
contrast, an ORC file can be up to 10 times smaller than a text-based
format.
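
A rough sketch of why the binary format can come out larger than text, assuming rows made only of int4 columns: in COPY BINARY each row carries a 16-bit field count and each field a 32-bit length word in front of its data, while the text format spends one byte per digit plus a delimiter. The helper names `text_row` and `binary_row` are made up for this illustration, the file header and trailer are ignored, and the actual ratio depends heavily on the data.

```python
import struct

def text_row(values):
    # COPY text format: tab-separated fields, newline-terminated row
    return ("\t".join(str(v) for v in values) + "\n").encode("utf-8")

def binary_row(values):
    # COPY BINARY tuple: int16 field count, then for each field
    # an int32 byte length followed by the data (4 bytes for int4)
    out = struct.pack("!h", len(values))
    for v in values:
        out += struct.pack("!i", 4) + struct.pack("!i", v)
    return out

row = [1, 42, 123456]  # hypothetical sample row
print(len(text_row(row)), len(binary_row(row)))
```

Small integers favour the text format here; wide numeric or timestamp values shift the balance, so this only shows where the per-field overhead comes from.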