From: | Muhammad Ikram <mmikram(at)gmail(dot)com> |
---|---|
To: | Josef Šimánek <josef(dot)simanek(at)gmail(dot)com> |
Cc: | sud <suds1434(at)gmail(dot)com>, pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Load a csv or a avro? |
Date: | 2024-07-05 10:15:28 |
Message-ID: | CAGeimVo_EOJO+BViDnLoxEdFoFKQkeHU=gniEQ9e2GbjAUvUHg@mail.gmail.com |
Lists: | pgsql-general |
Hi,
Performance considerations:
Avro files are smaller due to compression, so they need less I/O time,
whereas CSV files are simpler but larger, so reading and writing them
takes more time.
The COPY command works very well with CSV files. COPY has no built-in
Avro format, so Avro needs an ETL step to convert the data before it can
be loaded.
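To illustrate the extra step for Avro, here is a minimal Python sketch,
not a definitive implementation. It assumes the Avro messages have
already been decoded into plain dicts (the decoding library is left out
on purpose) and turns them into CSV text that COPY can read. The
function name, column names, and sample records are hypothetical.

```python
import csv
import io


def records_to_copy_csv(records, columns):
    """Serialize dict records (e.g. decoded Avro messages) into a CSV buffer.

    The buffer is meant to be streamed to COPY ... FROM STDIN WITH (FORMAT csv).
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for rec in records:
        # csv.writer emits None as an empty, unquoted field, which COPY in
        # CSV format reads as NULL by default. Note that a genuine empty
        # string would look the same, so handle it separately if it matters.
        writer.writerow([rec.get(col) for col in columns])
    buf.seek(0)  # rewind so a consumer can read from the start
    return buf


# Hypothetical sample records standing in for decoded Avro messages.
records = [
    {"id": 1, "payload": "hello, world", "note": None},
    {"id": 2, "payload": 'say "hi"', "note": "ok"},
]

buf = records_to_copy_csv(records, ["id", "payload", "note"])
csv_text = buf.getvalue()
print(csv_text, end="")
```

With a driver such as psycopg2, the buffer could then be passed to
cursor.copy_expert() together with a statement like
COPY target_table (id, payload, note) FROM STDIN WITH (FORMAT csv),
where target_table is a made-up name. If the source system can deliver
CSV directly, this conversion is not needed at all.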
Regards,
Muhammad Ikram
On Fri, Jul 5, 2024 at 3:03 PM Josef Šimánek <josef(dot)simanek(at)gmail(dot)com>
wrote:
> On Fri, Jul 5, 2024 at 11:08 AM sud <suds1434(at)gmail(dot)com> wrote:
> >
> > Hello all,
> >
> > This is a Postgres database. We have the option of getting files in CSV
> > and/or Avro format messages from another system to load into our
> > Postgres database. The volume will be 300 million messages per day
> > across many files in batches.
> >
> > My question is, which format should we choose for faster data loading
> > performance? And are there any other aspects we should consider apart
> > from loading performance?
>
> We are able to load ~300 million rows per day using CSV and the libpq
> COPY functions (
> https://www.postgresql.org/docs/current/libpq-copy.html#LIBPQ-COPY-SEND)
>
>
>
--
Muhammad Ikram