Re: Bug in copy

From: me nefcanto <sn(dot)1361(at)gmail(dot)com>
To: Greg Sabino Mullane <htamfids(at)gmail(dot)com>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Zhang Mingli <zmlpostgres(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Bug in copy
Date: 2025-02-25 06:46:28
Message-ID: CAEHBEOC_xWMiq03LWm_WmCgJRAko-uzKK5cPRHK8CCMuSKJ7Rg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

I didn't understand how putting a catch-all handler turns the Postgres into
an ETL application. And about all those numerous Postgres-specific
solutions, can you please enumerate them? Because there are not numerous
options, just a couple of options, and none of them work. Inserting the
data into a table without constraint and then performing a merge won't
work. Because merge also does not have a catch-all handler. What other
solutions have been proposed? pg_bulkload?

So basically two solutions have been suggested, Merge and pg_bulkload. One
does not work (just handles more error and fails again on a single error)
and the other is a command-line utility.

I also did a simple research using AI and came up with this table:

RDBMS System - Row-Level Error Handling for Bulk Copy
Oracle Database - Yes (e.g., SQL*Loader can direct bad records)
Microsoft SQL Server - No (BULK INSERT generally aborts on row errors)
MySQL - Yes (using LOAD DATA with IGNORE options)
PostgreSQL - No (COPY stops on error without workarounds)
IBM DB2 - Yes (LOAD utility supports error logging)
Teradata - Yes (has options for capturing reject rows)
SAP HANA - No (bulk load typically aborts on errors)
MariaDB - Yes (similar to MySQL with error-handling options)
SQLite - No (no dedicated bulk load command with error handling)
Amazon Redshift - Yes (COPY command allows error tolerance settings)
Google BigQuery - Yes (load jobs can be configured to skip bad rows)
Netezza - Yes (supports reject files for bulk loads)
Snowflake - Yes (COPY INTO offers error handling parameters)
Vertica - Yes (COPY command logs errors and continues)

10 RDBMS systems support catch-all problems, and only 4 do not. This means
that the majority have found it to be a useful feature.

Greg, may I ask what's your argument against having a catch-all error in
bulk operations? Because the ETL argument is not a valid one. As I
mentioned previously, none of my real-world requirements were related to
the ETL.

Regards
Saeed

On Mon, Feb 24, 2025 at 7:25 PM Greg Sabino Mullane <htamfids(at)gmail(dot)com>
wrote:

> On Mon, Feb 24, 2025 at 9:09 AM me nefcanto <sn(dot)1361(at)gmail(dot)com> wrote:
>
>> The point is, the database schema is not in our hands. We don't know what
>> rules exist on each table and what rules change. And it's not practical and
>> feasible to spend resources on keeping our bulk insertion logic with the
>> database changes.
>>
>
> That's a company problem, and not one that can be solved by changing the
> way COPY works.
>
>
>> Is there a problem in implementing this? After all one expects the most
>> advanced open source database to support this real-world requirement.
>>
>
> We're not going to change Postgres into an ETL application. This thread
> has given you numerous Postgres-specific solutions, but there is also no
> shortage of ETL applications you can try out.
>
> Cheers,
> Greg
>
> --
> Crunchy Data - https://www.crunchydata.com
> Enterprise Postgres Software Products & Tech Support
>
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Kritika Agarwal 2025-02-25 07:47:23 Re: error -10825
Previous Message PG Bug reporting form 2025-02-25 04:15:02 BUG #18824: Inconsistent results for isolation level settings