Re: How batch processing works

From: Lok P <loknath(dot)73(at)gmail(dot)com>
To: Ron Johnson <ronljohnsonjr(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: How batch processing works
Date: 2024-09-19 13:12:45
Message-ID: CAKna9VZGR+BkBOocbShxRpcXJMgeT=aFDYut+Va9n5P42mqtcg@mail.gmail.com
Lists: pgsql-general

On Thu, Sep 19, 2024 at 5:40 PM Ron Johnson <ronljohnsonjr(at)gmail(dot)com> wrote:

> On Thu, Sep 19, 2024 at 5:24 AM Lok P <loknath(dot)73(at)gmail(dot)com> wrote:
>
>>
>>
>> [snip]
>
>> DO $$
>> DECLARE
>> num_inserts INTEGER := 100000;
>> batch_size INTEGER := 50;
>> start_time TIMESTAMP;
>> end_time TIMESTAMP;
>> elapsed_time INTERVAL;
>> i INTEGER;
>> BEGIN
>> -- Method 1: Individual Inserts with Commit after every Row
>> start_time := clock_timestamp();
>>
>> FOR i IN 1..num_inserts LOOP
>> INSERT INTO parent_table VALUES (i, 'a');
>> COMMIT;
>> END LOOP;
>>
>> end_time := clock_timestamp();
>> elapsed_time := end_time - start_time;
>> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>> VALUES ('Method 1: Individual Inserts with Commit after every Row',
>> start_time, end_time, elapsed_time);
>>
>> -- Method 2: Individual Inserts with Commit after every 50 Rows
>> start_time := clock_timestamp();
>>
>> FOR i IN 1..num_inserts LOOP
>> INSERT INTO parent_table2 VALUES (i, 'a');
>> -- Commit after every batch_size (50) rows
>> IF i % batch_size = 0 THEN
>> COMMIT;
>> END IF;
>> END LOOP;
>>
>> -- Final commit if not already committed
>> commit;
>>
>> end_time := clock_timestamp();
>> elapsed_time := end_time - start_time;
>> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>> VALUES ('Method 2: Individual Inserts with Commit after every 50 Rows',
>> start_time, end_time, elapsed_time);
>>
>> -- Method 3: Multi-row Batch Inserts with Commit after each batch
>> start_time := clock_timestamp();
>>
>> FOR i IN 1..(num_inserts / batch_size) LOOP
>> INSERT INTO parent_table3 VALUES
>> (1 + (i - 1) * batch_size, 'a'),
>>
> [snip]
>
>> (49 + (i - 1) * batch_size, 'a'),
>> (50 + (i - 1) * batch_size, 'a');
>> COMMIT;
>> END LOOP;
>>
>> COMMIT; -- Final commit for all
>> end_time := clock_timestamp();
>> elapsed_time := end_time - start_time;
>> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>> VALUES ('Method 3: Batch Inserts with Commit after All', start_time,
>> end_time, elapsed_time);
>>
>> END $$;
>>
>
> Reproduce what behavior?
>
> Anyway, plpgsql functions (including anonymous DO statements) are -- to
> PostgreSQL -- single statements. Thus, they'll be faster than
> individual calls.
>
> An untrusted language like plpython3u might speed things up even more, if
> you have to read a heterogeneous external file and insert all the records
> into the db.
>

Here, if you look at my script, Method-1 commits after each row insert,
while Method-2 does a batch commit, i.e. a commit after every 50 rows.
Method-3 does a true batch insert, combining all 50 values into one
INSERT statement and submitting it to the database in one shot before
committing, so the context switching should be far lower. I was
therefore expecting Method-3 to be the fastest way to insert the rows,
but the results show the same response time for Method-2 and Method-3.
Method-1 is the slowest, though.
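For what it's worth, the same batching idea can be sketched client-side.
Below is a minimal, hypothetical Python sketch (the table name and the
two-column row shape are assumptions taken from the script above) that
chunks rows into groups of 50 and builds one multi-row INSERT per chunk,
which is what Method-3 does inside the DO block:

```python
# Minimal sketch: chunk rows into batches and build one multi-row
# INSERT per batch, mirroring Method-3. The table name "parent_table3"
# and the (id, text) row shape are assumptions for illustration.

def chunk(rows, batch_size):
    """Yield successive batches of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

def build_batch_insert(table, batch):
    """Build one multi-row INSERT with placeholders, plus its flat params."""
    placeholders = ", ".join("(%s, %s)" for _ in batch)
    sql = f"INSERT INTO {table} VALUES {placeholders}"
    params = [value for row in batch for value in row]
    return sql, params

rows = [(i, "a") for i in range(1, 100001)]
batches = list(chunk(rows, 50))
# 100000 rows / 50 per batch -> 2000 statements instead of 100000
```

With a driver such as psycopg2, each (sql, params) pair would be run
with cur.execute(sql, params) and the connection committed once at the
end, so the server sees one statement per 50 rows rather than 50.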
