From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Load TIME fields - proposed performance improvement |
Date: | 2020-09-22 01:42:55 |
Message-ID: | CAHut+Pu89TWjq530V2gY5O6SWi=OEJMQ_VHMt8bdZB_9JFna5A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Hackers.
I have a test table with multiple (10) columns defined as TIME WITHOUT
TIME ZONE.
When loading this table with a lot of data (e.g. "COPY tbl FROM
/my/path/2GB.csv WITH (FORMAT CSV)") I observed it was spending an
excessive amount of time within the function GetCurrentDateTime.
IIUC the code is calling GetCurrentDateTime only to acquire the
current TX timestamp as a struct pg_tm in order to derive some
timezone information.
My test table has 10 x TIME columns.
My test data has 22.5 million rows (~ 2GB)
So that's 225 million times the GetCurrentDateTime function is called
to populate the struct with the same values.
I have attached a patch which caches this struct, so now those 225
million calls are reduced to just 1 call.
~
Test Results:
Copy 22.5 million rows data (~ 2GB)
BEFORE
Run 1 = 4m 36s
Run 2 = 4m 30s
Run 3 = 4m 32s
perf showed 20.95% time in GetCurrentDateTime
AFTER (cached struct)
Run 1 = 3m 44s
Run 2 = 3m 44s
Run 3 = 3m 45s
perf shows no time in GetCurrentDateTime
~17% performance improvement in my environment. YMMV.
~
Thoughts?
Kind Regards
Peter Smith.
Fujitsu Australia.
Attachment | Content-Type | Size |
---|---|---|
DecodeTimeOnly_GetCurrentDateTime.patch | application/octet-stream | 1.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2020-09-22 02:12:45 | Re: Load TIME fields - proposed performance improvement |
Previous Message | tsunakawa.takay@fujitsu.com | 2020-09-22 01:17:38 | RE: Transactions involving multiple postgres foreign servers, take 2 |