From: Sebastian Dressler <sebastian(at)swarm64(dot)com>
To: Henrique Montenegro <typoon(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Sudden insert performance degradation
Date: 2020-07-13 16:50:55
Message-ID: FBCA95CB-C1C9-4493-91A1-E9815D6323C1@swarm64.com
Lists: pgsql-performance
Hi Henrique,
On 13. Jul 2020, at 18:42, Henrique Montenegro <typoon(at)gmail(dot)com> wrote:

> On Mon, Jul 13, 2020 at 11:20 AM Sebastian Dressler <sebastian(at)swarm64(dot)com> wrote:
>
>>> Running the above loop worked fine for about 12 hours. Each file was taking
>>> about 30 seconds to be processed. About 4 seconds to create the `users` table
>>> and have the CSV data loaded into it and anything between 20 and 30 seconds to
>>> insert the data from `users` into `users_no_dups`.
>>
>> Do you see anything suspicious in the logs, i.e. something in the realms of
>> running out of transaction IDs?
>
> I set the log to debug1. I haven't seen anything that called my attention, but
> I am not really sure what to look for, so perhaps I missed it. Any suggestions
> on what to look for or any specific log configuration to do?
Not necessarily; if you had run out of tx IDs, you would notice that clearly, I guess. I also think that this is not the issue.
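For completeness, one quick way to rule out transaction ID wraparound pressure is to check the XID age per database (this is standard PostgreSQL, not specific to your setup):

```sql
-- Age of the oldest unfrozen transaction ID per database.
-- Autovacuum's anti-wraparound threshold defaults to 200 million
-- (autovacuum_freeze_max_age); real wraparound danger is near 2 billion.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```

If `xid_age` stays well below those thresholds, wraparound is not your problem.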
> [...]
>
>>> Recreating the table now isn't really providing any improvements. I tried
>>> recreating it with a `fillfactor` of `10`, but it was taking too long and too
>>> much space (the table had 300GB with the fillfactor set to 30; with it set to
>>> 10 it went up to almost 1TB).
>>
>> To me it sounds like the UK/PK is getting too much to write. A possible
>> solution could be to start partitioning the table.
>
> I thought about partitioning it, but I can't figure out on what. The `user_id`
> column is a number that is somewhat random so I don't know what kinds of range
> I would use for it. I will try to look at the values again and see if there is
> something that I could perhaps use as a range. Any other suggestions?
Depending on granularity, maybe partition on `created_ts`?
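As a rough sketch of what that could look like — assuming `users_no_dups` has a `created_ts` timestamp column and that monthly granularity fits your data (both assumptions on my side, adjust to your actual schema):

```sql
-- Hypothetical sketch: range-partition the table by created_ts.
-- Note that PostgreSQL requires the partition key to be part of any
-- primary key or unique constraint, so user_id alone can no longer
-- be the PK here.
CREATE TABLE users_no_dups_part (
    user_id    bigint    NOT NULL,
    created_ts timestamp NOT NULL,
    -- ... remaining columns from your existing table ...
    PRIMARY KEY (user_id, created_ts)
) PARTITION BY RANGE (created_ts);

CREATE TABLE users_no_dups_2020_07
    PARTITION OF users_no_dups_part
    FOR VALUES FROM ('2020-07-01') TO ('2020-08-01');
```

The upside is that each partition gets its own, much smaller index, which keeps index pages hot in cache while you load.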
Cheers,
Sebastian
--
Sebastian Dressler, Solution Architect
+49 30 994 0496 72 | sebastian(at)swarm64(dot)com
Swarm64 AS
Parkveien 41 B | 0258 Oslo | Norway
Registered at Brønnøysundregistrene in Norway under Org.-Number 911 662 787
CEO/Geschäftsführer (Daglig Leder): Thomas Richter; Chairman/Vorsitzender (Styrets Leder): Dr. Sverre Munck
Swarm64 AS Zweigstelle Hive
Ullsteinstr. 120 | 12109 Berlin | Germany
Registered at Amtsgericht Charlottenburg - HRB 154382 B