From: | Tobias Völk <tobias(dot)voelk(at)t-online(dot)de> |
---|---|
To: | "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Improvement for query planner? (no, not about count(*) again ;-)) |
Date: | 2020-07-19 19:22:03 |
Message-ID: | 1jxEsu-0bGx160@fwd35.t-online.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs pgsql-general |
Hello Postgres-Community,
I’ve got a table games(name1 text, name2 text) with 1.3x10^9 rows consisting of two two text columns for the names of players who’ve played a game, duplicate rows are possible, there’s no primary key since this table was just intended as a temporary storage for my data until further processing.
The length of a name is usually not more than 20 characters, shorter most of the time.
I’ve asked postgres to make an unlogged newtable(name text primary key) consisting of the unqiue names and executed:
Insert into newtable(name) select name1 from games on conflict do nothing;
(and later on intended to do the same for the second column)
However after hours it still wasn’t done, used only 1 cpu core to the max and read with 5 MB/s from my fast SSD.
So I stopped it.
I’ve also tried inserting (select name1 from games union select name2 from games) but it always wanted to do it using sorting.
But either the sorting or the preperations for the sorting were again only done using 1 core to the max and reading with 5 MB/s.
Couldn’t find a fast query for my problem.
So I wrote a java-program which read the whole table at a fetchsize of about 4 million and inserted the names into a HashSet.
And surprisingly after only a few minutes the program was already 25% done o.O
My question is, why isn’t postgres nearly this fast? Why doesn’t it just create a HashSet in RAM and read full speed from the disk?
I even created a hash index but it kept using it’s primary key b-tree and then I read that hash indices somehow don’t support checking for uniqueness.
Best regards, Tobi
--
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Munro | 2020-07-19 23:32:28 | Re: psql has some accessibility issues on Windows |
Previous Message | Andy Fan | 2020-07-19 02:25:55 | Re: Reported type mismatch improperly |
From | Date | Subject | |
---|---|---|---|
Next Message | Thorsten Schöning | 2020-07-19 19:51:12 | Re: How to restore a dump containing CASTs into a database with a new user? |
Previous Message | David G. Johnston | 2020-07-19 19:01:32 | Re: Re: PG 9.5.5 cores on AIX 7.1 |