From: | tim(dot)child(at)comcast(dot)net |
---|---|
To: | Shmagi Kavtaradze <kavtaradze(dot)s(at)gmail(dot)com>, pgsql-novice(at)postgresql(dot)org |
Subject: | Re: Parallel Execution of Query |
Date: | 2015-11-30 21:39:15 |
Message-ID: | 88192495.4184600.1448919555445.JavaMail.zimbra@comcast.net |
Lists: | pgsql-novice |
Shmagi,
First, I would explore creating multiple VMs and running the query on them in parallel. If you can easily clone your VM, try creating two VMs and running half of the query on each.
Then try 4 VMs, then 8, and so on.
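One way to split the work is sketched below in Python. It assumes each VM holds a full copy of the table and is told which slice of rows to compare against all the others. The function name `split_rows` and the row-range scheme are illustrative only, not something PostgreSQL provides.

```python
def split_rows(n_rows, n_workers):
    """Divide row indexes 0..n_rows-1 into n_workers contiguous ranges.

    Each (start, stop) range is half-open. Worker i compares the rows in
    its range against every row in the table, so the ranges together
    cover all n_rows * n_rows comparisons exactly once.
    """
    base, extra = divmod(n_rows, n_workers)
    ranges = []
    start = 0
    for i in range(n_workers):
        # The first `extra` workers take one additional row each.
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


# Two VMs and the 4000-row table from the original question.
for start, stop in split_rows(4000, 2):
    print(f"rows {start}..{stop - 1}: {(stop - start) * 4000} comparisons")
```

With two workers, each range accounts for 8 million of the 16 million comparisons. Doubling the worker count halves the share again.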
A more complex approach for a single VM is to write a C UDF (User Defined Function). The UDF should do the following:
1) Take a SELECT query as the input argument
2) Run the query and store the results in a C collection (a list or array of C structs)
3) Loop over the C collection N by N times computing the similarity matching (cosine, euclidean)
4) Output the result as a set of rows
This is a non-trivial approach, as it requires deep knowledge of PostgreSQL C functions. But it could speed up calculations like this by orders of magnitude.
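Steps 2 to 4 can be illustrated outside the database. The sketch below is in Python rather than C, purely to show the shape of the loop. A real UDF would use the PostgreSQL C function interface, which is not shown here. It assumes each row is an (id, vector) pair with vectors of equal length, and the names `cosine`, `euclidean`, and `pairwise` are made up for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise(rows):
    """Yield (id_a, id_b, cosine, euclidean) for every ordered pair of rows.

    `rows` stands in for the query result held in memory (step 2), the
    nested loop is the N by N pass (step 3), and each yielded tuple is
    one output row (step 4).
    """
    for id_a, vec_a in rows:
        for id_b, vec_b in rows:
            yield (id_a, id_b, cosine(vec_a, vec_b), euclidean(vec_a, vec_b))


# Three made-up rows standing in for the result of the SELECT.
rows = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (3, [1.0, 1.0])]
for result in pairwise(rows):
    print(result)
```

For N rows the loop yields N * N result rows, so the 4000-row table in the question produces the 16 million comparisons mentioned there.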
Regards
Tim
----- Original Message -----
From: "Shmagi Kavtaradze" <kavtaradze(dot)s(at)gmail(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Sent: Monday, November 30, 2015 9:00:40 AM
Subject: [NOVICE] Parallel Execution of Query
I am doing similarity matching (cosine, euclidean). If I have 4000 entries in a table, the number of comparisons will be 16M. I am running Postgres on a virtual machine, and the query either takes 20-25 minutes to run or the system crashes. Can I run the query in parallel? I heard there are tools like PL/Proxy and pgpool. Can I use them to create several databases on the same machine and run the query in parallel?