From: | AI Rumman <rummandba(at)gmail(dot)com> |
---|---|
To: | pgsql-general General <pgsql-general(at)postgresql(dot)org> |
Subject: | Query Parallelizing with dblink |
Date: | 2013-01-30 15:27:07 |
Message-ID: | CAGoODpd6rZSC0LAYeAPEPaO+rR6HGHfpgHLR327oQahaBQDJtw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi All,
I made an idea to utilize multiple core for Postgresql query. I used dblink
to achieve this task.
You may find it:
http://www.rummandba.com/2013/01/postgresql-query-parallelizing-with.html
It is just a basic idea and completely usable for my data model. But I
share it with you all so that you may add some more good ideas here.
------ Added the document here
------------------------------------------------------------------
I have three tables with almost 10 millions of records in each. The tables
are:
customer
account
tickets
"customer" table holds record for all kind of customers which are related
to account or tickets.
We need to generate a report with of customer and its related accounts or
tickets.
The basic sql is like:
select *
from
(
select c.custid,
case when a.accountid is not null then
a.accountid
when t.ticketid is not null then
t.ticketid
end
as relatedid
from customer as c
left join account as a on c.custid = a.accountid and type ='Accounts'
left join tickets as t on c.custid = t.ticketid and type ='HelpDesk'
where c.deleted = 0
) as q
where relatedid is not null
I have all the required indexes. But the query was taking too much time.
One of the bottleneck that I always feel with Postgresql is its lack of
query parallelism technique. Good news is that, the great developers are
working on it.
However, I have to improve the query performance at this moment. So I make
a plan to divide the query in two parts and then execute each part
asynchronously and then collect the result.
To achieve this, I make the function qry_parallel_fn. This function create
two separate dblink connection conn1 and conn2 and execute two different
queries in async mode.
There is a while loop which checks if both the connections have completed
the task or not. If yes, then the function return results.
CREATE OR REPLACE FUNCTION qry_parallel_fn() RETURNS SETOF RECORD AS $$
DECLARE
v_qry1 text;
v_qry2 text;
cur1 cursor is
select *
from dblink_get_result('conn1') as t1(custid int, relatedid int);
cur2 cursor is
select *
from dblink_get_result('conn2') as t1(custid int, relatedid int);
v_closed smallint;
BEGIN
v_qry1 := 'select custid, accountid as relatedid from customer c inner
join account a on c.custid = a.accountid where c.deleted = 0';
RAISE NOTICE 'vqry1 = %' , v_qry1;
v_qry2 := 'select custid, ticketid as relatedid from customer c inner
join tickets as t on c.custid = t.ticketid where c.deleted = 0';
PERFORM dblink_connect('conn1','dbname=rumman');
PERFORM dblink_connect('conn2','dbname=rumman');
PERFORM dblink_send_query('conn1',v_qry1);
PERFORM dblink_send_query('conn2',v_qry2);
v_closed := 0;
WHILE v_closed <> 2 loop
if check_conn_is_busy('conn1') = 0 then
v_closed := v_closed + 1;
end if;
if check_conn_is_busy('conn2') = 0 then
v_closed := v_closed + 1;
end if;
END LOOP;
FOR rec IN cur1
LOOP
RETURN NEXT rec;
END LOOP;
FOR rec IN cur2
LOOP
RETURN NEXT rec;
END LOOP;
PERFORM dblink_disconnect('conn1');
PERFORM dblink_disconnect('conn2');
RETURN;
END;
$$
language 'plpgsql'
--Execute
--select * from test_fn() as t1(c int, d int);
-- select count(*) from test_fn() as t1(c int, d int);
CREATE OR REPLACE FUNCTION check_conn_is_busy(conn text) RETURNS INT AS $$
DECLARE
v int;
BEGIN
SELECT dblink_is_busy(conn) INTO v;
RETURN v;
END;
$$
language 'plpgsql'
I was monitoring the server performance and found that it was using two cpu
cores to get the result and improve the query timing a bit.
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Let me know your thoughts on it.
Thanks.
From | Date | Subject | |
---|---|---|---|
Next Message | Brian Janes | 2013-01-30 16:22:28 | Cannot start PG as a Windows Service on Server 2008 and Windows 8 |
Previous Message | Jeff Janes | 2013-01-30 15:18:17 | Re: Optimizing select count query which often takes over 10 seconds |