From: | Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | PgHacker <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: PG-Strom - A GPU optimized asynchronous executor module |
Date: | 2012-01-23 06:38:54 |
Message-ID: | CADyhKSWA3nSbokM2TFGzNB1rKYubfws1ZJzFpNjP5Kue1EdU-g@mail.gmail.com |
Lists: | pgsql-hackers |
2012/1/23 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Sun, Jan 22, 2012 at 10:48 AM, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp> wrote:
>> I tried to implement an FDW module that is designed to utilize GPU
>> devices to execute the qualifiers of sequential scans on foreign tables
>> managed by this module.
>>
>> It was named PG-Strom, and the following wikipage gives a brief
>> overview of this module.
>> http://wiki.postgresql.org/wiki/PGStrom
>>
>> In our measurements, it runs about 10 times faster on sequential scans
>> with complex qualifiers, though of course this depends heavily on the
>> type of workload.
>
> That's pretty neat. In terms of tuning the non-GPU based
> implementation, have you done any profiling? Sometimes that leads to
> an "oh, woops" moment.
>
Not yet, apart from \timing.
What options are available for seeing how much of a particular query's
run time each component accounts for?
I tried googling a few keywords, but nothing useful came up.

As an aside, I also tried modifying is_device_executable_qual() to always
return false, which disables qualifier push-down.
In this case, 2100 ms of the 7679 ms total was consumed within this module, so
I guess the remaining 5580 ms or so was mostly consumed by ExecQual(), although
that is just an estimate...

postgres=# SET pg_strom.exec_profile = on;
SET
Time: 1.075 ms
postgres=# SELECT count(*) FROM ftbl WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 10;
INFO: PG-Strom Exec Profile on "ftbl"
INFO: Total PG-Strom consumed time: 2100.898 ms
INFO: Time to JIT Compile GPU code: 0.000 ms
INFO: Time to initialize devices: 0.000 ms
INFO: Time to Load column-stores: 7.013 ms
INFO: Time to Scan column-stores: 1219.746 ms
INFO: Time to Fetch virtual tuples: 874.095 ms
INFO: Time of GPU Synchronization: 0.000 ms
INFO: Time of Async memcpy: 0.000 ms
INFO: Time of Async kernel exec: 0.000 ms
 count
-------
  3159
(1 row)
Time: 7679.342 ms
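
For reference, the arithmetic behind the estimate above can be reproduced from the numbers in the profile output. This is only a sketch in Python; the variable names are made up for illustration, and attributing the remainder to ExecQual() is still a guess, not a measurement.

```python
# All values are copied from the psql session above, in milliseconds.
total_query_ms = 7679.342   # elapsed time reported by \timing
pg_strom_ms = 2100.898      # "Total PG-Strom consumed time"

# Non-zero components listed in the PG-Strom exec profile.
components_ms = {
    "load_column_stores": 7.013,
    "scan_column_stores": 1219.746,
    "fetch_virtual_tuples": 874.095,
}

# Time spent outside the module; presumably mostly ExecQual(),
# but that attribution is an assumption, not something profiled.
remainder_ms = total_query_ms - pg_strom_ms
outside_share = remainder_ms / total_query_ms

# Sanity check: the listed components should add up to roughly
# the module total.
components_total_ms = sum(components_ms.values())

print(f"outside PG-Strom: {remainder_ms:.3f} ms ({outside_share:.1%} of total)")
print(f"sum of components: {components_total_ms:.3f} ms")
```

The components add up to within a fraction of a millisecond of the module total, so the profile accounts for essentially all of the time spent inside PG-Strom.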
Thanks,
--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>