Re: Parallel threads in query

From: Darafei "Komяpa" Praliaskouski <me(at)komzpa(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Parallel threads in query
Date: 2018-11-01 19:17:56
Message-ID: CAC8Q8tKRMRTBSDqaD5NEsm7HtAX2F7B0YJsZOQt1pFiF8nzOPg@mail.gmail.com
Lists: pgsql-hackers

>
>
> Because you said "faster than reasonable IPC" - which to me implies that
> you don't do full blown IPC. Which using threads in a bgworker is very
> strongly implying. What you're proposing strongly implies multiple
> context switches just to process a few results. Even before, but
> especially after, spectre that's an expensive proposition.
>
>
To give some idea of what this could look like:

a)
PostGIS has the ST_ClusterKMeans window function. It collects all geometries
passed to it into memory, repacks them into a more compact buffer, and starts
a loop that goes over that buffer several (let's say 10..100) times. Then it
emits the assigned cluster number for each of the input rows.

That's all fine when you need to calculate KMeans of 200-50000 rows, but for
a million input rows even a hundred passes on a single core are painful.
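
As a rough illustration of why this loop is a candidate for threads, here is a
minimal sketch of one assignment pass, assuming OpenMP is available. It is not
the PostGIS code: Point2D and assign_clusters are hypothetical names, and the
real function works on PostGIS's own internal structures. Each row is handled
independently, so the outer loop can be split across threads.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical 2D point type, standing in for the repacked buffer. */
typedef struct { double x, y; } Point2D;

/*
 * One k-means assignment pass: for every point, find the nearest center.
 * Returns how many points changed cluster (0 means the loop has converged).
 * Without -fopenmp the pragma is ignored and the loop runs serially.
 */
size_t
assign_clusters(const Point2D *pts, size_t n,
                const Point2D *centers, int k, int *cluster)
{
    long changed = 0;
    long i;

    assert(k > 0);

    #pragma omp parallel for reduction(+:changed) schedule(static)
    for (i = 0; i < (long) n; i++)
    {
        double best = DBL_MAX;
        int best_c = 0;

        for (int c = 0; c < k; c++)
        {
            double dx = pts[i].x - centers[c].x;
            double dy = pts[i].y - centers[c].y;
            double d = dx * dx + dy * dy;   /* squared distance is enough */

            if (d < best)
            {
                best = d;
                best_c = c;
            }
        }

        /* Each iteration writes only its own slot, so no locking is needed. */
        if (cluster[i] != best_c)
        {
            cluster[i] = best_c;
            changed++;
        }
    }

    return (size_t) changed;
}
```

The caller would repeat this pass, recomputing the centers in between, until
it returns 0 or a pass limit is hit.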

b)
PostGIS has the ST_Subdivide function. It takes a single row of geometry
(usually super-large, like a continent or the whole of Russia) and splits
it into many rows with simpler shapes, by performing a horizontal or
vertical split recursively. Since it's a tree traversal, it can be
parallelized efficiently, with one task following the right part of the
geometry and another following the left part.

Both seem to be standard use cases for OpenMP, which has compiler support in
GCC, clang and MSVC. For an overview of how it works, have a look here:
https://web.archive.org/web/20180828151435/https://bisqwit.iki.fi/story/howto/openmp/
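
For the recursive case (b), a minimal sketch with OpenMP tasks might look like
the following. It is only a stand-in: the "geometry" here is just a range of
vertex indexes that gets halved until it is small enough, and subdivide,
subdivide_parallel and the 4096 cutoff are made-up names and numbers, not
anything from PostGIS.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Recursively halve the vertex range [lo, hi) until a piece has at most
 * max_vertices vertices. Returns the number of leaf pieces produced.
 * Without -fopenmp the pragmas are ignored and this is plain recursion.
 */
size_t
subdivide(size_t lo, size_t hi, size_t max_vertices)
{
    size_t left = 0;
    size_t right = 0;
    size_t mid;

    assert(max_vertices > 0);

    if (hi - lo <= max_vertices)
        return 1;   /* small enough: real code would emit one output row */

    mid = lo + (hi - lo) / 2;

    /*
     * One task follows the left part while the current thread follows the
     * right part. The if() clause (arbitrary cutoff) avoids creating
     * deferred tasks for small pieces.
     */
    #pragma omp task shared(left) if (hi - lo > 4096)
    left = subdivide(lo, mid, max_vertices);

    right = subdivide(mid, hi, max_vertices);

    /* Wait for the left-hand task before combining the results. */
    #pragma omp taskwait

    return left + right;
}

/* Entry point: one thread starts the recursion, the team runs the tasks. */
size_t
subdivide_parallel(size_t n_vertices, size_t max_vertices)
{
    size_t pieces = 0;

    #pragma omp parallel
    #pragma omp single
    pieces = subdivide(0, n_vertices, max_vertices);

    return pieces;
}
```

The number of threads in the team is decided by the OpenMP runtime here,
which is exactly the part that would have to be reconciled with the parallel
worker limit.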

So, do I understand correctly that for each thread I launch, I need to start
a parallel worker that does nothing, just to consume a slot from the parallel
worker limit?
--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
