From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: John Gorman <johngorman2(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-01-15 05:55:08
Message-ID: CAA4eK1Ju_Rn4j=kwgwY4vbfEdSfNTZCGqxUhFdkWF0JXm2pt3w@mail.gmail.com
Lists: pgsql-hackers
On Wed, Jan 14, 2015 at 9:30 AM, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>
> On Wed, Jan 14, 2015 at 9:12 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>> Here we have to decide what the strategy should be and how much
>> each worker should scan. As an example, one strategy could be:
>> if the table size is X MB and there are 8 workers, then divide the
>> work as X/8 MB per worker (which I have currently used in the patch).
>> Another could be that each worker scans one block at a time and then
>> checks some global structure to see which block it needs to scan
>> next; in my view this could lead to a random scan pattern. I have
>> read that some other databases also divide the work based on
>> partitions or segments (the size of a segment is not very clear).
>
>
> A block can contain useful tuples, i.e. tuples which are visible and
> fulfil the quals, plus useless tuples, i.e. tuples which are dead,
> invisible, or which do not fulfil the quals. Depending upon the
> contents of these blocks, esp. the ratio of useful to useless tuples,
> even if we divide the relation into equal-sized runs, each worker may
> take a different amount of time. So, instead of dividing the relation
> so that the number of runs equals the number of workers, it might be
> better to divide it into fixed-size runs with size < (total number of
> blocks / number of workers), and let a worker pick up a run after it
> finishes with the previous one. The smaller the size of the runs, the
> better the load balancing, but the higher the cost of starting a run.
> So, we have to strike a balance.
>
I think your suggestion is good, and it somewhat falls in line
with what Robert has suggested, but instead of block-by-block
you seem to be suggesting doing it in chunks (where the chunk
size is not clear). The only point against this is that such a
strategy for parallel sequential scan could lead to random scans,
which can hurt the operation badly. Nonetheless, I will think more
along these lines of making work distribution dynamic, so that we
can ensure that all workers are kept busy.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com