Quick Links

Re: polyphase merge?

From:	Don Marvick <donmarvick(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: polyphase merge?
Date:	2009-02-04 06:42:48
Message-ID:	d18e24870902032242v4d868296o39774f928ad76ff0@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Dear All,

Since nobody replied, I would give it a try. I am going to implement the
merge pattern described in Knuth Page 365 (5.4.9), essentially it is as
follow:
- create initial runs using replacement selection (basically this is as in
the current implementation)
- add enough dummy runs of size 0 so that the number of sorted run minus one
can be divided by k-1 (k is merge fan in)
- repetitively merge k smallest runs until only 1 run left

I am new to postgresql, hence any advice would be appreciated.

Can anybody give me some advice on how it can done?
1. how a run should be represented? should I reuse the tape mechanism? e.g.
1 tape 1 run
- or should I use a temp buffer?

2. How memory should be allocated? I assume I divide the memory equally to k
runs, hence each run will get M/k memory. Each read of a run would be of
size M/k bytes.

3. Prefetching. Then, we can precompute the read sequence of blocks of run
during the entire merge process. Based on this, we know the blocks of run to
be prefetched at a point of time.

3. Is it possible to perform read/write I/O while the merge is being
performed? Hence we overlap I/O and computation.

4. any other issue needs consideration?

Best regards,

Don

On Thu, Jan 29, 2009 at 4:11 PM, Don Marvick <donmarvick(at)gmail(dot)com> wrote:

> Dear All,
>
> I apologize if this has been discussed before. I have tried to search to
> the mailing list and could not find an exact answer.
>
> Currently, Postgres uses Knuth's Algorithm 5.4.2D to perform external merge
> sort. IMHO the algorithm is designed for tapes.
>
> Why don't the simple text book merge pattern described in Knuth Page 365
> (5.4.9) is used for disk based system? The same merge pattern is also
> described in Ramakrishnan text book and it is optimal if seek time is not
> counted, which of course not the real-world case.
>
> Best regards,
>
> Don
>
>

In response to

polyphase merge? at 2009-01-29 08:11:02 from Don Marvick

Responses

Re: polyphase merge? at 2009-02-04 10:09:57 from Simon Riggs
Re: polyphase merge? at 2009-02-04 15:20:46 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	K, Niranjan (NSN - IN/Bangalore)	2009-02-04 07:17:24	Re: Synch Replication
Previous Message	Fujii Masao	2009-02-04 05:19:05	Re: Hot standby, recovery infra