Re: Parallel copy

From:	Ants Aasma <ants(at)cybertec(dot)at>
To:	David Fetter <david(at)fetter(dot)org>
Cc:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Parallel copy
Date:	2020-02-21 12:54:31
Message-ID:	CANwKhkOu7dWj66gC-N4B5SaLWW7=mLGVbfitquoO7pjtEJRWLg@mail.gmail.com
Lists:	pgsql-hackers

On Thu, 20 Feb 2020 at 18:43, David Fetter <david(at)fetter(dot)org> wrote:
> On Thu, Feb 20, 2020 at 02:36:02PM +0100, Tomas Vondra wrote:
> > I think the wc2 is showing that maybe instead of parallelizing the
> > parsing, we might instead try using a different tokenizer/parser and
> > make the implementation more efficient instead of just throwing more
> > CPUs on it.
>
> That was what I had in mind.
>
> > I don't know if our code is similar to what wc does; maybe parsing
> > csv is more complicated than what wc does.
>
> CSV parsing differs from wc in that there are more states in the state
> machine, but I don't see anything fundamentally different.

The trouble with a state machine based approach is that the state
transitions form a dependency chain: the lookup for each byte cannot
start until the state from the previous byte is known. At best the
processing rate will therefore be 4-5 cycles per byte (the L1 latency
to fetch the next state).
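
To make the dependency chain concrete, here is a minimal sketch of a
table-driven scanner. It is not PostgreSQL's CopyReadLineText(); the
names (count_records_scalar, next_state, char_class) are hypothetical,
and it assumes the default CSV setup where the quote character '"' is
also the escape character, so a doubled quote simply toggles the state
twice. It only counts record-terminating newlines.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical states and character classes for a simplified CSV scanner. */
enum { ST_UNQUOTED = 0, ST_QUOTED = 1, ST_COUNT = 2 };
enum { CL_OTHER = 0, CL_QUOTE = 1, CL_NEWLINE = 2, CL_COUNT = 3 };

/* next_state[current state][character class] */
static const uint8_t next_state[ST_COUNT][CL_COUNT] = {
    /* ST_UNQUOTED */ { ST_UNQUOTED, ST_QUOTED,   ST_UNQUOTED },
    /* ST_QUOTED   */ { ST_QUOTED,   ST_UNQUOTED, ST_QUOTED   },
};

static int char_class(unsigned char c)
{
    if (c == '"')
        return CL_QUOTE;
    if (c == '\n')
        return CL_NEWLINE;
    return CL_OTHER;
}

/* Count newlines that end a record, i.e. newlines outside quotes. */
size_t count_records_scalar(const char *buf, size_t len)
{
    unsigned state = ST_UNQUOTED;
    size_t records = 0;

    for (size_t i = 0; i < len; i++) {
        int cl = char_class((unsigned char) buf[i]);

        records += (state == ST_UNQUOTED && cl == CL_NEWLINE);
        /*
         * This load is indexed by the state produced in the previous
         * iteration, so iterations cannot overlap: that is the serial
         * dependency chain.
         */
        state = next_state[state][cl];
    }
    return records;
}
```

For example, the input a,"x\ny"\n yields one record, because the first
newline sits inside a quoted field.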

I whipped together a quick prototype that uses SIMD and bitmap
manipulations to do the equivalent of CopyReadLineText() in csv mode,
including quote and escape handling. It runs at 0.25-0.5 cycles per
byte.
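
The general bitmap technique can be sketched as follows. This is not
the attached simdcopy.c; it is an illustration under stated
assumptions, with hypothetical names (match_mask, prefix_xor,
count_records_bitmap). It is written in portable C, so the per-byte
comparison is a plain loop where a real implementation would use a
SIMD compare plus a movemask instruction, and the prefix XOR uses
shifts where a carry-less multiply could be used. It assumes '"' is
both quote and escape character; a distinct escape character would
need an additional mask and is not handled here.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit i of the result is set where chunk[i] == c (n must be <= 64). */
static uint64_t match_mask(const unsigned char *chunk, size_t n,
                           unsigned char c)
{
    uint64_t m = 0;

    for (size_t i = 0; i < n; i++)
        m |= (uint64_t) (chunk[i] == c) << i;
    return m;
}

/* Bit i of the result is the XOR of bits 0..i of x. */
static uint64_t prefix_xor(uint64_t x)
{
    x ^= x << 1;
    x ^= x << 2;
    x ^= x << 4;
    x ^= x << 8;
    x ^= x << 16;
    x ^= x << 32;
    return x;
}

static unsigned popcount64(uint64_t x)
{
    unsigned n = 0;

    while (x) {
        x &= x - 1;             /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Count newlines outside quotes, 64 input bytes per iteration. */
size_t count_records_bitmap(const char *buf, size_t len)
{
    const unsigned char *p = (const unsigned char *) buf;
    uint64_t carry = 0;         /* all ones if the previous chunk ended in quotes */
    size_t records = 0;

    for (size_t off = 0; off < len; off += 64) {
        size_t n = (len - off < 64) ? len - off : 64;
        uint64_t quotes = match_mask(p + off, n, '"');
        uint64_t newlines = match_mask(p + off, n, '\n');
        /* Bit i set: byte i lies inside a quoted field. */
        uint64_t in_quotes = prefix_xor(quotes) ^ carry;

        records += popcount64(newlines & ~in_quotes);
        /* Broadcast the quote state after the last byte to all 64 bits. */
        carry = (uint64_t) 0 - (in_quotes >> 63);
    }
    return records;
}
```

The only state carried between chunks is the single in-quotes flag, so
the work inside a chunk has no byte-to-byte dependency; the serial
chain shrinks from one step per byte to one step per 64 bytes.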

Regards,
Ants Aasma

Attachment	Content-Type	Size
simdcopy.c	text/x-csrc	3.6 KB

In response to

Re: Parallel copy at 2020-02-20 16:43:26 from David Fetter

Responses

Re: Parallel copy at 2020-02-22 00:28:02 from Tomas Vondra
