From: Andres Freund <andres(at)anarazel(dot)de>
To: Alex Tokarev <dwalin(at)dwalin(dot)ru>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Table with large number of int columns, very slow COPY FROM
Date: 2017-12-08 18:17:34
Message-ID: 20171208181734.db7gk22mtq7i44ax@alap3.anarazel.de
Lists: pgsql-hackers pgsql-performance
Hi,
On 2017-12-07 20:21:45 -0800, Alex Tokarev wrote:
> I have a set of tables with fairly large number of columns, mostly int with
> a few bigints and short char/varchar columns. I've noticed that Postgres is
> pretty slow at inserting data in such a table. I tried to tune every
> possible setting: using unlogged tables, increased shared_buffers, etc; even
> placed the db cluster on ramfs and turned fsync off. The results are pretty
> much the same with the exception of using unlogged tables that improves
> performance just a little bit.
> I have made a minimally reproducible test case consisting of a table with
> 848 columns, inserting partial dataset of 100,000 rows with 240 columns. On
> my dev VM the COPY FROM operation takes just shy of 3 seconds to complete,
> which is entirely unexpected for such a small dataset.
I don't find this to be absurdly slow. On my laptop, loading with a
development checkout, this takes 1223.950 ms. That works out to roughly 20
million fields parsed per second, and to rows containing roughly 69 million
fields per second being inserted. Removing the TRUNCATE and running the
COPYs concurrently scales well to a few clients, and only stops scaling
because my laptop's SSD can no longer keep up.
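For reference, the back-of-the-envelope numbers behind that, using the 1224 ms
figure and the test case's 240-of-848 column shape:

  parsed:   240 columns * 100,000 rows / 1.224 s ~= 19.6 million fields/sec
  inserted: 848 columns * 100,000 rows / 1.224 s ~= 69.3 million fields/sec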
That said, I do think there are a few places that could stand some
improvement. Locally the profile shows up as:
+ 15.38% postgres libc-2.25.so [.] __GI_____strtoll_l_internal
+ 11.79% postgres postgres [.] heap_fill_tuple
+ 8.00% postgres postgres [.] CopyFrom
+ 7.40% postgres postgres [.] CopyReadLine
+ 6.79% postgres postgres [.] ExecConstraints
+ 6.68% postgres postgres [.] NextCopyFromRawFields
+ 6.36% postgres postgres [.] heap_compute_data_size
+ 6.02% postgres postgres [.] pg_atoi
The strtoll is libc functionality triggered by pg_atoi(), something I've
seen show up in numerous profiles. I think it's probably time to have
our own optimized version of it rather than relying on libc's.
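Something along these lines would already avoid strtoll's locale handling,
base detection and errno bookkeeping. Just a sketch, assuming base-10 input
with pg_atoi-like whitespace handling; the function name is made up and a
real version would ereport() on failure instead of returning false:

/*
 * Minimal sketch of a hand-rolled base-10 int32 parser.
 */
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

static bool
my_strtoint32(const char *s, int32_t *result)
{
    bool        neg = false;
    int64_t     val = 0;

    /* skip leading whitespace, as strtoll would */
    while (isspace((unsigned char) *s))
        s++;

    if (*s == '-')
    {
        neg = true;
        s++;
    }
    else if (*s == '+')
        s++;

    if (!isdigit((unsigned char) *s))
        return false;           /* no digits at all */

    while (isdigit((unsigned char) *s))
    {
        val = val * 10 + (*s++ - '0');
        if (val > (int64_t) INT32_MAX + 1)
            return false;       /* overflow */
    }

    if (neg)
        *result = (int32_t) -val;       /* INT32_MIN is representable */
    else if (val > INT32_MAX)
        return false;                   /* overflow */
    else
        *result = (int32_t) val;

    /* allow trailing whitespace, reject anything else */
    while (isspace((unsigned char) *s))
        s++;
    return *s == '\0';
}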
That heap_fill_tuple(), which basically builds a tuple from the parsed
datums, takes time roughly proportional to the number of columns in the
table seems hard to avoid, especially because this isn't something we
want to optimize at the price of making the more common workloads with
fewer columns slower. But there still seems to be quite some
micro-optimization potential.
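To illustrate where the per-column cost comes from: forming the tuple boils
down to a per-attribute loop along these lines. This is a heavily simplified
sketch, not the real heap_fill_tuple(), which additionally deals with the
null bitmap, alignment padding, pass-by-value datums and varlena headers:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct
{
    int16_t     attlen;         /* column width, fixed-length only here */
} FakeAttr;

static size_t
fill_tuple_sketch(char *dst, const FakeAttr *attrs,
                  const char *const *values, int natts)
{
    size_t      off = 0;

    for (int i = 0; i < natts; i++)
    {
        /* every column's datum has to be inspected and copied, so the
         * work per row grows with the number of table columns */
        memcpy(dst + off, values[i], attrs[i].attlen);
        off += (size_t) attrs[i].attlen;
    }
    return off;
}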
That ExecConstraints() shows up is unsurprising: it has to walk through
all of the table's columns checking for constraints. We could fairly
easily optimize this by keeping a separate datastructure listing only the
constrained columns, but that'd be slower in the very common case of
tables with a more reasonable number of columns.
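A hypothetical sketch of what that could look like (not the current code;
the extra structure would be built once per COPY from the relation's
descriptor, which is exactly the part that wouldn't pay off for narrow
tables):

#include <stdbool.h>

typedef struct
{
    int         nconstrained;           /* columns with NOT NULL etc. */
    const int  *constrained_attnos;     /* their indexes, built once */
} ConstraintInfo;

/* today: O(total columns) per row, even if few columns carry constraints */
static bool
check_all_columns(const bool *isnull, const bool *attnotnull, int natts)
{
    for (int i = 0; i < natts; i++)
    {
        if (attnotnull[i] && isnull[i])
            return false;
    }
    return true;
}

/* alternative: O(constrained columns) per row */
static bool
check_constrained_columns(const bool *isnull, const ConstraintInfo *ci)
{
    for (int i = 0; i < ci->nconstrained; i++)
    {
        if (isnull[ci->constrained_attnos[i]])
            return false;
    }
    return true;
}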
The copy implementation deserves some optimization too...
> Here's a tarball with test schema and data:
> http://nohuhu.org/copy_perf.tar.bz2; it's 338k compressed but expands to
> ~50mb. Here's the result of a profiling session with perf:
> https://pastebin.com/pjv7JqxD
Thanks!
Greetings,
Andres Freund