ArrayLists instead of List (for some things)

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: ArrayLists instead of List (for some things)
Date: 2017-11-02 14:05:40
Message-ID: CAKJS1f_2SnXhPVa6eWjzy2O9A=ocwgd0Cj-LQeWpGtrWqbUSDA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hackers,

Our List implementation internally uses linked lists which are
certainly good for some things, but pretty bad at other things. Linked
lists are pretty bad when you want to fetch the Nth element, as it
means looping ever each previous element until you get to the Nth.
They're also pretty bad for a modern CPU due to their memory access
pattern.

We could get around a few of these problems if Lists were implemented
internally as arrays, however, arrays are pretty bad if we want to
delete an item out the middle as we'd have to shuffle everything over
one spot. Linked lists are much better here... at least they are when
you already have the cell you want to remove... so we can't expect
that we can just rewrite List to use arrays instead.

I've attached a first cut implementation of "AList". I was originally
going to call it "ArrayList", but my fingers were getting tired, so I
change it.

This first cut version replaces nothing in the code base to use
ALists. This email (and due to the timing of it) is more of a marker
that this is being worked on. I know in particular that Andres has
complained at least once about List usages in the executor, which I
agree is not great. Looking at the usage of list_nth() is quite scary!

My plans for this include:

* Make targetlists from createplan onwards use ALists. Every time I
look at build_path_tlist() I gawk at the number of palloc calls that
are going on inside those lappend() calls. We already know how many
items will be in the list, so with AList we could just
alist_premake(list_length(path->pathtarget->exprs)) and we'd never
have to palloc() another element for that list again. We do the same
again in various mutator functions in setrefs.c too!

* list_nth usage in adjust_appendrel_attrs_mutator() to speed up
translation between parent and child Vars

* And also, umm, <mumble> simple_rte_array and
simple_rel_array</mumble>. Well, we'd still need somewhere to store
all the RelOptInfos, but the idea is that it would be rather nice if
we could not expand the entire inheritance hierarchy of a relation,
and it would be rather nice if we could just add the partitions that
we need, rather than add them all and setting dummy paths for the ones
we're not interested in.

Anyway, please don't debate the usages of the new type here. As for
all the above plans, I admit to not having a full handle on them yet.
I wrote the Array List implementation so I could get a better idea on
how possible each of those would be, so, for now, to prevent any
duplicate work, here it is...

(Including in Andres because I know he's mentioned his dislike for
List in the executor)

Comments on the design are welcome, but I was too late to the
commitfest, so there are other priorities. However, if you have a
strong opinion, feel free to voice it.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Basic-implementation-of-array-lists-AList.patch application/octet-stream 13.1 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2017-11-02 14:12:23 Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Previous Message Tom Lane 2017-11-02 13:58:42 Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple