Re: COPY FROM WHEN condition

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Adam Berlin <berlin(dot)ab(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: COPY FROM WHEN condition
Date:	2019-04-02 01:11:26
Message-ID:	20190402011126.jvy5oun2nsdy4pqr@alap3.anarazel.de
Lists:	pgsql-hackers

Hi,

On 2019-04-02 14:06:52 +1300, David Rowley wrote:
> On Tue, 2 Apr 2019 at 13:59, Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > On 2019-04-02 13:41:57 +1300, David Rowley wrote:
> > > On Tue, 2 Apr 2019 at 05:19, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > > Thanks! I'm not quite clear whether you're planning to continue working on
> > > > this, or whether this is a handoff? Either is fine with me, just trying
> > > > to avoid unnecessary work / delay.
> > >
> > > I can, if you've not. I was hoping to gauge if you thought the
> > > approach was worth pursuing.
> >
> > I think it's worth pursuing, with the caveats below. I'm going to focus
> > on docs for the not-very-long rest of today, but I definitely could work on
> > this afterwards. But I also would welcome any help. Let me know...
>
> I'm looking now. I'll post something when I get it into some better
> shape than it is now.

Cool.

> > > > It still seems wrong to me to just perform a second hashtable search
> > > > here, given that we've already done the partition dispatch.
> > >
> > > The reason I thought this was a good idea is that if we use the
> > > ResultRelInfo to buffer the tuples then there's no end to how many
> > > tuple slots can exist as the code in copy.c has no control over how
> > > many ResultRelInfos are created.
> >
> > To me those aren't contradictory - we're going to have a ResultRelInfo
> > for each partition either way, but there's nothing preventing copy.c
> > from cleaning up subsidiary data in it. What I was thinking is that
> > we'd just keep track of a list of ResultRelInfos with bulk insert slots,
> > and occasionally clean them up. That way we avoid the secondary lookup,
> > while also managing the amount of slots.
>
> The problem that I see with that is you can't just add to that list
> when the partition changes. You must check if the ResultRelInfo is
> already in the list or not since we could change partitions and change
> back again. For a list with just a few elements checking
> list_member_ptr should be pretty cheap, but I arbitrarily chose that
> we try to keep just the last 16 partitions' worth of buffers. I don't
> think checking list_member_ptr in a 16 element list is likely to be
> faster than a hash table lookup, do you?

Why do we need that list membership check? If you append the
ResultRelInfo to the list when creating the ResultRelInfo's slots array,
you don't need to touch the list after a partition lookup - you know
it's a member if the ResultRelInfo has a slot array. Then you only need
to iterate the list when you want to drop slots to avoid using too much
memory - and that's also a sequential scan of the hash table in your
approach, right?
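
A minimal sketch of that idea, assuming simplified stand-in types: DemoRelInfo, DemoCopyState, demo_prepare_buffer, demo_trim_buffers and the max_buffered cap are all hypothetical names for illustration, not the real ResultRelInfo or anything in copy.c. It only shows that a non-NULL slot array can double as the "already on the list" flag, so the list is walked only when trimming:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_SLOTS_PER_REL 4    /* hypothetical; tiny for illustration */

/* Hypothetical stand-in for ResultRelInfo; not the real PostgreSQL struct. */
typedef struct DemoRelInfo
{
    void      **bulk_slots;     /* NULL until the first buffered insert */
    struct DemoRelInfo *next_buffered;  /* link in the list of rels owning slots */
} DemoRelInfo;

/* Hypothetical stand-in for the COPY state that tracks buffered rels. */
typedef struct DemoCopyState
{
    DemoRelInfo *buffered_head; /* rels that currently own a slot array */
    int         nbuffered;      /* length of that list */
    int         max_buffered;   /* assumed cap, e.g. 16 */
} DemoCopyState;

/*
 * Called after partition dispatch has produced rri.  No list search and no
 * second hash lookup: a non-NULL slot array already implies list membership.
 */
static void
demo_prepare_buffer(DemoCopyState *cs, DemoRelInfo *rri)
{
    if (rri->bulk_slots != NULL)
        return;                 /* already set up, hence already on the list */

    rri->bulk_slots = calloc(DEMO_SLOTS_PER_REL, sizeof(void *));
    assert(rri->bulk_slots != NULL);

    /* The list is touched only here, at slot-array creation. */
    rri->next_buffered = cs->buffered_head;
    cs->buffered_head = rri;
    cs->nbuffered++;
}

/*
 * Walk the list only when over the cap.  Eviction order here is simply list
 * order; a real version would flush pending tuples before dropping slots.
 */
static void
demo_trim_buffers(DemoCopyState *cs, DemoRelInfo *keep)
{
    DemoRelInfo **link = &cs->buffered_head;

    while (*link != NULL && cs->nbuffered > cs->max_buffered)
    {
        DemoRelInfo *rri = *link;

        if (rri == keep)
        {
            link = &rri->next_buffered;     /* never drop the active rel */
            continue;
        }

        free(rri->bulk_slots);
        rri->bulk_slots = NULL;             /* marks "not on the list" */
        *link = rri->next_buffered;         /* unlink */
        rri->next_buffered = NULL;
        cs->nbuffered--;
    }
}
```

Switching away from a partition and back again calls demo_prepare_buffer() twice for the same entry, and the second call returns immediately, which is the case the list_member_ptr check was meant to cover.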

Greetings,

Andres Freund

In response to

Re: COPY FROM WHEN condition at 2019-04-02 01:06:52 from David Rowley

Responses

Re: COPY FROM WHEN condition at 2019-04-02 17:41:49 from David Rowley