Re: ALTER TABLE ADD COLUMN fast default

From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ALTER TABLE ADD COLUMN fast default
Date: 2018-02-28 22:49:43
Message-ID: CAA8=A782mqgxq1u6fbcUYH+bv+Yjf=JjoH57+8-VdjbZLfRzwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 28, 2018 at 5:39 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> On 2018-02-27 14:29:44 +1030, Andrew Dunstan wrote:
>> This was debated quite some time ago. See for example
>> <https://www.postgresql.org/message-id/22999.1475770835%40sss.pgh.pa.us>
>> The consensus then seemed to be to store them in pg_attribute. If not,
>> I think we'd need a new catalog - pg_attrdef doesn't seem right. They
>> would have to go on separate rows, and we'd be storing values rather
>> than expressions, so it would get somewhat ugly.
>
> I know that I'm concerned with storing a number of large values in
> pg_attributes, and I haven't seen that argument addressed much in a
> quick scan of the discussion.
>
>

I'd like to have a consensus on this before I start making such a
change as adding a new catalog table.

>> >> @@ -293,10 +408,18 @@ heap_fill_tuple(TupleDesc tupleDesc,
>> >> * ----------------
>> >> */
>> >> bool
>> >> -heap_attisnull(HeapTuple tup, int attnum)
>> >> +heap_attisnull(HeapTuple tup, int attnum, TupleDesc tupleDesc)
>> >> {
>> >> + /*
>> >> + * We allow a NULL tupledesc for relations not expected to have
>> >> + * missing values, such as catalog relations and indexes.
>> >> + */
>> >> + Assert(!tupleDesc || attnum <= tupleDesc->natts);
>> >
>> > This seems fairly likely to hide bugs.
>
>> How? There are plenty of cases where we simply expect quite reasonably
>> that there won't be any missing attributes, and we don't need a
>> tupleDesc at all in such cases.
>
> Missing to pass a tupledesc for tables that can have defaults with not
> have directly visible issues, you'll just erroneously get back a null.
>
>

What would you prefer? Two versions of the function?

>> >> @@ -508,6 +508,8 @@ tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc
>> >> return false; /* out of order */
>> >> if (att_tup->attisdropped)
>> >> return false; /* table contains dropped columns */
>> >> + if (att_tup->atthasmissing)
>> >> + return false; /* table contains cols with missing values */
>> >
>> > Hm, so incrementally building a table definitions with defaults, even if
>> > there aren't ever any rows, or if there's a subsequent table rewrite,
>> > will cause slowdowns. If so, shouldn't a rewrite remove all
>> > atthasmissing values?
>
>> Yes, I think so. Working on it. OTOH, I'm somewhat mystified by having
>> had to make this change. Still, it looks like dropping a column has
>> the same effect.
>
> What exactly is mystifying you? That the new default stuff doesn't work
> when the physical tlist optimization is in use?
>

I think I have a better handle on that now. I'm trying to think of a
way around it, no massive inspiration yet.

>
>> >> @@ -561,10 +568,73 @@ RelationBuildTupleDesc(Relation relation)
>> >> MemoryContextAllocZero(CacheMemoryContext,
>> >> relation->rd_rel->relnatts *
>> >> sizeof(AttrDefault));
>> >> - attrdef[ndef].adnum = attp->attnum;
>> >> + attrdef[ndef].adnum = attnum;
>> >> attrdef[ndef].adbin = NULL;
>> >> +
>> >> ndef++;
>> >> }
>> >> +
>> >> + /* Likewise for a missing value */
>> >> + if (attp->atthasmissing)
>> >> + {
>> >
>> > As I've written somewhere above, I don't think it's a good idea to do
>> > this preemptively. Tupledescs are very commonly copied, and the
>> > missing attributes are quite likely never used. IOW, we should just
>> > remember the attmissingattr value and fill in the corresponding
>> > attmissingval on-demand.
>
>> Defaults are frequently not used either, yet this function fills those
>> in regardless. I don't see why we need to treat missing values
>> differently.
>
> It's already hurting us quite a bit in workloads with a lot of trivial
> queries. Adding more makes it worse.
>
>

Please expand on your view of how we should "just remember the
attmissingattr value and fill in the corresponding attmissingval
on-demand."

I don't mind changing how we do things, but I need a bit more detail than this.

If copying defaults around has been hurting us so much it seems
surprising nobody has put forward an improvement.

>> Attached is the current state of things, with the fixes indicated
>> above, as well as some things pointed out by David Rowley.
>>
>> None of this seems to have had much effect on performance.
>>
>> Here's what oprofile reports from a run of pgbench with
>> create-alter.sql and the query "select sum(c1000) from t":
>>
>> Profiling through timer interrupt
>> samples % image name symbol name
>> 22584 28.5982 postgres ExecInterpExpr
>> 11950 15.1323 postgres slot_getmissingattrs
>> 5471 6.9279 postgres AllocSetAlloc
>> 3018 3.8217 postgres TupleDescInitEntry
>> 2042 2.5858 postgres MemoryContextAllocZeroAligned
>> 1807 2.2882 postgres SearchCatCache1
>> 1638 2.0742 postgres expression_tree_walker
>>
>>
>> That's very different from what we see on the master branch. The time
>> spent in slot_getmissingattrs is perhaps not unexpected, but I don't
>> (at least yet) understand why we're getting so much time spent in
>> ExecInterpExpr, which doesn't show up at all when the same benchmark
>> is run on master.
>
> I'd guess that's because the physical tlist optimization is disabled. I
> assume you'd see something similar on master if you dropped a column.
>

Yeah. That's kinda ugly in fact.

Here's a version of the patch which cleans up the missing attribute
settings on a table rewrite.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
fast_default-v13.patch application/octet-stream 104.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-02-28 22:54:43 VPATH build from a tarball fails with some gmake versions
Previous Message Peter Eisentraut 2018-02-28 22:37:11 Re: prokind column (was Re: [HACKERS] SQL procedures)