From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Kevin Grittner <kgrittn(at)ymail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Remaining beta blockers |
Date: | 2013-04-30 15:04:28 |
Message-ID: | 20130430150428.GD25261@alap2.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2013-04-30 07:33:05 -0700, Kevin Grittner wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > 1) vacuum can truncate the table to an empty relation already if there is
> > no data returned by the view's query which then leads to the wrong
> > scannability state.
> >
> > S1: CREATE MATERIALIZED VIEW matview_empty AS SELECT false WHERE random() < -1;
> > S2: S2: SELECT * FROM matview_empty; -- works
> > S1: VACUUM matview_empty;
> > S2: SELECT * FROM matview_empty; -- still works
> > S3: SELECT * FROM matview_empty; -- errors out
> >
> > So we need to make vacuum skip cleaning out the last page. Once we
> > get incrementally updated matviews there are more situations to get
> > into this than just a query not returning anything.
> > I remember this being discussed somewhere already, but couldn't find
> > it quickly enough.
>
> Yeah, I posted a short patch earlier on this thread that fixes
> that. Nobody has commented on it, and so far I have not committed
> anything related to this without posting details and giving ample
> opportunity for anyone to comment. If nobody objects, I can push
> that, and this issue is gone.
Well, this bug is gone, but the multiple layer violations aren't.
> > Imo this is quite an indicator that the idea of using the filesize is
> > just plain wrong. Adding logic to vacuum not to truncate data because
> > its a flag for matview scannability is quite the layer violation and
> > a sign for a bad design decision. Such a hack has already been added
> > to copy_heap_data(), while not as bad, shows that it is hard to find
> > all the places where we need to add it.
>
> I agree, but if you review the thread I started with a flag in
> pg_class, I tried using the "special" area of the first page of the
> heap, and finally wound up with the current technique based on
> several people reviewing the patch and offering opinions and
> reaching an apparent consensus that this was the least evil way to
> do it given current infrastructure for unlogged tables. This
> really is a result of trying to preserver *correct* behavior while
> catering to those emphasizing how important the performance of
> unlogged matviews is.
Imo it now uses the worst of both worlds...
> > 2) Since we don't have a metapage to represent scannability in 9.3 we
> > cannot easily use one in 9.4+ without pg_upgrade emptying all
> > matviews unless we only rely on the catalogs which we currently
> > cannot.
> I am not following this argument at all. Doesn't pg_upgrade use
> pg_dump to create the tables and matviews WITH NO DATA and take
> later action for ones which are populated in the source? It would
> not be hard at all to move to a new release which used a different
> technique for tracking populated tables by changing what pg_dump
> does for populated tables with the switch pg_upgrade uses.
What I am thinking about is a 100GB materialized view which has been
filled in 9.3 and should now be pg_upgraded into 9.4. If we don't have a
metapage yet and we want to add one we would need to rewrite the whole
100GB which seems like a rather bad idea. Or how are you proposing to
add the metapage? You cannot add it to the end or somesuch.
> I am not seeing this at all. Given my comment above about how this
> works for pg_upgrade and pg_dump, can you explain how this is a
> problem?
Well, that only works if there is a cheap way to add the new notation to
existing matview heaps which I don't see.
> > 3) Using the filesize as a flag will make other stuff like preallocating
> > on-disk data in bigger chunks and related things more complicated.
>
> Why? You don't need to preallocate for a non-populated matview,
> since its heap will be replaced on REFRESH. You would not *want*
> to preallocate a lot of space for something guaranteed to remain at
> zero length until deleted.
But we would likely also want to optimize reducing the filesize in the
same chunks because you otherwise would end up with even worse
fragmentation. And I am not talking about an unscannable relation but
about one which is currently empty (e.g. SELECT ... WHERE false).
> > 4) doing the check for scannability in the executor imo is a rather bad
> > idea. Note that we e.g. don't see an error about a matview which
> > won't be scanned because of constraint exclusion or one-time
> > filters.
> >
> > S1: CREATE MATERIALIZED VIEW matview_unit_false AS SELECT false WHERE false
> > WITH NO DATA;
> > S1: SELECT * FROM matview_unit_false;
> >
> > You can get there in far more reasonable cases than WHERE false.
>
> I am a little concerned about that, but the case you show doesn't demonstrate a problem:
> The view was defined WITH NO DATA and is behaving correctly for
> that case. When populated (with zero rows) it behaves correctly:
Ah. Tom already fixed the problem in
5194024d72f33fb209e10f9ab0ada7cc67df45b7. I was working in a branch
based on 3ccae48f44d993351e1f881761bd6c556ebd6638 and it failed there.
> I would be interested to see whether anyone can come up with an
> actual mis-behavior related to this either before or after the
> recent refactoring.
Seems ok after Tom's refactoring.
> > Not sure what the consequence of this is. The most reasonable solution
> > seems to be to introduce a metapage somewhere *now* which sucks, but ...
>
> That seems far riskier to me than using the current
> lame-but-functional approach now and improving it in a subsequent
> release.
Why? We have bitten by the lack of such metapages several times now and
I don't think we have bitten by their presence in relation types that
have them by now.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2013-04-30 15:18:16 | Re: The missing pg_get_*def functions |
Previous Message | Greg Stark | 2013-04-30 15:03:25 | Re: Back branches vs. gcc 4.8.0 |