Visibility Map Issues

From: Jeremy Smith <jeremy(at)musicsmith(dot)net>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Visibility Map Issues
Date: 2019-10-18 12:53:57
Message-ID: CAM8SmLX=DJYe64d+ZZvjO5=obys41RGU=Yo+5wX+216ANZhHAA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello all,
We have a data warehouse (postgres 11.5, on centos 7) that contains
many instances of the following structure:

- table_a
- table_b
- a view that selects everything from either table_a or table_b

All external queries select from the view, so we can rebuild the table that
isn't exposed by the view and then switch the view to point at the most
recent table.

We have a procedure that will drop indexes from the non-visible table,
truncate it, commit, repopulate it with data (using oracle_fdw), commit,
build the indexes, commit, run VACUUM ANALYZE (using pg_background), and
then re-write the view to point at the new table (and a final commit).

The issue I am dealing with is that on occasion the VACUUM step does not
update the visibility map (as evidenced by pg_class.relallvisible = 0),
which means the planner won't use index-only scans. I can't figure out the
reason for this. I have tried the following:
- change VACUUM ANALYZE table_name to VACUUM (ANALYZE,
DISABLE_PAGE_SKIPPING) table_name
- Get xmin of a row in the table (all rows should be the same) and compare
to txid_snapshot_xmin(txid_current_snapshot()) before vacuuming. I used
raise notice to log these values and I never saw a conflict. For example,
before running a vacuum I logged xmin = 207781 and the txid_snapshot_xmin
was 207785. After running VACUUM ANALYZE, however, relallvisible was set
to 0.
- Running oracle_close_connections() after loading data (before indexing
and vacuuming) just in case there was an issue with open Oracle connections
- Setting old_snapshot_threshold to 60s and waiting (pg_sleep) for 60s,
65s, 120s, or 125s...

My questions:
- Is there some way of knowing ahead of time that a VACUUM will actually
set the visibility of all pages? I would expect the visibility map to not
be updated if there are older transactions open, but shouldn't I be able to
see that by examining the snapshot?
- Is there an issue running all of this in a procedure? Since I'm
repeating the same thing with small variations for many tables, I was
hoping to keep this in a procedure, but if there are issues with how the
transactions might interact with pg_background/VACUUM, I may have to
abandon this approach.

There are no standby servers and vacuum_defer_cleanup_age is set to 0,.

I tried to create a minimum reproducible example:
CREATE TABLE test_a (a int);
CREATE index ix_test_a on test_a(a);

CREATE OR REPLACE PROCEDURE test_switch() AS $$
DECLARE
visible_pages int;
BEGIN
DROP INDEX IF EXISTS ix_test_a;
TRUNCATE TABLE test_a;
COMMIT;

INSERT INTO test_a SELECT generate_series(1,10000);
COMMIT;

CREATE INDEX ix_test_a ON test_a (a);
COMMIT;

RAISE NOTICE 'xmin:%', (SELECT xmin from test_a limit 1);
RAISE NOTICE 'snapshot min:%',
(txid_snapshot_xmin(txid_current_snapshot()));
PERFORM * FROM pg_background_result(pg_background_launch('VACUUM
(ANALYZE,DISABLE_PAGE_SKIPPING,VERBOSE) test_a')) as result(a text);
SELECT relallvisible FROM pg_class WHERE relname = 'test_a' INTO
visible_pages;
IF visible_pages = 0 THEN
RAISE EXCEPTION 'NO VISIBLE PAGES';
ELSE
RAISE NOTICE 'relallvisible:%', visible_pages;
END IF;
END;

$$
LANGUAGE PLPGSQL;

I ran this repeatedly, using \watch 1 in psql. At the same time, I ran
three other sessions running BEGIN; SELECT 1; COMMIT; every 1s. I got a
failure in test_switch after about one minute:

NOTICE: xmin:1949
NOTICE: snapshot min:1951
INFO: aggressively vacuuming "public.test_a"
INFO: index "ix_test_a" now contains 10000 row versions in 30 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: "test_a": found 0 removable, 10000 nonremovable row versions in 45
out of 45 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 1948
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: analyzing "public.test_a"
INFO: "test_a": scanned 45 of 45 pages, containing 10000 live rows and 0
dead rows; 10000 rows in sample, 10000 estimated total rows
ERROR: NO VISIBLE PAGES

I don't understand why the snapshot min says 1951, but the VACUUM output
says oldest xmin: 1948.

Thanks,
Jeremy

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2019-10-18 13:01:58 Re: DBD::Pg exorts char columns with trailing blanks
Previous Message Patrick FICHE 2019-10-18 12:34:56 RE: Execute a function through fdw