Re: PG 9.6.20 -- query misbehaves in replica

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Ernesto Hernández-Novich <emhnemhn(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: PG 9.6.20 -- query misbehaves in replica
Date: 2020-11-13 23:59:01
Message-ID: CABUevEzyDgu=1EPe7Sf8gzK2i0ZexLSziBRMQjeO0vXP2b7vaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Sat, Nov 14, 2020 at 12:17 AM Ernesto Hernández-Novich <
emhnemhn(at)gmail(dot)com> wrote:

> Hello,
>
> We have a PG 9.6.19/9.6.20 streaming replication set using PGDG
> packages over Debian 9 and 10. We started experiencing an interesting
> issue yesterday. This is the layout; all arrows are stream replication
> built using `pg_basebackup` over replication slots.
>
> Master N0 (9.6.19 / Debian 9) -> Replica A (9.6.19 / Debian 9)
> | |
> | +-----------------> Replica B (9.6.20 / Debian 10)
> |
> v
> Replica N1 (9.6.20 / Debian 9) -> Replica C (9.6.20 / Debian 9)
> |
> +------------------> Replica D (9.6.20 / Debian 10)
>
> We have a query that is a simple `select * from table where pk = 'fixed
> value'`. It fetches a *single* row vía a PK Index Scan using `=` on a
> TEXT value, no implicit conversions. Said query used to work fine
> across the replication set.
>
> After updating from 9.6.19 to 9.6.20, we noticed the query was not
> working on D. It did NOT bring any rows. If the query is attempted
> using LIKE or ~ (with a left-anchored pattern), the correct row is
> fetched BUT using a Sequential Scan. We checked B, same behavior.
> However, the query works normally (single row, PK IDX scan) on N0, A,
> N1, and C.
>
> We've tried:
>
> 1. REINDEX table / VACUUM ANALYZE table -- nothing changed.
>
> 2. REINDEX DATABASE / REINDEX SYSTEM on N0 -- nothing changed.
>
> 3. VACUUM FULL ANALYZE on N0 -- nothing changed.
>
> 3. Re-creating replicas B (and D) from scratch (remove $PGDATA,
> `pg_basebackup`, use replication slot) -- nothing changed.
>
> 4. Creating a NEW replica X (9.6.20 / Debian 10) on a DIFFERENT
> machine. First, using N1 as master, then using N0 as master.
> Nothing changes.
>
> The query NEVER works on replicas B, D, nor the NEW X.
>
> Looks like a the 9.6.20 over Debian 10 is the culprit, but I have
> nothing else to work on.
>
> Suggestions?
>

These issues are almost certainly because of the glibc locale changes
between Debian 9 and Debian 10, and not because of the PostgreSQL upgrade.

If you have master and standby on different glibc versions (so debian 9 vs
10), all text based indexes can behave differently. All the nodes must run
the same version for this to work. So in this scenario, you need to
reinstall B and D with debian 9, or you need to upgrade your other nodes to
debian 10 (and if you do that, then you have to reindex the master node
once it has been upgraded).

See https://wiki.postgresql.org/wiki/Locale_data_changes

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Andres Freund 2020-11-14 00:03:00 Re: PG 9.6.20 -- query misbehaves in replica
Previous Message Ernesto Hernández-Novich 2020-11-13 23:14:23 PG 9.6.20 -- query misbehaves in replica