Re: Corrupted Data ?

From: Ioana Danes <ioanadanes(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Corrupted Data ?
Date: 2016-08-08 16:28:02
Message-ID: CAPg0s+593U4tR1YBqa1Kh0G8pU-=jwC2VC8aY6msE97dLbkiNg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Aug 8, 2016 at 12:19 PM, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:

> On 08/08/2016 09:11 AM, Ioana Danes wrote:
>
>> Hi,
>>
>> I suspect I am having a case of data corruption. Here are the details:
>>
>> I am running postgres 9.4.8:
>>
>> postgresql94-9.4.8-1PGDG.rhel7.x86_64
>> postgresql94-contrib-9.4.8-1PGDG.rhel7.x86_64
>> postgresql94-libs-9.4.8-1PGDG.rhel7.x86_64
>> postgresql94-server-9.4.8-1PGDG.rhel7.x86_64
>>
>> on CentOS Linux release 7.2.1511 (Core)
>>
>> This is happening in a production environment but luckily on the
>> reporting database.
>> I have a cluster of 3 databases, db1 and db2 are masters and replicate
>> between each other and also replicate to db3 (db1 <-> db2, db1 -> db3,
>> db2 -> db3).
>> For replication I am using Bucardo.
>>
>
> I would say this is more a question for the Burcardo list:
>
> https://mail.endcrypt.com/mailman/listinfo/bucardo-general
>
> I am just not seeing that replicating two masters on to a single database
> is going to end well.
>

Only one master is active at one time the other one is in stand by that is
a topic for another discussion but in our case that works well.

That was my first assumption, that it is a kind of a race condition or a
bug on replication but I quickly ruled that out because that does not
explain why when I filtered the table by transactionid = 75315815 it shows
one record with transactionid 75315811...

select gameplayid, transactionid, encodedplay from abrazo.matchgameplay
where transactionid in (75315815) order by transactionid;;
gameplayid | transactionid | encodedplay
------------+---------------+--------------
160019271 | 75315815 | mix:9,0,9
160019269 | 75315815 | mix:9,8,9
160019267 | 75315815 | mix:9,2,2
160019265 | 75315815 | mix:2,2,8
160019263 | *75315811 *| backup:1,9,1
160019261 | 75315815 | backup:2,0,9

So I don't think it is a replication issue...

>
>
>> The problem I am having is that one record in a table it shows a wrong
>> value for one single field:
>>
>> select gameplayid, transactionid, encodedplay from mytable where
>> transactionid in (75315811, 75315815) order by transactionid;
>>
>> gameplayid | transactionid | encodedplay
>> ------------+---------------+--------------
>> 160019239 | 75315811 | mix:5,2,7
>> 160019237 | 75315811 | mix:5,4,8
>> 160019235 | 75315811 | mix:6,2,9
>> 160019233 | 75315811 | mix:1,9,8
>> 160019271 | 75315815 | mix:9,0,9
>> 160019269 | 75315815 | mix:9,8,9
>> 160019267 | 75315815 | mix:9,2,2
>> 160019265 | 75315815 | mix:2,2,8
>> 160019263 | *75315811* | backup:1,9,1
>> 160019261 | 75315815 | backup:2,0,9
>>
>> select gameplayid, transactionid, encodedplay from mytable where
>> transactionid in (75315815) order by transactionid;
>>
>> gameplayid | transactionid | encodedplay
>> ------------+---------------+--------------
>> 160019271 | 75315815 | mix:9,0,9
>> 160019269 | 75315815 | mix:9,8,9
>> 160019267 | 75315815 | mix:9,2,2
>> 160019265 | 75315815 | mix:2,2,8
>> 160019263 | *75315811* | backup:1,9,1
>> 160019261 | 75315815 | backup:2,0,9
>>
>> select gameplayid, transactionid, encodedplay from mytable where
>> transactionid in (75315811) order by transactionid;
>>
>> gameplayid | transactionid | encodedplay
>> ------------+---------------+--------------
>> 160019239 | 75315811 | mix:5,2,7
>> 160019237 | 75315811 | mix:5,4,8
>> 160019235 | 75315811 | mix:6,2,9
>> 160019233 | 75315811 | mix:1,9,8
>>
>> So the record with gameplayid = 160019263 have a wrong transactionid,
>> 75315811 instead of 75315815.
>> The correct value is 75315815 and that I know because of the following
>> facts:
>> - on db1 and db2 transactionid = 75315815 for gameplayid = 160019263,
>> - this table gets mostly inserts, very rare updates and only on other 2
>> fields not this one.
>> - there is another parent table that shows the number of records in this
>> table which is 4 for transactionid =75315811 and 6 for transactionid =
>> 7531581.
>>
>> This table has an index by transactionid and that index seem correct
>> because the filtering and the ordering are fine (like the filed has the
>> correct value)...
>>
>> What puzzles me is that the value that shows in this field is a real
>> value from another record...
>>
>> I only caught this issue because I have a script that runs in the night
>> that compares the databases ...
>>
>> By now I updated the field with the correct value and everything seem
>> stable.
>>
>> Postgres logs don't have any information about file corruption or any
>> other kind of error. I also checked other logs on the system and I could
>> not find any traces of corruption.
>>
>> select name, setting from pg_settings order by 1;
>> name
>> | setting
>> -------------------------------------+----------------------
>> ------------------------------------------------------------
>> -------------------------------
>> allow_system_table_mods | off
>> application_name | psql
>> archive_command | test -f
>> /cbnDBscripts/tmp/PITR_primarydb_stop_backup || rsync --timeout=60 -atz
>> %p stldrdb:/data01/wal_files/%f
>> archive_mode | on
>> archive_timeout | 60
>> array_nulls | on
>> authentication_timeout | 60
>> autovacuum | on
>> autovacuum_analyze_scale_factor | 0.1
>> autovacuum_analyze_threshold | 50
>> autovacuum_freeze_max_age | 200000000
>> autovacuum_max_workers | 3
>> autovacuum_multixact_freeze_max_age | 400000000
>> autovacuum_naptime | 60
>> autovacuum_vacuum_cost_delay | 20
>> autovacuum_vacuum_cost_limit | -1
>> autovacuum_vacuum_scale_factor | 0.2
>> autovacuum_vacuum_threshold | 50
>> autovacuum_work_mem | -1
>> backslash_quote | safe_encoding
>> bgwriter_delay | 200
>> bgwriter_lru_maxpages | 100
>> bgwriter_lru_multiplier | 2
>> block_size | 8192
>> bonjour | off
>> bonjour_name |
>> bytea_output | hex
>> check_function_bodies | on
>> checkpoint_completion_target | 0.5
>> checkpoint_segments | 16
>> checkpoint_timeout | 300
>> checkpoint_warning | 30
>> client_encoding | UTF8
>> client_min_messages | error
>> commit_delay | 0
>> commit_siblings | 5
>> config_file | /data01/postgres/postgresql.conf
>> constraint_exclusion | partition
>> cpu_index_tuple_cost | 0.005
>> cpu_operator_cost | 0.0025
>> cpu_tuple_cost | 0.01
>> cursor_tuple_fraction | 0.1
>> data_checksums | off
>> data_directory | /data01/postgres
>> DateStyle | ISO, MDY
>> db_user_namespace | off
>> deadlock_timeout | 1000
>> debug_assertions | off
>> debug_pretty_print | on
>> debug_print_parse | off
>> debug_print_plan | off
>> debug_print_rewritten | off
>> default_statistics_target | 100
>> default_tablespace |
>> default_text_search_config | pg_catalog.english
>> default_transaction_deferrable | off
>> default_transaction_isolation | read committed
>> default_transaction_read_only | off
>> default_with_oids | off
>> dynamic_library_path | $libdir
>> dynamic_shared_memory_type | posix
>> effective_cache_size | 1048576
>> effective_io_concurrency | 1
>> enable_bitmapscan | on
>> enable_hashagg | on
>> enable_hashjoin | on
>> enable_indexonlyscan | on
>> enable_indexscan | on
>> enable_material | on
>> enable_mergejoin | on
>> enable_nestloop | on
>> enable_seqscan | off
>> enable_sort | on
>> enable_tidscan | on
>> escape_string_warning | on
>> event_source | PostgreSQL
>> exit_on_error | off
>> external_pid_file |
>> extra_float_digits | 0
>> from_collapse_limit | 8
>> fsync | on
>> full_page_writes | on
>> geqo | on
>> geqo_effort | 5
>> geqo_generations | 0
>> geqo_pool_size | 0
>> geqo_seed | 0
>> geqo_selection_bias | 2
>> geqo_threshold | 12
>> gin_fuzzy_search_limit | 0
>> hba_file | /data01/postgres/pg_hba.conf
>> hot_standby | off
>> hot_standby_feedback | off
>> huge_pages | try
>> ident_file | /data01/postgres/pg_ident.conf
>> ignore_checksum_failure | off
>> ignore_system_indexes | off
>> integer_datetimes | on
>> IntervalStyle | postgres
>> join_collapse_limit | 8
>> krb_caseins_users | off
>> krb_server_keyfile | FILE:/etc/sysconfig/pgsql/krb5
>> .keytab
>> lc_collate | en_US.UTF-8
>> lc_ctype | en_US.UTF-8
>> lc_messages | en_US.UTF-8
>> lc_monetary | en_US.UTF-8
>> lc_numeric | en_US.UTF-8
>> lc_time | en_US.UTF-8
>> listen_addresses | *
>> local_preload_libraries |
>> lock_timeout | 0
>> lo_compat_privileges | off
>> log_autovacuum_min_duration | -1
>> log_checkpoints | off
>> log_connections | on
>> log_destination | csvlog
>> log_directory | pg_log
>> log_disconnections | on
>> log_duration | off
>> log_error_verbosity | default
>> log_executor_stats | off
>> log_file_mode | 0600
>> log_filename | postgresql-%d.log
>> logging_collector | on
>> log_hostname | off
>> log_line_prefix | %t %d %u
>> log_lock_waits | off
>> log_min_duration_statement | 2000
>> log_min_error_statement | error
>> log_min_messages | error
>> log_parser_stats | off
>> log_planner_stats | off
>> log_rotation_age | 1440
>> log_rotation_size | 0
>> log_statement | none
>> log_statement_stats | off
>> log_temp_files | -1
>> log_timezone | America/St_Lucia
>> log_truncate_on_rotation | on
>> maintenance_work_mem | 131072
>> max_connections | 300
>> max_files_per_process | 1000
>> max_function_args | 100
>> max_identifier_length | 63
>> max_index_keys | 32
>> max_locks_per_transaction | 64
>> max_pred_locks_per_transaction | 64
>> max_prepared_transactions | 0
>> max_replication_slots | 0
>> max_stack_depth | 2048
>> max_standby_archive_delay | 30000
>> max_standby_streaming_delay | 30000
>> max_wal_senders | 0
>> max_worker_processes | 8
>> password_encryption | on
>> port | 5432
>> post_auth_delay | 0
>> pre_auth_delay | 0
>> quote_all_identifiers | off
>> random_page_cost | 4
>> restart_after_crash | on
>> search_path | "$user",public,hstore,dblink,w
>> ww_fdw
>> segment_size | 131072
>> seq_page_cost | 1
>> server_encoding | UTF8
>> server_version | 9.4.8
>> server_version_num | 90408
>> session_preload_libraries |
>> session_replication_role | origin
>> shared_buffers | 262144
>> shared_preload_libraries |
>> sql_inheritance | on
>> ssl | off
>> ssl_ca_file |
>> ssl_cert_file | server.crt
>> ssl_ciphers | HIGH:MEDIUM:+3DES:!aNULL
>> ssl_crl_file |
>> ssl_ecdh_curve | prime256v1
>> ssl_key_file | server.key
>> ssl_prefer_server_ciphers | on
>> ssl_renegotiation_limit | 0
>> standard_conforming_strings | on
>> statement_timeout | 0
>> stats_temp_directory | pg_stat_tmp
>> superuser_reserved_connections | 3
>> synchronize_seqscans | on
>> synchronous_commit | on
>> synchronous_standby_names |
>> syslog_facility | local0
>> syslog_ident | postgres
>> tcp_keepalives_count | 0
>> tcp_keepalives_idle | 0
>> tcp_keepalives_interval | 0
>> temp_buffers | 1024
>> temp_file_limit | -1
>> temp_tablespaces |
>> TimeZone | America/St_Lucia
>> timezone_abbreviations | Default
>> trace_notify | off
>> trace_recovery_messages | log
>> trace_sort | off
>> track_activities | on
>> track_activity_query_size | 1024
>> track_counts | on
>> track_functions | none
>> track_io_timing | off
>> transaction_deferrable | off
>> transaction_isolation | read committed
>> transaction_read_only | off
>> transform_null_equals | off
>> unix_socket_directories | /var/run/postgresql, /tmp
>> unix_socket_group |
>> unix_socket_permissions | 0777
>> update_process_title | on
>> vacuum_cost_delay | 0
>> vacuum_cost_limit | 200
>> vacuum_cost_page_dirty | 20
>> vacuum_cost_page_hit | 1
>> vacuum_cost_page_miss | 10
>> vacuum_defer_cleanup_age | 0
>> vacuum_freeze_min_age | 50000000
>> vacuum_freeze_table_age | 150000000
>> vacuum_multixact_freeze_min_age | 5000000
>> vacuum_multixact_freeze_table_age | 150000000
>> wal_block_size | 8192
>> wal_buffers | 128
>> wal_keep_segments | 0
>> wal_level | hot_standby
>> wal_log_hints | off
>> wal_receiver_status_interval | 10
>> wal_receiver_timeout | 60000
>> wal_segment_size | 2048
>> wal_sender_timeout | 60000
>> wal_sync_method | fdatasync
>> wal_writer_delay | 200
>> work_mem | 2048
>> xmlbinary | base64
>> xmloption | content
>> zero_damaged_pages | off
>> (239 rows)
>>
>>
>> Any suggestions, thoughts?
>>
>>
>>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2016-08-08 16:37:29 Re: Corrupted Data ?
Previous Message Adrian Klaver 2016-08-08 16:19:39 Re: Corrupted Data ?