Re: Corrupted Data ?

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Ioana Danes <ioanadanes(at)gmail(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Corrupted Data ?
Date: 2016-08-08 16:19:39
Message-ID: 0bb6c85f-aef3-dc50-81a0-e6f29e2cd022@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 08/08/2016 09:11 AM, Ioana Danes wrote:
> Hi,
>
> I suspect I am having a case of data corruption. Here are the details:
>
> I am running postgres 9.4.8:
>
> postgresql94-9.4.8-1PGDG.rhel7.x86_64
> postgresql94-contrib-9.4.8-1PGDG.rhel7.x86_64
> postgresql94-libs-9.4.8-1PGDG.rhel7.x86_64
> postgresql94-server-9.4.8-1PGDG.rhel7.x86_64
>
> on CentOS Linux release 7.2.1511 (Core)
>
> This is happening in a production environment but luckily on the
> reporting database.
> I have a cluster of 3 databases, db1 and db2 are masters and replicate
> between each other and also replicate to db3 (db1 <-> db2, db1 -> db3,
> db2 -> db3).
> For replication I am using Bucardo.

I would say this is more a question for the Burcardo list:

https://mail.endcrypt.com/mailman/listinfo/bucardo-general

I am just not seeing that replicating two masters on to a single
database is going to end well.

>
> The problem I am having is that one record in a table it shows a wrong
> value for one single field:
>
> select gameplayid, transactionid, encodedplay from mytable where
> transactionid in (75315811, 75315815) order by transactionid;
>
> gameplayid | transactionid | encodedplay
> ------------+---------------+--------------
> 160019239 | 75315811 | mix:5,2,7
> 160019237 | 75315811 | mix:5,4,8
> 160019235 | 75315811 | mix:6,2,9
> 160019233 | 75315811 | mix:1,9,8
> 160019271 | 75315815 | mix:9,0,9
> 160019269 | 75315815 | mix:9,8,9
> 160019267 | 75315815 | mix:9,2,2
> 160019265 | 75315815 | mix:2,2,8
> 160019263 | *75315811* | backup:1,9,1
> 160019261 | 75315815 | backup:2,0,9
>
> select gameplayid, transactionid, encodedplay from mytable where
> transactionid in (75315815) order by transactionid;
>
> gameplayid | transactionid | encodedplay
> ------------+---------------+--------------
> 160019271 | 75315815 | mix:9,0,9
> 160019269 | 75315815 | mix:9,8,9
> 160019267 | 75315815 | mix:9,2,2
> 160019265 | 75315815 | mix:2,2,8
> 160019263 | *75315811* | backup:1,9,1
> 160019261 | 75315815 | backup:2,0,9
>
> select gameplayid, transactionid, encodedplay from mytable where
> transactionid in (75315811) order by transactionid;
>
> gameplayid | transactionid | encodedplay
> ------------+---------------+--------------
> 160019239 | 75315811 | mix:5,2,7
> 160019237 | 75315811 | mix:5,4,8
> 160019235 | 75315811 | mix:6,2,9
> 160019233 | 75315811 | mix:1,9,8
>
> So the record with gameplayid = 160019263 have a wrong transactionid,
> 75315811 instead of 75315815.
> The correct value is 75315815 and that I know because of the following
> facts:
> - on db1 and db2 transactionid = 75315815 for gameplayid = 160019263,
> - this table gets mostly inserts, very rare updates and only on other 2
> fields not this one.
> - there is another parent table that shows the number of records in this
> table which is 4 for transactionid =75315811 and 6 for transactionid =
> 7531581.
>
> This table has an index by transactionid and that index seem correct
> because the filtering and the ordering are fine (like the filed has the
> correct value)...
>
> What puzzles me is that the value that shows in this field is a real
> value from another record...
>
> I only caught this issue because I have a script that runs in the night
> that compares the databases ...
>
> By now I updated the field with the correct value and everything seem
> stable.
>
> Postgres logs don't have any information about file corruption or any
> other kind of error. I also checked other logs on the system and I could
> not find any traces of corruption.
>
> select name, setting from pg_settings order by 1;
> name
> | setting
> -------------------------------------+-----------------------------------------------------------------------------------------------------------------
> allow_system_table_mods | off
> application_name | psql
> archive_command | test -f
> /cbnDBscripts/tmp/PITR_primarydb_stop_backup || rsync --timeout=60 -atz
> %p stldrdb:/data01/wal_files/%f
> archive_mode | on
> archive_timeout | 60
> array_nulls | on
> authentication_timeout | 60
> autovacuum | on
> autovacuum_analyze_scale_factor | 0.1
> autovacuum_analyze_threshold | 50
> autovacuum_freeze_max_age | 200000000
> autovacuum_max_workers | 3
> autovacuum_multixact_freeze_max_age | 400000000
> autovacuum_naptime | 60
> autovacuum_vacuum_cost_delay | 20
> autovacuum_vacuum_cost_limit | -1
> autovacuum_vacuum_scale_factor | 0.2
> autovacuum_vacuum_threshold | 50
> autovacuum_work_mem | -1
> backslash_quote | safe_encoding
> bgwriter_delay | 200
> bgwriter_lru_maxpages | 100
> bgwriter_lru_multiplier | 2
> block_size | 8192
> bonjour | off
> bonjour_name |
> bytea_output | hex
> check_function_bodies | on
> checkpoint_completion_target | 0.5
> checkpoint_segments | 16
> checkpoint_timeout | 300
> checkpoint_warning | 30
> client_encoding | UTF8
> client_min_messages | error
> commit_delay | 0
> commit_siblings | 5
> config_file | /data01/postgres/postgresql.conf
> constraint_exclusion | partition
> cpu_index_tuple_cost | 0.005
> cpu_operator_cost | 0.0025
> cpu_tuple_cost | 0.01
> cursor_tuple_fraction | 0.1
> data_checksums | off
> data_directory | /data01/postgres
> DateStyle | ISO, MDY
> db_user_namespace | off
> deadlock_timeout | 1000
> debug_assertions | off
> debug_pretty_print | on
> debug_print_parse | off
> debug_print_plan | off
> debug_print_rewritten | off
> default_statistics_target | 100
> default_tablespace |
> default_text_search_config | pg_catalog.english
> default_transaction_deferrable | off
> default_transaction_isolation | read committed
> default_transaction_read_only | off
> default_with_oids | off
> dynamic_library_path | $libdir
> dynamic_shared_memory_type | posix
> effective_cache_size | 1048576
> effective_io_concurrency | 1
> enable_bitmapscan | on
> enable_hashagg | on
> enable_hashjoin | on
> enable_indexonlyscan | on
> enable_indexscan | on
> enable_material | on
> enable_mergejoin | on
> enable_nestloop | on
> enable_seqscan | off
> enable_sort | on
> enable_tidscan | on
> escape_string_warning | on
> event_source | PostgreSQL
> exit_on_error | off
> external_pid_file |
> extra_float_digits | 0
> from_collapse_limit | 8
> fsync | on
> full_page_writes | on
> geqo | on
> geqo_effort | 5
> geqo_generations | 0
> geqo_pool_size | 0
> geqo_seed | 0
> geqo_selection_bias | 2
> geqo_threshold | 12
> gin_fuzzy_search_limit | 0
> hba_file | /data01/postgres/pg_hba.conf
> hot_standby | off
> hot_standby_feedback | off
> huge_pages | try
> ident_file | /data01/postgres/pg_ident.conf
> ignore_checksum_failure | off
> ignore_system_indexes | off
> integer_datetimes | on
> IntervalStyle | postgres
> join_collapse_limit | 8
> krb_caseins_users | off
> krb_server_keyfile | FILE:/etc/sysconfig/pgsql/krb5.keytab
> lc_collate | en_US.UTF-8
> lc_ctype | en_US.UTF-8
> lc_messages | en_US.UTF-8
> lc_monetary | en_US.UTF-8
> lc_numeric | en_US.UTF-8
> lc_time | en_US.UTF-8
> listen_addresses | *
> local_preload_libraries |
> lock_timeout | 0
> lo_compat_privileges | off
> log_autovacuum_min_duration | -1
> log_checkpoints | off
> log_connections | on
> log_destination | csvlog
> log_directory | pg_log
> log_disconnections | on
> log_duration | off
> log_error_verbosity | default
> log_executor_stats | off
> log_file_mode | 0600
> log_filename | postgresql-%d.log
> logging_collector | on
> log_hostname | off
> log_line_prefix | %t %d %u
> log_lock_waits | off
> log_min_duration_statement | 2000
> log_min_error_statement | error
> log_min_messages | error
> log_parser_stats | off
> log_planner_stats | off
> log_rotation_age | 1440
> log_rotation_size | 0
> log_statement | none
> log_statement_stats | off
> log_temp_files | -1
> log_timezone | America/St_Lucia
> log_truncate_on_rotation | on
> maintenance_work_mem | 131072
> max_connections | 300
> max_files_per_process | 1000
> max_function_args | 100
> max_identifier_length | 63
> max_index_keys | 32
> max_locks_per_transaction | 64
> max_pred_locks_per_transaction | 64
> max_prepared_transactions | 0
> max_replication_slots | 0
> max_stack_depth | 2048
> max_standby_archive_delay | 30000
> max_standby_streaming_delay | 30000
> max_wal_senders | 0
> max_worker_processes | 8
> password_encryption | on
> port | 5432
> post_auth_delay | 0
> pre_auth_delay | 0
> quote_all_identifiers | off
> random_page_cost | 4
> restart_after_crash | on
> search_path | "$user",public,hstore,dblink,www_fdw
> segment_size | 131072
> seq_page_cost | 1
> server_encoding | UTF8
> server_version | 9.4.8
> server_version_num | 90408
> session_preload_libraries |
> session_replication_role | origin
> shared_buffers | 262144
> shared_preload_libraries |
> sql_inheritance | on
> ssl | off
> ssl_ca_file |
> ssl_cert_file | server.crt
> ssl_ciphers | HIGH:MEDIUM:+3DES:!aNULL
> ssl_crl_file |
> ssl_ecdh_curve | prime256v1
> ssl_key_file | server.key
> ssl_prefer_server_ciphers | on
> ssl_renegotiation_limit | 0
> standard_conforming_strings | on
> statement_timeout | 0
> stats_temp_directory | pg_stat_tmp
> superuser_reserved_connections | 3
> synchronize_seqscans | on
> synchronous_commit | on
> synchronous_standby_names |
> syslog_facility | local0
> syslog_ident | postgres
> tcp_keepalives_count | 0
> tcp_keepalives_idle | 0
> tcp_keepalives_interval | 0
> temp_buffers | 1024
> temp_file_limit | -1
> temp_tablespaces |
> TimeZone | America/St_Lucia
> timezone_abbreviations | Default
> trace_notify | off
> trace_recovery_messages | log
> trace_sort | off
> track_activities | on
> track_activity_query_size | 1024
> track_counts | on
> track_functions | none
> track_io_timing | off
> transaction_deferrable | off
> transaction_isolation | read committed
> transaction_read_only | off
> transform_null_equals | off
> unix_socket_directories | /var/run/postgresql, /tmp
> unix_socket_group |
> unix_socket_permissions | 0777
> update_process_title | on
> vacuum_cost_delay | 0
> vacuum_cost_limit | 200
> vacuum_cost_page_dirty | 20
> vacuum_cost_page_hit | 1
> vacuum_cost_page_miss | 10
> vacuum_defer_cleanup_age | 0
> vacuum_freeze_min_age | 50000000
> vacuum_freeze_table_age | 150000000
> vacuum_multixact_freeze_min_age | 5000000
> vacuum_multixact_freeze_table_age | 150000000
> wal_block_size | 8192
> wal_buffers | 128
> wal_keep_segments | 0
> wal_level | hot_standby
> wal_log_hints | off
> wal_receiver_status_interval | 10
> wal_receiver_timeout | 60000
> wal_segment_size | 2048
> wal_sender_timeout | 60000
> wal_sync_method | fdatasync
> wal_writer_delay | 200
> work_mem | 2048
> xmlbinary | base64
> xmloption | content
> zero_damaged_pages | off
> (239 rows)
>
>
> Any suggestions, thoughts?
>
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ioana Danes 2016-08-08 16:28:02 Re: Corrupted Data ?
Previous Message Ioana Danes 2016-08-08 16:11:41 Corrupted Data ?