From: | Jorge Daniel <elgaita(at)hotmail(dot)com> |
---|---|
To: | "pgsql-admin(at)lists(dot)postgresql(dot)org" <pgsql-admin(at)lists(dot)postgresql(dot)org> |
Subject: | pt_toast table seems to be |
Date: | 2018-01-03 15:25:22 |
Message-ID: | SC1P15201MB20155D45F4D8E2282BDF59C3BC1E0@SC1P15201MB2015.LAMP152.PROD.OUTLOOK.COM |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-admin |
Hi guys
This is my first post in the comunity so my apologies in advance about the formalities.
In a production db , and after several OOM killer events on the postgres running on a docker (now fixed) we remain with a kind of corrupted pg_toast 😐:
2017-11-29 23:46:13.147 PST rhost=10.149.54.5(54750) app=PostgreSQL JDBC Driver:user=veon:db=veon:ERROR: unexpected chunk number 0 (expected 1) for toast value 17143928 in pg_toast_77809
2017-11-29 23:46:13.147 PST rhost=10.149.54.5(54750) app=PostgreSQL JDBC Driver:user=veon:db=veon:STATEMENT: SELECT "id", "needs_pumping", "needs_lucene", "surveybatchid", "ip_address", "is_open_proxy", "is_partial", "cookies", "social_media_source", "social_media_review_id", "social_media_pull_id", "social_media_pull_id_long", "source_config_unitid", "source_config_source_id", "updated", "external_id_printable_coder", "unique_reusable_id", "invite_errmsg", "reminder_errmsg", "last_seen_pagename", "last_submitted_pagename", "has_content_fields", "logid", "alertid", "language_name", "e_status", "e_reminder_status", "e_lastupdated", "e_creationdate", "e_sampleddate", "e_responsedate", "e_invitationdate", "reminderdate", "e_delivereddate", "e_remindeddate", "e_accepteddate", "e_initialfinishdate", "e_ta_completed_date", "e_expirationdate", "e_survey_method", "e_survey_source", "e_survey_type", "parse_status", "tagging_attempts", "e_optout", "e_bounce_category", "feed_fileid", "feed_file_recordid", "startdate", "e_lastname", "e_firstname", "e_address", "e_address2", "e_city", "e_state", "e_postalcode", "e_phone", "e_email", "is_mobile", "is_mobile_first", "is_mobile_finished", "is_cookie_confirmation_needed", "exclude_from_problem_tracker", "is_invitation_error", "page_number_seen_last", "page_number_seen_highest", "pages_submitted", "pages_validation_failed", "last_submit_date", "user_agent", "jsonb_fields", "thrift", "episodeid", "e_unitid", "e_committed_survey_specid", "customer_veonid", "xmin" FROM "survey_prior" WHERE survey_veon.surveyid = ANY($1)
The usual fix is :
REINDEX TABLE pg_toast_77809;
After this we can select * from survery_prior with no problem, and we can dump the entire table if we wish it.
Then the jobs resumes and we're happy until we hit again with the same error , this occurs randomly , we suspect on the UPDATE on certain row is creating the corruption.
We can't reproduce the ERROR because we can't do a full scan of the table while the error is still alive () on-call guys applies the fix inmediatly to resume production ).
So how can we discover the bad-rows and expose them, or how do we check the consistency of the pt_toast table?
We need this evidence to ask for window and a complete dump/restore that will wipe out this problem.
Thx in advance
Jorge Daniel Fernandez
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2018-01-03 15:36:21 | Re: pt_toast table seems to be |
Previous Message | Alvaro Herrera | 2018-01-02 15:38:54 | Re: Issue on restore / recover |