Re: pt_toast table seems to be

From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: pt_toast table seems to be
Date: 2018-01-04 01:37:55
Message-ID: 635fe574-b95a-dc26-24d9-6518c86f0363@catalyst.net.nz
Lists: pgsql-admin

Also, the Ceph version might be important (if you are running pre-Jewel:
there was a data corruption bug affecting database-type workloads in
0.94.9 and earlier).
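
A quick way to check, assuming you have ceph client access (a rough
sketch; the OSD versions are the ones that matter for that bug):

ceph --version           # version of the local client packages
ceph tell osd.0 version  # version reported by one of the OSD daemons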

regards

Mark

On 04/01/18 10:42, Rui DeSousa wrote:
>
> Interesting setup.  What are your Ceph settings? Are you using caching,
> and is it set to write-back or write-through?
>
> If you’re using caching, it should be write-through with the max dirty
> bytes set to zero:
>
> rbd cache max dirty = 0
>
> http://docs.ceph.com/docs/master/rbd/rbd-config-ref/
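>
> In ceph.conf that would look roughly like this (a sketch; adapt the
> values to your setup):
>
> [client]
>     rbd cache = true
>     rbd cache max dirty = 0   # 0 = write-through: writes are only
>                               # acked once they reach the cluster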
>
>
>
>> On Jan 3, 2018, at 3:38 PM, Jorge Daniel <elgaita(at)hotmail(dot)com
>> <mailto:elgaita(at)hotmail(dot)com>> wrote:
>>
>> We're using Ceph, with ext4 volumes where our PGDATA resides; it's
>> mounted like this:
>> /dev/rbd5 on /pg_data type ext4
>> (rw,relatime,discard,stripe=1024,data=ordered)
>>
>> Jorge Daniel Fernandez
>>
>>
>> ------------------------------------------------------------------------
>> *From:* Rui DeSousa <rui(dot)desousa(at)icloud(dot)com
>> <mailto:rui(dot)desousa(at)icloud(dot)com>>
>> *Sent:* Wednesday, January 3, 2018 5:20 PM
>> *To:* Jorge Daniel
>> *Subject:* Re: pt_toast table seems to be
>> I’m not a Docker expert, nor have I ever run Postgres in a Docker
>> container, but my thought is that Docker is not handling sync() true
>> to form, i.e. it is buffering for performance instead.  If it buffers
>> the write but then crashes before the data is actually saved, it will
>> lead to corruption.  I’ve seen similar issues with RAID controllers
>> that buffer the sync() call but eventually fail to actually save the
>> data for whatever reason; buffer overrun, system crash, etc.
>>
>> How is the Postgres data stored in the Docker container? What file
>> system is it using? Is it using a Docker data volume, a directory
>> mount, or a storage plugin?
>>
>> Doing a quick Google search, it does seem that Docker has its own
>> storage model which introduces its own COW semantics, plugins, etc.;
>> although I can’t find anything on whether it guarantees sync() calls.
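>>
>> One rough check (a sketch, assuming you can get a shell where the
>> PGDATA filesystem is mounted) is to run pg_test_fsync against that
>> filesystem; implausibly fast fsync rates usually mean the sync is
>> being acknowledged before the data actually reaches stable storage:
>>
>> pg_test_fsync -f /pg_data/pg_test_fsync.tmp -s 5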
>>
>>> On Jan 3, 2018, at 2:43 PM, Jorge Daniel <elgaita(at)hotmail(dot)com
>>> <mailto:elgaita(at)hotmail(dot)com>> wrote:
>>>
>>> Hi Rui, every time auto-analyze ran against that table, after a long
>>> time running, the OOM killer acted up and killed the autovacuum
>>> worker; in some cases the engine went into recovery, and in other
>>> cases it crashed the whole container.  We're pretty sure those last
>>> cases are the ones that led to this issue.
>>>
>>> We were able to reproduce the OOM killing, and our findings led us
>>> to a strange Docker setting for SHMMAX:
>>>
>>> Linux 434d18e30d83 4.10.0-42-generic #46~16.04.1-Ubuntu SMP Mon Dec
>>> 4 15:57:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>>> postgres(at)434d18e30d83:~$ cat /proc/sys/kernel/shmmax
>>> 18446744073692774399
>>>
>>> After we fixed that number, we were able to run the ANALYZE and
>>> VACUUM to the end without any OOM killing.
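>>>
>>> For reference, the fix amounts to something like this (a sketch; pick
>>> a shmmax value that actually fits the instance, 16 GB here):
>>>
>>> cat /proc/sys/kernel/shmmax           # inspect the current value
>>> sysctl -w kernel.shmmax=17179869184   # set a sane limit
>>>
>>> kernel.shm* is IPC-namespaced, so newer Docker versions can also set
>>> it per container with --sysctl kernel.shmmax=... on docker run.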
>>>
>>> This is related to
>>> https://github.com/moby/moby/issues/16315
>>>
>>> I don't have that syslog anymore to examine whether the sync() worked
>>> properly or not.
>>>
>>>
>>> Jorge Daniel Fernandez
>>>
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Rui DeSousa <rui(dot)desousa(at)icloud(dot)com
>>> <mailto:rui(dot)desousa(at)icloud(dot)com>>
>>> *Sent:* Wednesday, January 3, 2018 4:23 PM
>>> *To:* Jorge Daniel
>>> *Cc:* pgsql-admin(at)lists(dot)postgresql(dot)org
>>> <mailto:pgsql-admin(at)lists(dot)postgresql(dot)org>
>>> *Subject:* Re: pt_toast table seems to be
>>> Do you mean the Postgres instance is killed or the docker instance
>>> is killed?  For OOM, what’s the memory configuration and how much
>>> swap is allocated?
>>>
>>> I would look at your disk subsystem from the perspective of Postgres:
>>> when it issues a sync(), does your setup honor it?  If your Docker
>>> instance is crashing, then it seems like sync() might not be honored,
>>> which would lead to the corruption issues you’re describing.
>>>
>>>> On Jan 3, 2018, at 10:25 AM, Jorge Daniel <elgaita(at)hotmail(dot)com
>>>> <mailto:elgaita(at)hotmail(dot)com>> wrote:
>>>>
>>>> Hi guys
>>>> This is my first post in the community, so my apologies in advance
>>>> if I get any of the formalities wrong.
>>>>
>>>>
>>>> In a production DB, after several OOM-killer events on the Postgres
>>>> instance running in a Docker container (now fixed), we are left with
>>>> a kind of corrupted pg_toast 😐:
>>>>
>>>>
>>>> 2017-11-29 23:46:13.147 PST rhost=10.149.54.5(54750) app=PostgreSQL
>>>> JDBC Driver:user=veon:db=veon:ERROR: unexpected chunk number 0
>>>> (expected 1) for toast value 17143928 in pg_toast_77809
>>>> 2017-11-29 23:46:13.147 PST rhost=10.149.54.5(54750) app=PostgreSQL
>>>> JDBC Driver:user=veon:db=veon:STATEMENT: SELECT "id",
>>>> "needs_pumping", "needs_lucene", "surveybatchid", "ip_address",
>>>> "is_open_proxy", "is_partial", "cookies", "social_media_source",
>>>> "social_media_review_id", "social_media_pull_id",
>>>> "social_media_pull_id_long", "source_config_unitid",
>>>> "source_config_source_id", "updated",
>>>> "external_id_printable_coder", "unique_reusable_id",
>>>> "invite_errmsg", "reminder_errmsg", "last_seen_pagename",
>>>> "last_submitted_pagename", "has_content_fields", "logid",
>>>> "alertid", "language_name", "e_status", "e_reminder_status",
>>>> "e_lastupdated", "e_creationdate", "e_sampleddate",
>>>> "e_responsedate", "e_invitationdate", "reminderdate",
>>>> "e_delivereddate", "e_remindeddate", "e_accepteddate",
>>>> "e_initialfinishdate", "e_ta_completed_date", "e_expirationdate",
>>>> "e_survey_method", "e_survey_source", "e_survey_type",
>>>> "parse_status", "tagging_attempts", "e_optout",
>>>> "e_bounce_category", "feed_fileid", "feed_file_recordid",
>>>> "startdate", "e_lastname", "e_firstname", "e_address",
>>>> "e_address2", "e_city", "e_state", "e_postalcode", "e_phone",
>>>> "e_email", "is_mobile", "is_mobile_first", "is_mobile_finished",
>>>> "is_cookie_confirmation_needed", "exclude_from_problem_tracker",
>>>> "is_invitation_error", "page_number_seen_last",
>>>> "page_number_seen_highest", "pages_submitted",
>>>> "pages_validation_failed", "last_submit_date", "user_agent",
>>>> "jsonb_fields", "thrift", "episodeid", "e_unitid",
>>>> "e_committed_survey_specid", "customer_veonid", "xmin" FROM
>>>> "survey_prior" WHERE survey_veon.surveyid = ANY($1)
>>>>
>>>> The usual fix is:
>>>>
>>>> REINDEX TABLE pg_toast_77809;
>>>>
>>>> After this we can select * from survey_prior with no problem, and
>>>> we can dump the entire table if we wish.
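>>>>
>>>> To verify, something like this works (a sketch; it forces every row
>>>> to be detoasted, so it fails loudly if anything is still broken):
>>>>
>>>> \copy survey_prior to /dev/null          -- in psql, client-side
>>>> pg_dump -t survey_prior veon >/dev/null  # or a plain dump from shell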
>>>>
>>>> Then the jobs resume and we're happy until we hit the same error
>>>> again. It occurs randomly; we suspect an UPDATE on a certain row is
>>>> creating the corruption.
>>>>
>>>> We can't reproduce the ERROR because we can't do a full scan of the
>>>> table while the error is still live (the on-call guys apply the fix
>>>> immediately to resume production).
>>>>
>>>> So how can we discover the bad rows and expose them, or how do we
>>>> check the consistency of the pg_toast table?
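>>>>
>>>> Would a brute-force row-by-row scan be the way to go? Something like
>>>> this rough sketch (slow, and it assumes the table isn't being
>>>> rewritten while it runs):
>>>>
>>>> total=$(psql -Atc "SELECT count(*) FROM survey_prior")
>>>> for ((i=0; i<total; i++)); do
>>>>   psql -Atc "SELECT * FROM survey_prior LIMIT 1 OFFSET $i" >/dev/null \
>>>>     || echo "detoast failure around offset $i"
>>>> done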
>>>>
>>>> We need this evidence to ask for a maintenance window and a complete
>>>> dump/restore that will wipe out this problem.
>>>>
>>>> Thx in advance
>>>>
>>>> Jorge Daniel Fernandez
>
