From: | Francisco Olarte <folarte(at)peoplecall(dot)com> |
---|---|
To: | Daniele Posenato <daniele(dot)posenato(at)smartec(dot)ch> |
Cc: | "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: BUG #12785: server process (PID 2872) was terminated by exception 0xC0000005 |
Date: | 2015-02-23 17:59:15 |
Message-ID: | CA+bJJbyF9Pz6cTv5=qik5VL5yvCaEz+-6pEuAFrB-DQhkMu3yA@mail.gmail.com |
Lists: | pgsql-bugs |
Hi Daniele:
On Mon, Feb 23, 2015 at 6:09 PM, Daniele Posenato <
daniele(dot)posenato(at)smartec(dot)ch> wrote:
> Thank you a lot for the answer, I really appreciate it. I will try to do
> what you have suggested and then I will let you know.
>
That's ok, but I doubt I can help you more ( I abandoned Windows more than
a dozen years ago, haven't looked back, although I still remember how that
code appeared when I did something wrong in my programs ).
>
> Just for information the problem has occurred again since the last email
> and always on the same query. I could understand a crash of the service on
> performing an update or a delete, but I have some difficulties to
> understand this on a select. If it was an hardware problem I would expect
> the service to crash also on other actions and not randomly (about once per
> week) only on a specific select (that is executed every 10 seconds).
>
Is that query consuming a lot of your resources? ( It may be because it is
lengthy or just because it is frequent. ) In that case it makes sense.
In many of my applications 99.9% of the work / RAM usage is selects, so a
random crash is normally going to hit me in one of those.
On the crashing on select stuff. Suppose you have a faulty sector or RAM
location. When you write to it ( upd or del ) nothing happens, it just
stores the bad value. When you read it ( select, part 1, reading from
disk/RAM ) nothing happens, you just get bad data, say a null pointer. Then
when you use it ( select, part 2 ) you get the fault. In fact, if a RAM
location loses the data written to it, you do not notice on writing it, or
on reading it ( unless you get a parity error ), but on using what you read
from it.
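A toy sketch of that sequence in Python, just to make the three steps visible. This is not PostgreSQL code: FaultyMemory, its stuck bit and the rows list are all made up for the illustration, and where a C program would die with an access violation ( the 0xC0000005 in the subject ) Python only raises IndexError.

```python
class FaultyMemory:
    """Toy RAM: the cell at bad_addr has a stuck bit and silently corrupts values."""

    def __init__(self, size, bad_addr, flip_mask):
        self.cells = [0] * size
        self.bad_addr = bad_addr
        self.flip_mask = flip_mask

    def write(self, addr, value):
        # The write never fails; the bad cell just stores a wrong value.
        if addr == self.bad_addr:
            value ^= self.flip_mask
        self.cells[addr] = value

    def read(self, addr):
        # The read never fails either; it returns whatever is stored.
        return self.cells[addr]


rows = ["alice", "bob", "carol"]
mem = FaultyMemory(size=8, bad_addr=3, flip_mask=0x40)

mem.write(3, 1)       # "update": store the index of "bob", no error
index = mem.read(3)   # "select, part 1": read it back, no error, bad data

try:
    name = rows[index]  # "select, part 2": use the value, only now it fails
    crashed = False
except IndexError:
    crashed = True

print(index, crashed)
```

The write and the read both succeed; the only statement that fails is the one that uses the value, which is why the crash shows up in the select.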
This is a normal pattern with programming bugs too. You have an error in
some code and it stores something in a random ( or not so random ) RAM
location. That code seems to work ok. But then an unrelated piece of code
reads the corrupted data and crashes ( this is one of the ways buffer
overflows work: the guilty code overflows a buffer, but works, and another
chunk of code gets its data overwritten and crashes ).
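The same idea as a small Python simulation, assuming a shared 16-byte arena in which an 8-byte buffer sits right before a one-byte index owned by other code. The layout and the names ( arena, guilty_copy, victim_lookup ) are invented for the example.

```python
arena = bytearray(16)          # shared memory used by both pieces of code
BUF_START, BUF_SIZE = 0, 8     # the guilty code's buffer: bytes 0..7
INDEX_ADDR = 8                 # unrelated code keeps a table index here

table = ["idle", "active", "closed"]
arena[INDEX_ADDR] = 1          # the victim's data starts out valid


def guilty_copy(data: bytes) -> None:
    # Bug: nothing checks len(data) against BUF_SIZE.
    for i, b in enumerate(data):
        arena[BUF_START + i] = b


def victim_lookup() -> str:
    # Correct code, but it trusts whatever is stored at INDEX_ADDR.
    return table[arena[INDEX_ADDR]]


before = victim_lookup()       # works while the index is intact
guilty_copy(b"0123456789")     # 10 bytes into an 8-byte buffer, no error here

try:
    victim_lookup()            # the crash happens in the innocent code
    victim_crashed = False
except IndexError:
    victim_crashed = True

print(before, arena[INDEX_ADDR], victim_crashed)
```

The buggy function returns normally, and the exception is raised later in code that has no bug of its own, so the crash location says little about where the damage was done.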
>
> Is there a way to write a select that is able to crash the service?
>
With a good database, on good hardware, with adequate memory and disk
( infinite, really, as you can crash any service by just joining enough
copies of a table to exhaust what is available ) there shouldn't be, but if
you read corrupted data or get hit by a bit flip in the middle of
processing, there may be. Are you able to do a full database dump ( pg_dump,
not base backup ) of your database? If you are, then you are able to read
all the tables, and I would suggest trying to reindex every table if you
have quiescent periods ( pg_dump does not touch indexes, so if you have good
data but corrupted indexes that should fix it ).
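A minimal sketch of how that check could be scripted, in Python. It assumes pg_dump and reindexdb are on the PATH and that connection settings come from the usual environment variables ( PGHOST, PGUSER and so on ); the function names, the database name and the dump path are placeholders, not anything from your setup.

```python
import subprocess


def dump_command(dbname, out_file):
    # Logical dump: pg_dump has to read every table, so it fails on unreadable data.
    return ["pg_dump", "--format=custom", f"--file={out_file}", dbname]


def reindex_command(dbname):
    # Rebuilds the indexes of the database; it takes locks, so use a quiet period.
    return ["reindexdb", dbname]


def check_database(dbname, out_file, run=subprocess.run):
    # Dump first; reindex only if the dump completed without error.
    for cmd in (dump_command(dbname, out_file), reindex_command(dbname)):
        run(cmd, check=True)


print(dump_command("mydb", "mydb.dump"))
print(reindex_command("mydb"))
```

Nothing is executed until check_database is called. With check=True the first failing command raises subprocess.CalledProcessError, so a dump that cannot read some table stops the script before any reindex is attempted.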
>
> I will let you know the results of the hardware check after the planned
> restart.
>
I do not know ( or remember ) what your DB sizes and uptime requirements
are. But I've had that kind of problem caused by corrupted disk
structures, and have been able to recover by rewriting the database,
which means dump, drop, restore. This depends on the system, so I cannot
recommend doing it outright, but as I said before, if I had the same
application on 4 machines and it crashed randomly on only one of them, I
would triple-test that machine and dump / restore it.
Best regards.
Francisco Olarte.