Re: database must be vacuumed with <N> transactions

From: Dave Johansen <davejohansen(at)gmail(dot)com>
To: Keith <keith(at)keithf4(dot)com>
Cc: John Scalia <jayknowsunix(at)gmail(dot)com>, "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: database must be vacuumed with <N> transactions
Date: 2015-06-29 04:45:01
Message-ID: CAAcYxUcK+=v_t_R_9XwP1Ju+MJONAbSTReEwFx3mSXXn=h8psQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On Thu, Jun 25, 2015 at 6:39 PM, Keith <keith(at)keithf4(dot)com> wrote:

> On Thu, Jun 25, 2015 at 8:57 PM, Dave Johansen <davejohansen(at)gmail(dot)com>
> wrote:
>
>> On Thu, Jun 25, 2015 at 2:23 PM, Keith <keith(at)keithf4(dot)com> wrote:
>>
>>> On Thu, Jun 25, 2015 at 2:42 PM, Dave Johansen <davejohansen(at)gmail(dot)com>
>>> wrote:
>>>
>>>> On Thu, Jun 25, 2015 at 11:38 AM, John Scalia <jayknowsunix(at)gmail(dot)com>
>>>> wrote:
>>>>
>>>>> The pg_stat_activity table will show you which processes are doing
>>>>> what, and you'll be able to see their process ID from the O/S. That might
>>>>> help you.
>>>>>
>>>>
>>>> Yes, but I need to see which process ID is doing a TON of small
>>>> transactions which is causing the XID to increment at an unexpectedly high
>>>> rate and so my question is "if there's a way for me to get the XID for
>>>> every connection/query?".
>>>>
>>>
>>> Look at the log_line_prefix options. To get everything you're looking
>>> for to try and identify where this is coming from look at using at a
>>> minimum: %p, %r, %m, %x, %u, %d
>>>
>>>
>>> http://www.postgresql.org/docs/8.4/static/runtime-config-logging.html#GUC-LOG-LINE-PREFIX
>>>
>>
>> We found the source of the issue. It's a function that we're using to all
>> a table to be populated from multiple processes. You can see the details
>> here:
>>
>> http://www.postgresql.org/message-id/CAAcYxUf9o3NpiKaHdeShO_iJjm92314eX6ko0JvJidmksutkJg@mail.gmail.com
>>
>> Any ideas on how we could handle this with causing it to increment the
>> XID?
>>
>
> Not sure on how to avoid the xid increment on this. I doubt this can be
> avoided, but hopefully someone else can respond to your other email with a
> more definitive answer.
>

I think it would be nice to make Postgres not increment the XID in this
situation, if it hasn't been fixed already,

> What I would recommend instead in the mean time to help deal with the XID
> issue is some close monitoring of your current txid vs the
> autovacuum_max_freeze_age. And get some sort of externally scheduled job to
> more aggressively vacuum the database based on that. And if you have any
> static, unwritten tables, you definitely want to VACUUM FREEZE these to
> avoid the txid building on them. The check_postgres script (
> https://bucardo.org/wiki/Check_postgres) has this monitoring available as
> an option. Or if you can't use it, you can at least look at its source code
> and get the queries it uses and incorporate it into your monitoring
> solution.
>

About a week before the problematic process started, we did a change in our
schema that affected >75% of the rows in our database. So there was also a
LARGE chunk of data that hadn't been frozen but that needed to and that
just made the problem worse. From this we've learned, that doing
appropriate VACUUM after a large import/re-write of data can help things
out. We've also added monitoring as you've recommended to be sure and
prevent this sort of issue.

> And while this won't help you now, 9.5 has this "do nothing" upsert you're
> trying to do here built in
>
>
> http://www.depesz.com/2015/05/10/waiting-for-9-5-add-support-for-insert-on-conflict-do-nothingupdate/
>
> For now, though, the function is the only reliable way to do what you're
> doing.
>

Yes, sadly it looks like we're stuck with this solution for now.

> You'll also have to make sure log_min_duration_statement is set to low
>>> enough to log the queries you're looking for. Beware setting disabling it
>>> or setting it too low, though, if you have a very high number of queries
>>> being run. This can quickly overload your IO and disk space.
>>>
>>
>> This probably isn't feasible, because we run a lot of queries with most
>> of them being very short duration.
>>
>
> Just want to clarify, I meant to say you don't want to set it to 0 since
> that would log everything. Disabling it by setting it to -1 would cause it
> to not log statements based on their runtime at all, which is the default.
>

We'll probably turn on logging when the query takes more than some sizeable
fraction of a second just to monitor the health of the database and query
usage in general.

> If you get the database running again, you can also look at the
>> pg_stat_statements contrib module to log query counts and parameterized
>> versions of all queries being run.
>>
>> http://www.postgresql.org/docs/8.4/static/pgstatstatements.html
>>
>> We got it up and we will be turning this module on to help diagnose
>> issues like this in the future.
>>
>>
>>> And I can't let this email go without say you REALLY need to look at
>>> upgrading your database. 8.4 is no longer being supported and receives no
>>> security or bug fixes.
>>>
>>
>> We are looking into upgrading to RHEL 7 which comes with Postgres 9.2 but
>> that's a SLOW process.
>>
>
> Recommend going straight to 9.4, especially with a "SLOW" major upgrading
> process. ;) Or, if the major upgrade is still really far off in reality,
> might want to just wait for 9.5 and to get the built in UPSERT. It's
> currently planned for an October release, but there's no guarantee there.
>
> There are official repos for 9.4 on RHEL 7, and there will be ones for 9.5
> as well when it hits.
>
> http://www.postgresql.org/download/linux/redhat/
> http://yum.postgresql.org/repopackages.php
>

The problem is just life cycle. Postgres is a lot better than most software
out there and has a 5 year life cycle, but we honestly rely on RHEL's 10
year cycle a LOT. I've been pushing very hard to upgrade to RHEL 7 and
we're barely getting to the point of being able to begin testing now that
it's over 1 year after its release. So honestly, using the newer versions
of Postgres provided on postgresql.org just isn't feasible because by the
time we get everything approved and tested, there are only a few years of
use left in the support life cycle and that makes it a deal breaker for us.

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Eoin O'Hehir 2015-06-29 07:43:08 Re: Does VACUUM FULL need ANALYZE ?
Previous Message Venkata Balaji N 2015-06-29 04:19:35 Re: Moving pg_xlog folder