From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Herouth Maoz <herouth(at)unicell(dot)co(dot)il>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Question about optimizing access to a table.
Date: 2013-12-11 16:54:05
Message-ID: CAMkU=1z9xb1vN+T3H77G7etR=_BxZxS=KaWoadH7NAtq-e=6CA@mail.gmail.com
Lists: pgsql-general
On Wed, Dec 11, 2013 at 1:49 AM, Herouth Maoz <herouth(at)unicell(dot)co(dot)il> wrote:
>
> On 10/12/2013, at 20:55, Jeff Janes wrote:
>
>
> On Tue, Dec 10, 2013 at 8:23 AM, Herouth Maoz <herouth(at)unicell(dot)co(dot)il> wrote:
>
>>
>> Hello.
>>
>> I have one particular table with very specialized use. I am sending
>> messages to some partner. The partner processes them asynchronously, and
>> then returns the status report to me. The table is used to store a
>> serialized version of the message object, together with a few identifiers,
>> expiry date, and a reference from the partner. When the status report comes
>> back from the partner, we:
>>
>>
>> 1. Select the record using the partner's reference number
>> 2. reconstruct and queue the object.
>> 3. Delete the record from database using the serial primary key ID.
>>
>>
> Where do you "queue" the object? Back into a different table within the
> same database? Why not use "DELETE ...RETURNING"?
>
>
> The object is used in a Java application. Basically, it goes into a queue
> in memory. Some of its data is updated in two different tables in the same
> database at a later time, but only after some logic has been performed
> which may or may not change them relative to what they were when they came
> from the partner.
>
What happens if your Java application crashes after the delete occurs but
before the two different tables get updated? If this risk is acceptable,
perhaps it would also be acceptable to expire your data early when the
partner glitches and your table gets too big.
>
>
>>
>> Every once in a while we run a garbage collection process which selects
>> expired messages, reconstructs and queues them for processing, and deletes
>> the record using the primary key.
>>
>> This works wonderfully as long as the table remains small - a few
>> thousand records waiting for their status report, and that's it. The table
>> is set to have frequent auto-analyze runs.
>>
>
> You might want to turn
>
>
> ...yes?
>
Sorry, before I got distracted by a shiny object, I meant to say you might
want to turn up vacuuming, not just analyze.
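For a queue-like table you can make autovacuum much more aggressive on just
that one table via per-table storage parameters. A rough sketch (untested;
the thresholds are only placeholders you would tune for your workload):

    -- vacuum/analyze after a fixed number of dead/changed rows
    -- instead of a fraction of the (possibly bloated) table
    ALTER TABLE transient SET (
        autovacuum_vacuum_scale_factor = 0,
        autovacuum_vacuum_threshold = 10000,
        autovacuum_analyze_scale_factor = 0,
        autovacuum_analyze_threshold = 10000
    );

That way vacuum keeps up with the delete churn even while the table is
temporarily large.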
>
>
>
>>
>> The problem starts when our partner has some glitch, under high load, and
>> fails to send back a few hundred thousand reports. In that case, the table
>> grows to a few hundred thousand records, and they are not deleted until they hit
>> their expiry date, at which point the "garbage collector" takes care of
>> them and everything goes back to normal. When it contains hundreds of
>> thousands of records, performance deteriorates considerably.
>>
>
> There is no inherent reason the performance needs to degrade. Can you
> give the specific queries that perform worse?
>
>
> They are the simplest queries possible. The performance deterioration is
> only fractions of a second, you see. But when we have thousands of status
> reports per minute, it builds up to a large delay after a few minutes.
>
> The queries are:
> select id, sm from transient where smsc_reference = ? and msisdn = ?
> and then:
> delete from transient where id = ?
>
I thought the problem was that when the partner glitches, the above queries
were not getting executed at all, because the stream driving them was no
longer feeding them data. But if they aren't executing, they can't be
showing performance deterioration. Or are those same queries also part of
the garbage collection? Or is it only when the partner stream resumes that
the problem shows up?
Assuming the problem is just the log(N) cost of crawling through the bloated
indexes, that is where DELETE...RETURNING would really help, by cutting the
impact of the bloat down twofold. But if you can't change the code, more
aggressive vacuuming might be your best option. It won't help when the
table is bloated with live rows (while the partner is glitching) but it
should help it recover faster once the glitch is resolved.
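If you can change the code at some point, the select-then-delete pair above
collapses into a single statement, roughly (a sketch, assuming the
smsc_reference/msisdn pair identifies the row just as well as the id does):

    -- fetch and delete in one round trip
    DELETE FROM transient
     WHERE smsc_reference = ? AND msisdn = ?
     RETURNING id, sm;

Then only the index on smsc_reference has to be descended, rather than that
index plus the primary key index, which is where the twofold savings comes
from.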
If the problem is not the normal log(N) time of using the index, but
something more pathological, then we probably need to see an explain
(analyze, buffers) of the queries during the bad period.
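That is, something like this during the bad period, with real values in place
of the placeholders:

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT id, sm FROM transient
    WHERE smsc_reference = <reference> AND msisdn = <msisdn>;

(If you also run it on the DELETE, remember that EXPLAIN ANALYZE actually
executes it, so wrap it in a transaction and roll back if you want to keep
the row.)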
Cheers,
Jeff