Re: Question about optimizing access to a table.

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Herouth Maoz <herouth(at)unicell(dot)co(dot)il>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Question about optimizing access to a table.
Date: 2013-12-10 18:55:45
Message-ID: CAMkU=1xZAtKddZngO4=YW7okHD06weYKBg53hEuEcGzXteKwyw@mail.gmail.com
Lists: pgsql-general

On Tue, Dec 10, 2013 at 8:23 AM, Herouth Maoz <herouth(at)unicell(dot)co(dot)il> wrote:

>
> Hello.
>
> I have one particular table with very specialized use. I am sending
> messages to some partner. The partner processes them asynchronously, and
> then returns the status report to me. The table is used to store a
> serialized version of the message object, together with a few identifiers,
> an expiry date, and a reference from the partner. When the status report comes
> back from the partner, we:
>
>
> 1. Select the record using the partner's reference number.
> 2. Reconstruct and queue the object.
> 3. Delete the record from the database using the serial primary key ID.
>
>
Where do you "queue" the object? Back into a different table within the
same database? Why not use "DELETE ... RETURNING"?
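
If the partner reference on its own identifies the row, the SELECT and the
DELETE can be collapsed into a single round trip. Untested sketch, guessing
at your names ("transient" from the index names you quote below, "message"
for the column holding the serialized object):

    -- remove the row and hand back the data needed to rebuild and queue the
    -- object, so there is no second statement keyed on the primary key
    DELETE FROM transient
    WHERE smsc_reference = $1
    RETURNING id, message;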

>
> Every once in a while we run a garbage collection process which selects
> expired messages, reconstructs and queues them for processing, and deletes
> the record using the primary key.
>
> This works wonderfully as long as the table remains small - a few thousand
> records waiting for their status report, and that's it. The table is set to
> have frequent auto-analyze runs.
>
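
The expiry sweep could use the same pattern. Again an untested sketch with
the same guessed table and column names:

    -- collect and remove expired rows in one pass; each returned row can be
    -- reconstructed and queued just like a normal status report
    DELETE FROM transient
    WHERE validity < now()
    RETURNING id, message;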

You might want to turn

>
> The problem starts when our partner has some glitch, under high load, and
> fails to send back a few hundred thousand reports. In that case, the table
> grows to a few hundred thousand records, and they are not deleted until they
> their expiry date, at which point the "garbage collector" takes care of
> them and everything goes back to normal. When it contains hundreds of
> thousands of records, performance deteriorates considerably.
>

There is no inherent reason the performance needs to degrade. Can you give
the specific queries that perform worse?
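
Running the slow lookup under EXPLAIN while the table is large would show
which plan gets chosen and where the time goes. Something like this, guessing
again that the lookup filters on smsc_reference (the value is a placeholder):

    -- BUFFERS shows how much of the time is spent reading pages
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM transient WHERE smsc_reference = '12345';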

> I am trying to figure out a solution that will keep the system working
> well even when there is a large number of records in the table. At first I
> thought of partitioning the table on the partner's reference field. But the
> DELETE would actually slow down if I do this, right?
>

Not necessarily, but partitioning should be your last resort, not your first,
and I don't see any reason it would be needed here.

> Indexes:
> "transient_pkey" PRIMARY KEY, btree (id)
> "transient_msisdn_ref" btree (msisdn, smsc_reference)
> "transient_sm_vp" btree (validity)
>

What is the partner reference? If it is smsc_reference, then you probably
need an index in which that is the lead (or only) column.
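
For example, assuming smsc_reference really is that reference (the index name
here is made up, and CONCURRENTLY avoids blocking your normal inserts and
deletes while it builds):

    -- the existing (msisdn, smsc_reference) index cannot be used efficiently
    -- for a lookup on smsc_reference alone, since it is not the leading column
    CREATE INDEX CONCURRENTLY transient_smsc_reference
        ON transient (smsc_reference);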

Cheers,

Jeff
