Re: Performance implications of adding a "disabled" column to a table

From: Seref Arikan <serefarikan(at)kurumsalteknoloji(dot)com>
To: David Johnston <polobo(at)yahoo(dot)com>
Cc: PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Performance implications of adding a "disabled" column to a table
Date: 2012-08-30 19:44:07
Message-ID: CA+4ThdqPO_6U+1bKWaxLYTcQRpb0ckTLoyVa32c3c-5cin5RKg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks David,
I've been thinking hard on this one, and I tried to keep the details of the
application requirements to a minimum in my question, to focus on the
postgres performance aspect.
Though it may be off topic, let me try to describe the reason I'm trying
not to delete rows. Originally, my idea was exactly what you've described:
to move data to an archive table, but here is the problem:

The rows contain metadata for subgraphs (partitions) of a huge graph
structure. Each partition has its local coordinate system for the nodes,
and a parallel processing framework goes through partitions looking for
patterns. When patterns are found, the matching coordinates need to be
converted to global coordinates, and partition metadata allows that.
However, partition metadata is build via calculations based on the previous
partitions' metadata. If I delete a single row, or update it, it means that
the graph structure has changed, and I have to update metadata for all
partitions that follow the deleted/updated one, which means updating lots
of rows in db.
If I leave the dead/disabled partition in place, and simply add a copy of
it as if it is a new partition, then all is fine. All I have to do is not
to use the dead/disabled partition during graph pattern search.

I am not happy about having enabled/disabled status leaking into other
layers, and a view may be a good solution; thanks for that!. I am not happy
about hiding dirt under the carpet rather than putting it in the rubbish
bin, but the performance penalty is simply too big.

Best regards
Seref

On Thu, Aug 30, 2012 at 8:09 PM, David Johnston <polobo(at)yahoo(dot)com> wrote:

> From: pgsql-general-owner(at)postgresql(dot)org
> [mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of Seref Arikan
> Sent: Thursday, August 30, 2012 12:48 PM
> To: PG-General Mailing List
> Subject: [GENERAL] Performance implications of adding a "disabled" column
> to
> a table
>
> Greetings,
> I have a large number of rows (up to 3-4 millions) that I'll either be
> fetching into ram (if it is a few thousand rows), or scrolling through a
> cursor.
> Deletions or updates on content of these rows lead to expensive operations
> in my business logic, so I simply need to mark them as disabled.
> I have two options here: to exclude the disabled rows from my query results
> via ..where not disabled or to not to process disabled rows in business
> logic.
> There will be very few disabled rows compared to total rows I'll have to
> return. This table will have very few queries, so including disabled =
> false
> or enabled = true in every query would not be an issue
>
> I have a feeling that including this criteria in the queries may add
> unnecessary processing to my queries, since the table is expected to get up
> to 100 million or more rows. So should I deal with this in DB, or at the
> application layer?
>
> Kind regards
> Seref
>
>
> ============================================================================
> =
>
> So you are willing to live with the fact that whatever, apparently
> important, business logic processing occurs when a row is updated or
> deleted
> is in-fact not important since now you are considering a class of data that
> is not useful but still present and reflected elsewhere in the system.
>
> If the proportion of rows that are disabled is small, and you don't intend
> to query "where disabled = TRUE", then adding the column is sufficient. No
> index required since it will not be used. Whether you alter queries to add
> the "disabled = FALSE" or read and deal with it in the application doesn't
> really matter (unless individual records are large). Personally I would
> suggest creating VIEWs that do not output "disabled" and that always use
> "disabled = FALSE". Call it something like "tablename_active" and then use
> that view thus only sending enabled records to the application layer.
>
> I would also make sure I am very confident on the implications of doing
> this
> versus deleting the data outright (though possibly moving it to an
> "archive"
> table)
>
> David J.
>
>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Bruce Momjian 2012-08-30 20:31:08 Re: Baseline configurations
Previous Message Mike Orr 2012-08-30 19:18:11 Baseline configurations