Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later

From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later
Date: 2018-02-22 15:00:26
Message-ID: 20180222100026.0625a58f656ea2413215ea46@potentialtech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, 22 Feb 2018 03:57:34 +0100
Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> On 02/21/2018 05:00 PM, Bill Moran wrote:
> > On Wed, 21 Feb 2018 13:33:18 +0100
> > Alexander Farber <alexander(dot)farber(at)gmail(dot)com> wrote:
> >
> >> Hi Martin -
> >>
> >> On Wed, Feb 21, 2018 at 1:26 PM, Martin Moore <martin(dot)moore(at)avbrief(dot)com>
> >> wrote:
> >>
> >>> I’m no expert but I’d think it unlikely an index would be considered for a
> >>> table with only 100 rows in. Also I’m pretty sure only one index per table
> >>> is used, so you’d want to put state1 and state2 in one index.
> >>
> >> I hope to have more records in the words_games table later when my game is
> >> launched (currently in soft launch/beta).....
> >
> > To elaborate+clarify Martin's comments.
> >
> > The index will not be used while the table is small because it's actually slower
> > to process an index than it is to just read the entire table. However, as the
> > table gets more rows, these timings will reverse and Postgres will start using
> > the indexes. It's probably best to just create them even though the table is
> > small. The performance improvement you'll get when the table grows will be
> > well worth it, and it avoids the problem of trying to remember to create it later.
> >
> > However, Martin's other comment about only using a single index is incorrect.
> > Postgres can use multiple indexes per query, so it's often good practace to
> > put indexes on every column that might ever be used in a WHERE clause.
>
> I call this practice "shotgun" and generally discourage people from
> using it. It seems attractive, but not every where condition can be
> evaluated using an index, and every index has maintenance overhead.
>
> There are cases when it's somewhat reasonable (e.g. when you don't know
> which columns will be referenced by WHERE conditions, and data ingestion
> has lower priority than queries). But that doesn't seem to be the case
> here - you know the WHERE conditions, and people are likely sending a
> lot of inserts (and expecting low latency responses).

Can't _really_ disagree with anything you're saying there ... but I disagree
with the overall sentament for the following reasons:

1) Not everyone has easy access to experienced people like you and I. As a
result, I often recommend the "least likely to be wrong" course of action instead
of recommending investigation that requires expertise that the original poster
might not possess ... after all, if they had the experience to do all the
research, it's unlikely that they'd be asking this question in the first
place.
2) The negative impact of an unused index is tiny compared to the negative
impact of a missing index.
3) Dropping an unused index is _far_ less headache than creating a missing
index on a busy database.
4) Without knowing _all_ the details of the app and how it works, my past
experience is that problems are about a jillion times more likely to be
the result of underindexing (although I _have_ seen issues due to
overindexing, so it _does_ happen)

I can't argue that the _best_ course of action is to set up a simulation
that can exercise the system at predicted size and load levels and use that
to tune things. But communicating all that to others has never been easy
in my experience. As recently as this month my team was blown away that I
could create a simulation that demonstrated how my code would behave under
real-world like conditions. It's apparently not something that a lot of
people understand or are good at or something.

--
Bill Moran <wmoran(at)potentialtech(dot)com>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David G. Johnston 2018-02-22 15:57:02 Any reason not to show "null input" (strict) column in \df+ output?
Previous Message Ron Johnson 2018-02-22 14:21:04 Re: initdb when data/ folder has mount points