Re: Why does the query planner use two full indexes, when a dedicated partial index exists?

From: Richard Neill <rn214(at)richardneill(dot)org>
To:
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Why does the query planner use two full indexes, when a dedicated partial index exists?
Date: 2012-12-22 17:46:30
Message-ID: 50D5F1F6.7090704@richardneill.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 21/12/12 05:15, Jeff Janes wrote:
>
>
> - What I'm trying to do is trace the history of the books
> through the system and assign each one a proper unique id.
> So, if I see a book with "parcel_id_code = 37",
> is it a new book (after pid wrap), or is it the same book I saw 1
> minute ago, that hasn't exited the sorter?
>
> I'm not sure how you are implementing this goal, but I don't think it is
> best done by looping over all books (presumably from some other table?)
> and issuing an individual query for each one, if that is what you are
> doing. Some kind of bulk join would probably be more efficient.

It would be nice to do a bulk join, but it's not possible: the query is
time sensitive. Consider:

id/pkey pid timestamp exit_state destination

1 77 -24 hours 1 212
2 77 -18 hours 1 213
3 77 -12 hours 1 45
4 77 -6 hours 1 443
5 77 0 hours null

[in future...]
5 77 0 hours 1 92
6 77 4 hours null

At time +5 minutes, I receive a report that a book with parcel_id 77 has
successfully been delivered to destination 92. So, what I have to do is:

* First, find the id of the most recent book which had pid=77 and where
the exit state is null. (hopefully, but not always, this yields exactly
one row, which in this case is id=5)

* Then update the table to set the destination to 92, where the id=5.

It's a rather cursed query, because:
- the sorter machine doesn't give me full info in each message, only
deltas, and I have to reconstruct the global state.
- pids are reused within hours, but don't increase monotonically,
(more like drawing repeatedly from a shuffled deck, where cards
are only returned to the deck sporadically.
- some pids get double-reported
- 1% of books fall off the machine, or get stuck on it.
- occasionally, messages are lost.
- the sorter state isn't self-consistent (it can be restarted)

The tracker table is my attempt to consistently combine all the state we
know, and merge in the deltas as we receive messages from the sorter
machine. It ends up reflecting reality about 99% of the time.

Richard

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2012-12-23 22:55:15 Re: Performance on Bulk Insert to Partitioned Table
Previous Message Richard Neill 2012-12-22 17:29:03 Re: Why does the query planner use two full indexes, when a dedicated partial index exists?