Quick Links

Re: Fast insertion indexes: why no developments

From:	Leonardo Francalanci <m_lists(at)yahoo(dot)it>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Fast insertion indexes: why no developments
Date:	2013-11-13 09:07:56
Message-ID:	1384333676597-5778092.post@n5.nabble.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Simon Riggs wrote
> On 5 November 2013 14:28, Leonardo Francalanci <

> m_lists@

> > wrote:
>
>> Either my sql is not correct (likely), or my understanding of the minmax
>> index is
>> not correct (even more likely), or the minmax index is not usable in a
>> random inputs
>> scenario.
>
> Please show the real world SQL you intend to run, so we can comment on
> it. Inventing a use case that breaks effectiveness of any optimisation
> is always easy, but the question is whether the use case is likely or
> even desirable.

The use case is pretty simple.
Think it as the NSA, as it would be much easier.
Collect all calls made/received by any user on a mobile network.
(in fact, it's something more than calls, so in fact is 2-10 rows per call).
Keep the data for 20 days.
That's the insert part.

Query:
search calls made/received by the user using IMSI (caller id) or IMEI
(phone id). Date range is usually days (past 4 days, from 10 days ago
to 5 days ago...)

The result is just a very small percentage of the rows present in the
table: a single user doesn't call that much!
Searches are made by a human, so no that many request per second.

It's not a "write mostly" scenario, it's a 99% write 1% read scenario.

Problem? having 4 btree indexes on random values (imsi+imei * 2,
since we have calling and caller) kills the performance in insertion
after a while.
Solution so far? partition every 15 minutes, create the indexes in bulk.

Simon Riggs wrote
> If we have a query to show the most recent calls by a particular caller
>
> SELECT *
> FROM cdr
> WHERE callerid = X
> ORDER BY call_timestamp DESC
> LIMIT 100
>
> then this could potentially be optimised using a minmax index, by
> traversing the data ranges in call_timestamp order. That is not part
> of the code in this initial release, since the main use case is for
> WHERE call_timestamp >= X, or WHERE primarykey = Y

I don't understand how a index on call_timestamp would help
in the query above.

Simon Riggs wrote
> I don't believe there is a credible business case for running that
> same query but without the ORDER BY and LIMIT, since it could
> potentially return gazillions of rows

Gazillion of rows??? We're talking about calls made/received by
one user here. How many calls do you make in 10 days???

Simon Riggs wrote
> so it isn't surprising at all
> that it would access a large % of the table.

In fact, the query I use return a fraction of the table, and only
a very small amount of users get searched.

Simon, you keep on talking about these minmax indexes, and
I still don't see any reference to some performance tests.

And, again, I think that random values insertion is the worst
use case for minmax indexes.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Fast-insertion-indexes-why-no-developments-tp5776227p5778092.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

In response to

Re: Fast insertion indexes: why no developments at 2013-11-12 14:19:19 from Simon Riggs

Responses

Re: Fast insertion indexes: why no developments at 2013-11-13 11:08:01 from Simon Riggs
Re: Fast insertion indexes: why no developments at 2013-11-13 11:19:52 from Jeremy Harris

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Colin 't Hart	2013-11-13 09:28:07	Re: TABLE not synonymous with SELECT * FROM?
Previous Message	Nicolas Barbier	2013-11-13 08:57:44	Re: Fast insertion indexes: why no developments