From: | Leonardo Francalanci <m_lists(at)yahoo(dot)it> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Fast insertion indexes: why no developments |
Date: | 2013-11-13 09:07:56 |
Message-ID: | 1384333676597-5778092.post@n5.nabble.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Simon Riggs wrote
> On 5 November 2013 14:28, Leonardo Francalanci <
> m_lists@
> > wrote:
>
>> Either my sql is not correct (likely), or my understanding of the minmax
>> index is
>> not correct (even more likely), or the minmax index is not usable in a
>> random inputs
>> scenario.
>
> Please show the real world SQL you intend to run, so we can comment on
> it. Inventing a use case that breaks effectiveness of any optimisation
> is always easy, but the question is whether the use case is likely or
> even desirable.
The use case is pretty simple.
Think it as the NSA, as it would be much easier.
Collect all calls made/received by any user on a mobile network.
(in fact, it's something more than calls, so in fact is 2-10 rows per call).
Keep the data for 20 days.
That's the insert part.
Query:
search calls made/received by the user using IMSI (caller id) or IMEI
(phone id). Date range is usually days (past 4 days, from 10 days ago
to 5 days ago...)
The result is just a very small percentage of the rows present in the
table: a single user doesn't call that much!
Searches are made by a human, so no that many request per second.
It's not a "write mostly" scenario, it's a 99% write 1% read scenario.
Problem? having 4 btree indexes on random values (imsi+imei * 2,
since we have calling and caller) kills the performance in insertion
after a while.
Solution so far? partition every 15 minutes, create the indexes in bulk.
Simon Riggs wrote
> If we have a query to show the most recent calls by a particular caller
>
> SELECT *
> FROM cdr
> WHERE callerid = X
> ORDER BY call_timestamp DESC
> LIMIT 100
>
> then this could potentially be optimised using a minmax index, by
> traversing the data ranges in call_timestamp order. That is not part
> of the code in this initial release, since the main use case is for
> WHERE call_timestamp >= X, or WHERE primarykey = Y
I don't understand how a index on call_timestamp would help
in the query above.
Simon Riggs wrote
> I don't believe there is a credible business case for running that
> same query but without the ORDER BY and LIMIT, since it could
> potentially return gazillions of rows
Gazillion of rows??? We're talking about calls made/received by
one user here. How many calls do you make in 10 days???
Simon Riggs wrote
> so it isn't surprising at all
> that it would access a large % of the table.
In fact, the query I use return a fraction of the table, and only
a very small amount of users get searched.
Simon, you keep on talking about these minmax indexes, and
I still don't see any reference to some performance tests.
And, again, I think that random values insertion is the worst
use case for minmax indexes.
--
View this message in context: http://postgresql.1045698.n5.nabble.com/Fast-insertion-indexes-why-no-developments-tp5776227p5778092.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
From | Date | Subject | |
---|---|---|---|
Next Message | Colin 't Hart | 2013-11-13 09:28:07 | Re: TABLE not synonymous with SELECT * FROM? |
Previous Message | Nicolas Barbier | 2013-11-13 08:57:44 | Re: Fast insertion indexes: why no developments |