From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Leonardo Francalanci <m_lists(at)yahoo(dot)it>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fast insertion indexes: why no developments
Date: 2013-10-30 11:08:02
Message-ID: CA+U5nMJ=RrTXUY0g1VS7KcR9DVz9DezvN7Z2QE_R7TyTZCh36A@mail.gmail.com
Lists: pgsql-hackers
On 30 October 2013 10:35, Leonardo Francalanci <m_lists(at)yahoo(dot)it> wrote:
>> Presumably the data you are inserting isn't actually random. Please
>> describe the use case you are considering in more detail and some view
>> on how frequent that is, with some examples. Once we understand the
>> use case and agree it is important, we might solve problems.
>
>
> Collecting call data for mobile network operators (and no, I don't work for the NSA...)
> Easily 5000-10000 inserts per second. Indexes on timestamp and ID (not a problem: always increasing, so no btree issues) and on called #, calling #, IMSI, IMEI. The last four are effectively random, drawn from millions of possible values.
> After the first few million records, the disks can't keep up with the amount of random writing in the indexes. Workaround: the table is partitioned every 15 minutes, and indexes are created in bulk after we "start" the new 15-minute partition (sketched below). Searches on the current 15 minutes are not allowed (that partition is not yet indexed), and searches on older data cost K*log(N) (where K is the number of partitions).
> Yes, I could throw more disks at it, use SSDs, shard more, and so on. But I still think btrees just aren't fit for this kind of problem. I don't delete data, I don't update data, and there isn't much concurrency going on. I would sacrifice search speed (K*log(N) is already much slower than "regular" btree usage) for realtime insertion.
>
> I don't think I'm the only one with a big system that has to be indexed by random values.
>
> In fact, I didn't want to turn this thread into a "help me with this workload" thread. I just wanted to know whether there were other known issues with these "different indexes" beyond "not enough time to implement them correctly": I was afraid someone had already dismissed them as "good in theory, bad in practice"...
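For concreteness, a minimal sketch of the workaround described above; the table and column names are illustrative, not taken from the thread:

-- Hypothetical call-detail-record partition for one 15-minute window.
CREATE TABLE cdr_20131030_1015 (
    ts      timestamptz NOT NULL,  -- always increasing
    id      bigint      NOT NULL,  -- always increasing
    called  text        NOT NULL,  -- effectively random
    calling text        NOT NULL,  -- effectively random
    imsi    text        NOT NULL,  -- effectively random
    imei    text        NOT NULL   -- effectively random
);

-- The live partition takes inserts with no secondary indexes, so the
-- write pattern stays sequential. Once the next 15-minute window
-- opens, the closed partition is indexed in bulk:
CREATE INDEX ON cdr_20131030_1015 (ts);
CREATE INDEX ON cdr_20131030_1015 (id);
CREATE INDEX ON cdr_20131030_1015 (called);
CREATE INDEX ON cdr_20131030_1015 (calling);
CREATE INDEX ON cdr_20131030_1015 (imsi);
CREATE INDEX ON cdr_20131030_1015 (imei);

-- A search over older data must probe each of the K closed
-- partitions, hence the K*log(N) cost mentioned above.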
What is the reason for needing such fast access to individual groups
of records? Sure sounds like the NSA or similar ;-)
Sacrificing timeliness for efficiency is a common solution. I'm seeing
lots of areas where being able to specify, in a query, how much
staleness is acceptable leads to optimisations of this kind and
others.
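For example (purely hypothetical syntax, implemented nowhere), a query that declares how stale an answer it will accept, which could let the executor skip the not-yet-indexed current partition:

-- Hypothetical: the WITH STALENESS clause is invented for
-- illustration only; it does not exist in PostgreSQL.
SELECT *
FROM cdr
WHERE called = '+15551234567'
WITH STALENESS '15 minutes';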
Indexes are a declarative solution. We would need a way to specify
such tolerances to be able to do this. (You can write your own
index...)
In terms of generality, do you think it's worth a man-year of developer
effort to replicate what you have already achieved? Who would pay?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services