From: | "C(dot) Mundi" <cmundi(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | High-Concurrency GiST in postgreSQL |
Date: | 2011-12-05 18:31:09 |
Message-ID: | CAPvS8WZNQ8ysY=hyij5EJscrZZyG9V7uZqAxsQfu8tWDpQNvBg@mail.gmail.com |
Lists: | pgsql-general |
Hello. This is my first post. As such, feedback on style and choice of
venue is especially welcome.
I am a regular but not especially expert user of a variety of databases,
including PostgreSQL.
I have only modest experience with spatial databases.
I have a new project[1] in which GiST could be very useful, provided I can
achieve high concurrency. Having some empirical evidence that R* would be
a good place to start, and having read "High-Concurrency Locking in
R-Trees" [2], I went looking for an implementation of R-link trees
extended to R*. So I was very interested to read what Hellerstein et al.
wrote [3]:
*High concurrency, recoverability, and degree-3 consistency are critical
factors in a full-fledged database system. We are considering extending
the results of Kornacker and Banks for R-trees [KB95] to our
implementation of GiSTs.*
Since this information may be somewhat dated, and GiST has obviously come a
long way in PostgreSQL, I am looking for current information and advice on
the state of concurrency in GiST in PostgreSQL. If someone has already
done an R*-link tree then that could really help me. (I can wish, no?)
Thanks for reading, and thanks for any advice or pointers.
Carlos
[1] It's not a GIS project, but it has some similarities:
(a) I need to manage anywhere from 1,000 to 10 million three-dimensional
"boxes."
(b) The distributions of sizes, aspect ratios, and locations in R^3 are all
unknown a priori and may change during execution under insert/delete.
(c) Queries may arrive asynchronously and at a high rate from hundreds (or
more?) of compute nodes.
(d) Successive queries from any node, viewed as a time-sequence, may have
very low (or at best sporadic) spatial correlation -- lots of page jumps.
(e) R* will be advantageous over R, but Priority R is probably not
especially useful since turnover may be greater than 20% during a "job."
(f) I would like to avoid the complications of distributed databases, again
because of the high turnover.
[2] Marcel Kornacker and Douglas Banks. High-Concurrency Locking in
R-Trees. (1995)
[3] Hellerstein, Naughton, and Pfeffer. Generalized Search Trees for
Database Systems. (1995)