Hi
I've loved reading all of your thoughts and comments.
Yet, I'm left with the question:
Can we brainstorm a workable caching solution...
I've seen several posts talking about some of the challenges.
1.) Only good for static data
As proposed, that is largely true. That doesn't mean, however, that the idea isn't worth pursuing: I think you'll find most applications have a great deal of "static enough" data that would benefit greatly from this type of caching.
However, I think some simple changes to the idea could make it useful for busy tables as well...
These changes would probably require direct communication between the caching controller and the postmaster (a rough sketch follows the list below).
a.) Rather than table locking, track changes at the row level.
b.) Rather than requiring a complete reseeding of a table after an update, just invalidate or repopulate the affected rows.
c.) Rather than destroying popular query results, try to update them if possible. For example, it's easy to remove one entry from the cache if that row was just deleted. (It's probably cheaper to rerun the query on just the few changed rows than to rerun the whole query.)
d.) Any other ideas?
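To make (a) and (b) concrete, here is a minimal Python sketch of what row-level tracking could look like: the cache keeps a reverse index from individual rows to the cached entries that depend on them, so a reported row change only disturbs those entries. Everything here (RowCache, notify_row_changed, the tuple layout) is invented for illustration, not an existing API.

    class RowCache:
        def __init__(self):
            self.results = {}        # cache_key -> cached result rows
            self.row_index = {}      # (table, row_id) -> set of cache keys

        def store(self, cache_key, rows):
            """Cache a result; 'rows' is a list of (table, row_id, row) tuples."""
            self.results[cache_key] = [row for (_table, _row_id, row) in rows]
            for table, row_id, _row in rows:
                self.row_index.setdefault((table, row_id), set()).add(cache_key)

        def notify_row_changed(self, table, row_id):
            """Hypothetical callback for a committed change to one row:
            only the cached entries that actually touch that row are dropped."""
            for key in self.row_index.pop((table, row_id), set()):
                self.results.pop(key, None)

    cache = RowCache()
    cache.store("q1", [("prices", 42, {"item": 42, "price": 9.99})])
    cache.notify_row_changed("prices", 42)   # only "q1" is invalidated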
2.) If any caching were possible, we'd already be doing it.
I don't think this statement will stand the test of time! Agreed, caching quickly becomes a headache-inducing, complex topic, but the benefits are well worth the effort!
3.) Consistency and memcached *are* mutually exclusive. Memcached provides no mechanisms for consistency.
"You can never have perfect consistency across different systems (memcache /
postgresql) and especially not when their visibility rules differ. What is
visible to something via memcache is always latest uncommitted. What is
visible in PostgreSQL is something less than that. Consistency is not
possible. Correct caching is therefore also not possible unless you define
correct as 'latest', and even then, you have problems if memcache expires
the record, before the real record has been committed into PostgreSQL."
I completely agree. I'm not talking about synchronizing memcached data to be consistent; I'm saying let's create a caching layer that works something like memcached but preserves consistency!
and, very related to this,
4.) Memcached caching is exactly the opposite of Postgres consistency.
Specifically:
Memcache is serialized
Memcache can lose data at any time
Memcache has only two fields (key and value)
Memcache has no synchronization
Postgres needs consistency.
Memcached doesn't do any synchronization, and that makes consistency impossible. However, could a special version of memcached, embedded into the PostgreSQL system or API so that it talks directly with the postmaster, provide guaranteed consistency?
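One way such a postmaster-aware cache could preserve consistency is to buffer invalidations per transaction and apply them only at commit, so readers never see the effect of uncommitted work. The sketch below only illustrates that idea; the class and hook names (ConsistentCache, on_write, on_commit, on_abort, the xid handling) are all hypothetical, not anything PostgreSQL exposes today.

    class ConsistentCache:
        def __init__(self):
            self.data = {}       # cache_key -> cached result
            self.pending = {}    # transaction id -> keys to drop at commit

        def on_write(self, xid, affected_keys):
            # The write's effects are not visible yet, so don't touch the
            # live cache; just remember what will need to go.
            self.pending.setdefault(xid, set()).update(affected_keys)

        def on_commit(self, xid):
            # Only now do readers start missing; they refill from PostgreSQL,
            # which by definition now returns the committed data.
            for key in self.pending.pop(xid, set()):
                self.data.pop(key, None)

        def on_abort(self, xid):
            # Rolled back: the cached entries were never wrong, keep them.
            self.pending.pop(xid, None)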
5.) This idea won't save any time with SQL parsing.
I believe it can... As memcached has illustrated, you can avoid SQL parsing entirely by using the raw SQL text, plus the user's table permissions (if different users are enabled), as the key to the cached data.
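For example, something as simple as hashing the query text together with the role would do. A tiny sketch (the cache_key name is made up for illustration):

    import hashlib

    def cache_key(sql_text, role):
        # No parsing: the key is just a hash of who is asking and what
        # they literally typed.
        raw = role.encode("utf-8") + b"\x00" + sql_text.encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    # The same question from two different roles gets two different keys,
    # so permission differences can never leak cached rows across users.
    key_a = cache_key("SELECT * FROM prices WHERE item = 42", "app_reader")
    key_b = cache_key("SELECT * FROM prices WHERE item = 42", "app_admin")
    assert key_a != key_b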
6.) PostgreSQL is about consistency. If an application needs speed, let the application figure out how to cache the data.
I appreciate that Postgres is all about data consistency. Actually, that's why I'm here and not on the MySQL board... However, I believe we can provide caching without losing consistency, and developers will love the extra speed.
If we implement the caching once, everyone will be able to use it without complicating their end applications! (Read: it will help the world and make PostgreSQL very popular!)
---
So, would it work to create a caching pre-processor for PostgreSQL that runs serially on every request and passes all uncached queries on to the database? (A rough sketch of the flow follows this list.)
- If it's a cache hit, and the data is currently available and active, pass the data back.
- If it's a miss, pass the query along to the database, and populate the cache with the results.
- If the query changes data, invalidate the cached queries that touch any table rows that could be affected. After the update, re-enable the entries for the unaffected rows and repopulate the cache with the updated ones.
- Cached queries that use an affected table would normally be dropped, except in special simple cases where they can be updated in place.
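Put together, the per-request flow might look something like this. It reuses the cache_key sketch from point 5, 'cache' is assumed to be a plain dict, run_query stands in for whatever actually talks to PostgreSQL, and the crude write detection and cache.clear() are placeholders for the finer-grained invalidation described above.

    def handle_request(sql_text, role, cache, run_query):
        key = cache_key(sql_text, role)          # from the sketch in point 5

        first_word = sql_text.lstrip().split(None, 1)[0].upper()
        if first_word not in ("SELECT", "WITH"):
            # A write: in a real system the backend would report exactly
            # which tables/rows were touched; clearing everything is only
            # a stand-in for row-level invalidation.
            rows = run_query(sql_text, role)
            cache.clear()
            return rows

        if key in cache:                         # hit: skip parse, plan, execute
            return cache[key]

        rows = run_query(sql_text, role)         # miss: run it normally...
        cache[key] = rows                        # ...and remember the answer
        return rows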
A related, interesting thought... It might help if the cache stored the data as separate table rows, rather than in whatever shape the query requested (the way memcached does); see the sketch after this list.
- Each cached table row could be stored as a separate entity.
- Complicated joins, rather than caching a full copy of the result in whatever organization the user specified, would in most cases store a matrix of pointers to the exact table fields.
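Here is a toy illustration of that pointer idea; the row_store layout, the pointer tuples, and materialize() are all invented for the example.

    # Individual rows are cached once, in one shared store...
    row_store = {
        ("customers", 7): {"id": 7, "name": "Ada", "city": "Paris"},
        ("orders", 91):   {"id": 91, "customer_id": 7, "total": 120},
    }

    # ...and a cached join result is only an ordered matrix of pointers
    # (table, row_id, column) into that store, not a copy of the data.
    cached_join = [
        [("customers", 7, "name"), ("orders", 91, "total")],
    ]

    def materialize(pointer_rows):
        """Rebuild the rows the user asked for from the shared row store."""
        return [tuple(row_store[(tbl, rid)][col] for (tbl, rid, col) in row)
                for row in pointer_rows]

    print(materialize(cached_join))   # [('Ada', 120)]
    # Refreshing one cached row (say orders 91) automatically refreshes
    # every cached query that points at it.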
Will it work? Am I missing anything?
Thanks
Daniel