From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Billy Earney <billy(dot)earney(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: query cache
Date: 2012-03-23 18:08:54
Message-ID: CAHyXU0ySQV+UA3jcHw_ebxumiEow7ibY890CmwZjoaObAVhY8Q@mail.gmail.com
Lists: pgsql-hackers
On Fri, Mar 23, 2012 at 12:03 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Mar 23, 2012 at 12:29 PM, Greg Stark <stark(at)mit(dot)edu> wrote:
>> On Fri, Mar 23, 2012 at 3:49 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> The complication, opportunities for bugs, and general slowdown
>>> associated with that would outweigh any possible gain, in the opinion
>>> of most hackers who have thought about this.
>>
>> I wouldn't be quite so pessimistic. I think the problem is that the
>> hard part in doing this for real is all the parts the proposal glosses
>> over. How much memory is it worth dedicating to the cache before the
>> cost of that memory outweighs the benefit? How do you invalidate
>> cache entries efficiently enough that invalidation doesn't become a
>> bottleneck?
>
> I think the question of how you would invalidate things is a very good one.
>
> The other thing that makes me skeptical of this proposal is that I am
> not very sure that executing absolutely identical queries is a very
> common use case for a relational database. I suppose there might be a
> few queries that run over and over again (e.g. whatever you need to
> render your home page), but I think those will be the exception, and
> not the rule. It therefore seems likely that the overhead of such a
> cache would in most cases be greater than the benefit of having it in
> the first place.
>
> What I think is more common is the repeated submission of queries that
> are *nearly* identical, but with either different parameter bindings
> or different constants. It would be nice to have some kind of cache
> that would allow us to avoid the overhead of parsing and planning
> nearly identical statements over and over again, but the trick is that
> you have to fingerprint the query to notice that's happening in the
> first place, and the fingerprinting has to cost less than what the
> cache saves you. I don't know whether that's possible, but I suspect
> it's far from easy.
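To illustrate the fingerprinting idea (a rough sketch only, not
anything that exists in PostgreSQL today; the normalization rules
below are simplified assumptions), you can strip literals out of the
query text and hash what's left, so that statements differing only in
constants map to the same key:

import hashlib
import re

def fingerprint(sql):
    # Normalize a query so statements differing only in constants
    # produce the same key (simplified; ignores quoted identifiers,
    # comments, casts, etc.).
    s = sql.strip().lower()
    s = re.sub(r"'(?:[^']|'')*'", "?", s)    # string literals -> ?
    s = re.sub(r"\b\d+(\.\d+)?\b", "?", s)   # numeric literals -> ?
    s = re.sub(r"\s+", " ", s)               # collapse whitespace
    return hashlib.sha1(s.encode()).hexdigest()

# These two produce the same fingerprint:
print(fingerprint("SELECT * FROM users WHERE id = 42"))
print(fingerprint("select *  from users where id = 7"))

Whether a check like that can be made cheap enough to pay for itself
is exactly the open question.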
A query cache basically addresses two use cases:
1) read-only or mostly-read-only workloads
2) badly written application code (whether by human or machine)
The problem is that #1 can be optimized by any number of simple
techniques, and #2 is not a good basis for complicated internal
features with nasty trade-offs. MySQL's query cache woes are well
known -- it's typical for administrators to turn the feature off. The
feature is misnamed -- it's really a 'benchmark cheating' feature,
since a lot of database benchmarks focus on single-user loads and/or
highly repetitive queries, but it falls over completely in real-world
production workloads. Also, it's really not that difficult to rig an
ad-hoc cache in the server or on the client side, and you can then
gear it towards your particular use case.
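For example (a minimal sketch, assuming psycopg2 and a simple TTL
policy; the DSN, query, and numbers are made up for illustration), a
client-side cache geared to a read-mostly workload is a few lines of
Python:

import time
import psycopg2

class QueryCache:
    # Tiny client-side result cache for read-mostly queries.
    # Entries expire after ttl seconds; there is no cross-client
    # invalidation, which is exactly the trade-off you accept.

    def __init__(self, dsn, ttl=30):
        self.conn = psycopg2.connect(dsn)
        self.ttl = ttl
        self.cache = {}  # (sql, params) -> (expires_at, rows)

    def query(self, sql, params=()):
        key = (sql, params)
        hit = self.cache.get(key)
        if hit is not None and hit[0] > time.time():
            return hit[1]                    # cache hit: skip the server
        cur = self.conn.cursor()
        cur.execute(sql, params)
        rows = cur.fetchall()
        cur.close()
        self.cache[key] = (time.time() + self.ttl, rows)
        return rows

# qc = QueryCache("dbname=test")
# rows = qc.query("SELECT id, title FROM posts ORDER BY id DESC LIMIT %s", (10,))

Because you control the policy (TTL, key, what gets cached), you can
make it fit the workload instead of hoping a general-purpose server
cache guesses right.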
People who are asking for this probably really want materialized views instead.
merlin