Re: avoiding file system caching of a table

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Gabriel E(dot) Sánchez Martínez <gabrielesanchez(at)gmail(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: avoiding file system caching of a table
Date: 2014-02-18 01:45:23
Message-ID: CAMkU=1wH3UwnMM+AzEOVT3nx7Goba7d4yG0KB3jWX=XD-TRz6g@mail.gmail.com
Lists: pgsql-general

On Mon, Feb 17, 2014 at 5:09 PM, "Gabriel E. Sánchez Martínez" <
gabrielesanchez(at)gmail(dot)com> wrote:

>
> On 02/16/2014 10:33 PM, Gabriel Sánchez Martínez wrote:
>
>> Is there a way of asking PostgreSQL to read the files of a table directly
>> off the disk, asking the OS not to use the file cache? I am running
>> PostgreSQL 9.1 on Ubuntu Server 64-bit. The server in question has the
>> maximum amount of RAM it supports, but the database has grown much larger.
>> Most of the time it doesn't matter, because only specific tables or parts
>> of indexed tables are queried, and all of that fits in the file cache. But
>> we have a new requirement of queries to a table several times larger than
>> the total RAM, and the database has slowed down considerably for the other
>> queries.
>>
>> I am assuming that with every query to the large table, the OS caches the
>> files containing the table's data, and since the table is larger than total
>> RAM, all the old caches are cleared. The caches that were useful for other
>> smaller tables are lost, and the new caches of the large table are useless
>> because on the next query caching will start again from the first files of
>> the table. Please point out if there is a problem with this assumption.
>> Note that I am referring to OS file caching and not PostgreSQL caching.
>>
>
If you told PostgreSQL to tell the kernel not to cache the data it reads,
how would that help? The data you want in cache would no longer be pushed
out of the cache, but only because it would never have been there in the
first place. You would have to make this instruction to the kernel
selective, so that it only applies while a very large query is running. That
might be theoretically possible, but it would probably cause more harm than
good for most people most of the time.

>
>> Is there a way around this? I have read that there is a way of asking
>> the OS not to cache a file when the file is opened. Is there a way of
>> telling PostgreSQL to use this option when reading files that belong a
>> specific table?
>>
>> What about putting the table on a tablespace that is on a different
>> device partition with the sync mount option? Would that help?
>>
>
> I have read forum postings saying that the sync option affects writes and
> will not prevent reads from caching. In a forum posting I also came across
> nocache, a utility for Linux. It is used by typing "nocache <command>" in
> a shell. But I can't do that with a postgres process when a connection
> opens, because postgres is the one starting the process.

You would have to start the entire service with that utility, then, which
again would defeat the purpose.
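For reference, the per-file "don't cache" hint alluded to above is, as I
understand it, posix_fadvise(2) with POSIX_FADV_DONTNEED, which is the call
the nocache utility injects around reads. A minimal sketch in Python on
Linux (the scratch file path is hypothetical, purely for illustration;
PostgreSQL does not expose this per table):

```python
import os

# Sketch of the kernel hint a tool like `nocache` uses: after reading a
# file, advise the kernel that its page-cache pages will not be needed
# again, so it may evict them instead of displacing other cached data.
path = "/tmp/fadvise_demo.dat"  # hypothetical scratch file
with open(path, "wb") as f:
    f.write(b"x" * 1024 * 1024)  # 1 MiB of data

fd = os.open(path, os.O_RDONLY)
try:
    data = os.read(fd, 1024 * 1024)  # this read populates the page cache
    # offset=0, length=0 means "the whole file" to posix_fadvise.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
finally:
    os.close(fd)
os.remove(path)
```

Nothing here changes what PostgreSQL itself does; it only shows why a
blanket application of the hint (e.g. wrapping the whole service) throws
away the caching you want to keep.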

> Does someone know a work-around, or a different solution to the problem?
> Shouldn't PostgreSQL be smart about this and based on the statistics
> collected for a table and on the query plan know the harm that will be done
> if all of a very large table's pages are read and flush the cache?

PostgreSQL does know this. It has a special ring "buffer access strategy"
that it uses to prevent a large sequential scan from pushing all of the
other data out of its shared_buffers. It sounds like it is the kernel
which is failing to employ similar logic on the file cache which the
*kernel* manages.

The kernel does also have some logic to prevent this, but it may or may not
be very effective in your case (you haven't told us what version of the
kernel you are using). In fact, one kernel change intended to fix this
problem for cases like yours ended up making it worse under other
conditions, i.e. when the file being read sequentially was smaller than
available RAM but larger than half of available RAM.

You could try increasing shared_buffers (you haven't told us what it is set
to now) until it takes up a big chunk of RAM, so that PostgreSQL manages
more of the cache and the kernel manages less of it. Large settings like
that have been reported to cause problems on write-heavy workloads, but I
haven't heard of problems on read-mostly workloads.
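For example, in postgresql.conf (the numbers are purely illustrative,
assuming a machine with 32GB of RAM, not a recommendation for your
hardware):

```ini
# postgresql.conf -- illustrative values only; tune to your actual RAM
shared_buffers = 16GB          # let PostgreSQL manage a larger share of caching
effective_cache_size = 24GB    # planner hint: shared_buffers plus expected OS cache
```

A restart is required for shared_buffers to take effect.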

Cheers,

Jeff
