postgres files in use not staying in linux file cache

From: Brio <brianoraas(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: postgres files in use not staying in linux file cache
Date: 2014-06-05 21:32:04
Message-ID: CAM+G8pRg9-Dvfr3dam9qG-etQGNGUQEO-cDPyhwR-_HTvWQABA@mail.gmail.com
Lists: pgsql-performance

Hi, I'm trying to investigate a performance problem.

We have a large database (over 1TB) running on a server with 160GB of RAM
and 32 cores (Xeon E5-2650). The database files are on a NetApp mount.

The software is Postgres 9.3.1 on Ubuntu 12.04, Linux 3.2.0-38-generic.

Generally, when a query is slow, it's because it's waiting for I/O. Since
only about 10% of the database can be in RAM at any time, this is expected.
I'm trying to analyze the working set in the cache to see that relevant
tables and indexes are cached. I can map our database's objects to files
using pg_class.relfilenode to get the name(s), and then I use fincore from
linux-ftools to inspect the Linux cache.
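
For reference, the mapping can be scripted roughly like this (just a
sketch; the connection string and relation names are placeholders, and
pg_relation_filepath() takes care of the relfilenode-to-path step):

    import psycopg2  # assuming psycopg2 is available

    conn = psycopg2.connect("dbname=mydb")  # placeholder connection string
    cur = conn.cursor()

    # pg_relation_filepath() returns the path relative to the data directory.
    # Segments beyond 1GB live in files with .1, .2, ... appended to it.
    cur.execute("""
        SELECT c.relname, pg_relation_filepath(c.oid::regclass)
        FROM pg_class c
        WHERE c.relname = ANY(%s)
    """, (["some_table", "some_table_pkey"],))  # placeholder relation names

    for relname, relpath in cur.fetchall():
        print(relname, relpath)  # e.g. base/16384/16423
    cur.close()
    conn.close()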

I have noticed a few times that an index scan is taking a long time while
the query's backend process reads from disk at about 2 MB/s, spending 99%
of its time waiting for I/O (according to iotop). This makes sense if it's
scanning an index that is not in cache.
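
(The per-process read rate can also be double-checked without iotop by
sampling /proc/<pid>/io; a rough sketch, run as root or as the postgres
user, with the backend pid as a placeholder:)

    import time

    def read_rate_mb_s(pid, interval=5.0):
        """Sample /proc/<pid>/io twice and report disk read throughput in MB/s."""
        def read_bytes():
            with open("/proc/%d/io" % pid) as f:
                for line in f:
                    if line.startswith("read_bytes:"):
                        return int(line.split()[1])
            return 0
        before = read_bytes()
        time.sleep(interval)
        after = read_bytes()
        return (after - before) / interval / (1024 * 1024)

    # print(read_rate_mb_s(12345))  # 12345 = backend pid, placeholder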

As this is happening, I expect bits of the index and table to be pulled
into cache, so the index scan may speed up as it goes, or will at least
finish with some portion of the index cached so the scan isn't as slow
next time. I use fincore to see how much of the index and table are in
cache (taking into account that relations larger than 1GB are split into
multiple 1GB segment files). To my surprise, the files are 0% cached!
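
To rule out fincore itself, the same check can be made straight from
mincore(2). A minimal sketch using ctypes, assuming 64-bit Linux; it has
to be run once per 1GB segment file:

    import ctypes, ctypes.util, os

    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.mmap.restype = ctypes.c_void_p
    libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                          ctypes.c_int, ctypes.c_int, ctypes.c_long]
    libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                             ctypes.POINTER(ctypes.c_ubyte)]

    PROT_READ, MAP_SHARED = 0x1, 0x1
    PAGE = os.sysconf("SC_PAGESIZE")

    def cached_fraction(path):
        """Fraction of a file's pages resident in the page cache (what fincore reports)."""
        size = os.path.getsize(path)
        if size == 0:
            return 1.0
        fd = os.open(path, os.O_RDONLY)
        try:
            addr = libc.mmap(None, size, PROT_READ, MAP_SHARED, fd, 0)
            if addr in (None, ctypes.c_void_p(-1).value):
                raise OSError(ctypes.get_errno(), "mmap failed")
            try:
                npages = (size + PAGE - 1) // PAGE
                vec = (ctypes.c_ubyte * npages)()
                if libc.mincore(ctypes.c_void_p(addr), size, vec) != 0:
                    raise OSError(ctypes.get_errno(), "mincore failed")
                resident = sum(b & 1 for b in vec)  # low bit set = page resident
                return resident / float(npages)
            finally:
                libc.munmap(ctypes.c_void_p(addr), size)
        finally:
            os.close(fd)

    # print(cached_fraction("/pgdata/base/16384/16423"))    # placeholder path
    # print(cached_fraction("/pgdata/base/16384/16423.1"))  # next 1GB segment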

Some research about the Linux page cache suggests that any file can
essentially be forced into cache by cat-ting it to /dev/null. So I cat one
of these database object files, and I can see the cat process reading at
about 100 MB/s, which means about 10 seconds for a 1GB file. Then I check
fincore again -- the file is still not cached, and cat-ting it again still
takes 10 seconds.
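
Scripted, that experiment looks roughly like this (placeholder path); the
point is to re-check residency with fincore, or the cached_fraction()
sketch above, immediately after the read returns:

    import time

    def timed_read(path, bufsize=1024 * 1024):
        """Read a file sequentially, like cat-ting it to /dev/null, and report throughput."""
        total = 0
        start = time.time()
        with open(path, "rb") as f:
            while True:
                chunk = f.read(bufsize)
                if not chunk:
                    break
                total += len(chunk)
        elapsed = max(time.time() - start, 1e-6)
        print("%d MB in %.1f s (%.1f MB/s)"
              % (total // 2**20, elapsed, total / 1048576.0 / elapsed))

    # timed_read("/pgdata/base/16384/16423")  # placeholder path; check the
    # cache again right after this returns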

I cat the file several times in a row, and it refuses to stay cached.
Sometimes I see a bit of the file appear in the cache, only to be removed
a few seconds later. I also tried the fadvise program from linux-ftools,
which didn't help.
The I/O on this machine is not all that high (a few MB/s for various
postgres processes), and there are a few GB free (shown in top). Most of
the RAM (150GB+) is used by the page cache. No swap is in use.
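
I assume the ftools fadvise boils down to a POSIX_FADV_WILLNEED hint over
the whole file; the equivalent call, as a sketch (placeholder path, needs
Python 3.3+ for os.posix_fadvise):

    import os

    def fadvise_willneed(path):
        """Hint the kernel to read the whole file into the page cache."""
        fd = os.open(path, os.O_RDONLY)
        try:
            # offset 0, length 0 means "from the start to the end of the file"
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
        finally:
            os.close(fd)

    # fadvise_willneed("/pgdata/base/16384/16423")  # placeholder path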

Eventually the query finishes (or I cancel it). Then I find that running
cat on the file does leave it in cache! So is having multiple readers
confusing Linux, or is Postgres doing some madvise on the file? (I'd
expect the answer to both to be no.)

So here's where I'm stuck. How can reading a file not leave it in the Linux
cache? I'd expect it to enter the inactive list (which is about 80GB), so
roughly another 80GB would have to be read before it became eligible for
eviction, and that should take a long time if my maximum read speed is
100 MB/s.
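
(The file LRU list sizes can be watched from /proc/meminfo while all of
this is going on; something like:)

    def meminfo_gb(*fields):
        """Print selected /proc/meminfo fields in GB, e.g. the file LRU lists."""
        with open("/proc/meminfo") as f:
            info = dict(line.split(":") for line in f)
        for name in fields:
            kb = int(info[name].strip().split()[0])
            print("%-15s %7.1f GB" % (name, kb / 1024.0 / 1024.0))

    meminfo_gb("MemFree", "Cached", "Active(file)", "Inactive(file)")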

Since I don't understand why simply running cat doesn't leave the file
cached, I wonder if this is an issue with Linux rather than Postgres. But
the issue seems to happen with files that Postgres is actively using, so
maybe there's an interaction, which is why I thought I'd start here.

Any ideas how I can debug this further? Thanks!

Brian
