Re: postgres files in use not staying in linux file cache

From: Brio <brianoraas(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: postgres files in use not staying in linux file cache
Date: 2014-06-25 00:13:25
Message-ID: CAM+G8pT=y0CAEw-fkCuniATQW9iNCURQRY-d0gXVrd-eVBvVJA@mail.gmail.com
Lists: pgsql-performance

(Sorry, our last exchange forgot to cc the pgsql-performance list.)

Yes, I did see the original problem only when postgres was also accessing
the file. But the issue is intermittent and I can't reproduce it on demand,
so I'm only reporting what I saw a small number of times, which is not
necessarily (or even likely) the whole story.

I've upgraded the kernel on my test machine, and I haven't seen the
original problem. But I am seeing what looks like it might be the problem
you describe, Jeff. Here's what I saw:

This machine has 64 GB of RAM. There was about 20 GB free, and the rest was
mostly file cache, mostly our large 1TB database. I ran a script that did
various reading and writing to the database, but mostly updated many rows
over and over again with new values. As this script ran, the cached
memory slowly dropped, and free memory increased. I now have 43 GB free!
I'd expect practically any activity to leave files in the cache, and no
significant evictions to occur until memory runs low. What actually happens
is the cache increases gradually, and then drops down in chunks. I would
think that the only file activity that would evict from cache would be
deleting files, which would only happen when dropping tables (not happening
in my test script), and also WAL file cycling, which should stay a constant
amount of memory.
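To track this more precisely than eyeballing "free", something like the following could sample the kernel's counters while the script runs. This is only a rough sketch: the helper name meminfo_kb is made up for illustration, and it assumes the usual /proc/meminfo layout of "Key:   value kB" lines.

```c
#include <stdio.h>
#include <string.h>

/* Returns the value in kB for a /proc/meminfo key such as "MemFree" or
 * "Cached", or -1 if the key is missing or the file cannot be read.
 * meminfo_kb is a hypothetical helper name, not an existing API. */
long meminfo_kb(const char *key)
{
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return -1;

    char line[256];
    long value = -1;
    size_t keylen = strlen(key);

    while (fgets(line, sizeof line, f)) {
        /* Lines look like "MemFree:        12345678 kB". Require the
         * colon right after the key so "Cached" does not match other
         * keys that merely share a prefix. */
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
            if (sscanf(line + keylen + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(f);
    return value;
}
```

Calling meminfo_kb("MemFree") and meminfo_kb("Cached") every few seconds and logging both with a timestamp should show whether the cache really drops in chunks and whether those drops line up with write activity.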

But, if blocks that are written are evicted from the cache, that would
explain it, so I'd like to test that. As a very basic test, I tried:
cd /path-to-nfs-mount
echo "foo" > foo.txt
sync # This command forces a write? I haven't really used it before
linux-fincore foo.txt
shows the file is cached 100%.
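The same check could be done from C, which would make it easier to test with larger files and repeated rewrites. The sketch below is my guess at a minimal version, not a reconstruction of anyone's script: it writes a buffer, calls fsync() to force writeback, then uses mmap() plus mincore() to count how many of the file's pages are resident in the page cache. The function names write_and_flush and cached_fraction are invented for this example.

```c
#define _DEFAULT_SOURCE   /* exposes mincore() on glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write len bytes to path and fsync() them so the dirty pages are
 * written back. Returns 0 on success, -1 on error. */
int write_and_flush(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    int rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Fraction (0.0 to 1.0) of the file's pages currently resident in the
 * page cache, or -1.0 on error or for an empty file. */
double cached_fraction(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1.0;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1.0;
    }

    size_t len = (size_t)st.st_size;
    /* Mapping the file does not fault its pages in by itself. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1.0;

    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    size_t pages = (len + pagesz - 1) / pagesz;
    unsigned char *vec = malloc(pages);
    double result = -1.0;

    if (vec && mincore(map, len, vec) == 0) {
        size_t resident = 0;
        for (size_t i = 0; i < pages; i++)
            resident += vec[i] & 1;   /* low bit set = page resident */
        result = (double)resident / (double)pages;
    }

    free(vec);
    munmap(map, len);
    return result;
}
```

The idea would be to call write_and_flush() on a file under the NFS mount and then cached_fraction() on the same path. If written blocks are being evicted once they reach the server, I'd expect the fraction to come back below 1.0 after the fsync, at least for files bigger than my one-line foo.txt.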

Although you don't have the Perl script you mentioned, could you give a
basic description of what it does, so I could try to recreate it? I'm not
familiar with Perl, but I've done plenty of C programming, so demonstrating
this with the actual Linux APIs would be ideal.

Thanks Jeff!

On Mon, Jun 23, 2014 at 3:56 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Wed, Jun 18, 2014 at 11:18 PM, Brio <brianoraas(at)gmail(dot)com> wrote:
> > Hi Jeff,
> >
> > That is interesting -- I hadn't thought about how a read-only index scan
> > might actually write the index.
> >
> > But to avoid effects like that, I dropped down to simply using "cat" on
> > the file, and I saw the same problem there, with no writing back.
>
> I thought that you saw the same problem with cat only when it was
> running concurrently with the index scan, and when the index scan
> stopped the problem in cat went away.
>
> > So the problem really seemed to be in Linux, not Postgres.
> >
> > But why would dirty blocks of NetApp-served files get dropped from the
> > Linux page cache as soon as they are written back to the NetApp? Is it a
> > bug in the NetApp driver? Isn't the driver just NFS?
>
> I don't know why it would do that, it never made much sense to me.
> But that is what the experimental evidence indicated.
>
> What I was using was NetApp on the back-end and just the plain linux
> NFS driver on the client end, and I assume the problem was on the
> client end. (Maybe you can get a custom client driver from NetApp
> designed to work specifically with their server, but if so, I didn't
> do that. For that matter, maybe just the default linux NFS driver has
> improved.)
>
> > That sounds like a serious
> > issue. Is there any online documentation of bugs like that with NetApp?
>
> Yes, it was a serious issue for one intended use. But it was
> partially mitigated by the fact that I would probably never run an
> important production database over NFS anyway, out of corruption
> concerns. I was hoping to use it just for testing purposes, but this
> limit made it rather useless for that as well. I don't think it would
> be a NetApp specific issue and didn't approach it from that angle,
> just that NetApp didn't save me from the issue.
>
> Cheers,
>
> Jeff
>
