From: | Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: mosbench revisited |
Date: | 2011-08-06 18:16:13 |
Message-ID: | m2bow2jthu.fsf@2ndQuadrant.fr |
Lists: | pgsql-hackers |
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> It would be nice if the Linux guys would fix this problem for us, but
> I'm not sure whether they will. For those who may be curious, the
> problem is in generic_file_llseek() in fs/read-write.c. On a platform
> with 8-byte atomic reads, it seems like it ought to be very possible
> to read inode->i_size without taking a spinlock. A little Googling
> around suggests that some patches along these lines have been proposed
> and - for reasons that I don't fully understand - rejected. That now
> seems unfortunate. Barring a kernel-level fix, we could try to
> implement our own cache to work around this problem. However, any
> such cache would need to be darn cheap to check and update (since we
> can't assume that relation extension is an infrequent event) and must
> somehow avoid having the same sort of mutex contention that's killing the
> kernel in this workload.

What about making relation extension much less frequent? It has been
discussed here before: instead of extending 8kB at a time, we could
(and probably should) extend by much larger chunks. I would go as far as
preallocating the whole next segment (1GB) in the background as soon
as the current one is more than half full, or some similar policy.
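
The policy above can be sketched roughly as follows. This is only an illustration, not PostgreSQL code: the names `should_preallocate_next` and `preallocate` are hypothetical, the 1GB segment size and the "more than half full" threshold come from the paragraph above, and the preallocation runs synchronously here rather than in the background.

```python
import os

SEGMENT_BYTES = 1 << 30  # 1GB segment size, as mentioned in the message


def should_preallocate_next(current_bytes, segment_bytes=SEGMENT_BYTES):
    """Return True once the current segment is more than half full."""
    return current_bytes * 2 > segment_bytes


def preallocate(path, nbytes):
    """Reserve nbytes for the file at path, creating the file if needed.

    Hypothetical helper: uses posix_fallocate where the platform offers it.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        if hasattr(os, "posix_fallocate"):
            # Ask the filesystem to allocate the whole range up front.
            os.posix_fallocate(fd, 0, nbytes)
        else:
            # Fallback: only sets the file size and may leave a sparse file.
            os.ftruncate(fd, nbytes)
    finally:
        os.close(fd)
```

A caller would check `should_preallocate_next()` after each extension and, when it returns true, hand the next segment's path to something like `preallocate()` from a background worker.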

Then you have the problem that you can't really use lseek() anymore to
guesstimate a relation's size, but Tom said in this thread that the
planner certainly doesn't need anything that accurate. Maybe
reltuples would do? If not, perhaps its accuracy could be adjusted
to suit?
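
To illustrate why lseek() stops being a usable size estimate once files are preallocated, here is a small sketch. It assumes the default 8kB block size; `blocks_by_lseek` and `demo` are hypothetical names, and growing the file with `ftruncate` merely stands in for a real preallocation step.

```python
import os
import tempfile

BLCKSZ = 8192  # assumed block size (PostgreSQL's default)


def blocks_by_lseek(fd):
    """Size in blocks, derived from the file's end offset."""
    return os.lseek(fd, 0, os.SEEK_END) // BLCKSZ


def demo():
    """Return (blocks actually written, blocks reported via lseek)."""
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        used_blocks = 3
        # Write three blocks of real data.
        os.write(fd, b"\0" * (used_blocks * BLCKSZ))
        # Stand-in for preallocation: grow the file to 16 blocks.
        os.ftruncate(fd, 16 * BLCKSZ)
        return used_blocks, blocks_by_lseek(fd)
```

The end offset now reflects the allocated size rather than the space in use, so any size estimate would have to come from separately tracked statistics instead.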
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support