Invalidating the Linux Buffer Cache

When you write data, it doesn't necessarily get written to disk right then. The kernel caches many things, and a lot of work goes into caching disk data to keep everything fast and efficient. That's great for performance, but sometimes you want to know that data really has reached the disk drive. That could be because you want to test the performance of the drive, or because you suspect a drive is malfunctioning: if you just write and read back, you'll be reading from cache, not from the actual disk platters.

So how can you be sure you are reading data from the disk? The answer gets a little complicated, particularly if you are testing for integrity, so bear with me.

The first thing you need to do is get the data in the cache sent on its way to the disk. That's "sync", which tells the kernel that you want the data written. But that doesn't mean that a subsequent read comes from disk: if the requested data is still in cache, that's where it will be fetched from. It also doesn't necessarily mean that the kernel has actually sent the data along to the disk controller: traditionally, a "sync" is a request, not a command that says "stop everything else you are doing and write your whole buffer cache to disk right now!". On such systems, "sync" just means that the cache will be written as and when the kernel has time to do so. (Linux's "sync" does wait for the writes to finish before it returns, but the data stays in the cache afterward, so reads can still be served from memory.)

Traditionally, the only way to be sure you were not reading back from the cache was to overwrite the cache with other data. That required two things: knowing how big the cache is at this moment, and having unrelated data of sufficient size to overwrite it with. On older Unixes with fixed-size buffer caches, the first part was easy enough, and since memory was often expensive and in shorter supply than it is now, the cache wasn't apt to be all that large anyway.
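On Linux, one way to get a rough figure for "how big the cache is at this moment" is to read the kernel's own accounting in /proc/meminfo. The following is only a minimal sketch, assuming the usual "Buffers:" and "Cached:" fields reported in kB; the function names are made up for illustration, and the sum is an estimate rather than an exact measure of what would have to be displaced.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field_name: number} dict.

    Most fields are reported in kB; a few (such as HugePages_Total)
    are plain counts. The unit suffix, if any, is ignored.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def cache_kb(fields):
    """Rough size of the kernel's disk cache in kB: Buffers + Cached."""
    return fields.get("Buffers", 0) + fields.get("Cached", 0)


def current_cache_kb(path="/proc/meminfo"):
    """Read the live figure from the running kernel (Linux only)."""
    with open(path) as f:
        return cache_kb(parse_meminfo(f.read()))
```

Calling current_cache_kb() before and after reading a lot of unrelated data gives a feel for how much the cache has grown and how much you would need to read to push earlier blocks out.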
Cache sizes have changed radically since then: modern systems allocate cache memory dynamically, and while the total cache is still small compared to disk drives, it can now be gigabytes of data that you need to overwrite. That's not always so hard: for a large filesystem and relatively small memory, a simple "ls -lR" might be enough. If not, a "dd" with its output sent to /dev/null can fill it up. Just make sure that you are reading different disk blocks than the ones you first wrote. Note that you didn't really even need the "sync" if this is what you are doing: the overwrite forces the sync itself.

Modern Linux kernels make this a bit easier: in /proc/sys/vm/ you'll find "drop_caches". You simply echo a number to it to free caches.

However, if you are testing for integrity, and perhaps even if you are doing serious performance testing, this isn't enough: disk drives almost always do their own caching. If we really need to be certain that our reads came directly from the platters and not from RAM on the controller, we still need to go back to the idea of knowing how big that cache is and writing enough data to force it to be flushed. So we are still going to do "dd"s or "ls -lR"s or something like that.

If you are examining integrity and suspect corruption, keep in mind that aging can affect your results: you might need data to sit in cache (kernel or disk hardware) for some period before the problem occurs. Quick overwrites might mask it. Tracking down this kind of problem can be very difficult. See also: cache data corruption.

By the way, if your aim is simply to bypass cache buffering, you can do that: raw disk I/O is what you want. See also http://aplawrence.com/Bofcusm/2658.html. And (as some databases do) you could simply write data to a raw partition (no filesystem).

*Originally published at
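The "sync, then drop_caches" step can be sketched in a few lines. The kernel documentation describes the values as 1 (free the page cache), 2 (free dentries and inodes) and 3 (free both), and notes that only clean entries are dropped, which is why the sync comes first. The function name and its parameters below are invented for illustration; writing to the real file requires root.

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_caches(mode=3, path=DROP_CACHES, sync_first=True):
    """Ask the kernel to discard its clean cache entries.

    mode 1: page cache, 2: dentries and inodes, 3: both.
    The path parameter exists only so the function can be tried
    against an ordinary file; the default is the real control file.
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    if sync_first:
        # Dirty data is not dropped, so write it out first.
        os.sync()
    with open(path, "w") as f:
        f.write(f"{mode}\n")
```

Run as root, drop_caches() does roughly what "sync; echo 3 > /proc/sys/vm/drop_caches" does from a shell. As noted above, this only clears the kernel's caches: anything held in the drive's own cache is untouched.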
