Linux disk cache not being ideal
It’s unfair to expect Linux’s disk caching or IO to be perfect, but to me these traces look like it failing in a fairly easy situation (though all sorts of other things could be going on – the traces were taken on a moderately busy machine):
Firstly, here’s a trace of a five-term Xapian search, repeated 100 times. All accesses in this trace are reads: the horizontal axis is time, the vertical axis is the offset in the file, green marks a fast read, and red marks a read taking over 0.001 seconds. Most of the 20-odd disk reads performed by the search are very fast, but a single read takes about 0.01 seconds. Looking at the trace by hand, I can see that the exact same 8k stretch of disk was read almost immediately before this slow read (with only 1 intervening read). So either the cache dropped that block just before it was needed, or something else locked up the IO system for a moment:
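For anyone wanting to gather similar data, here is a minimal sketch of timing individual reads and flagging the slow ones. It is not the tool used to produce the traces above; the names `timed_reads`, `SLOW_THRESHOLD` and `BLOCK_SIZE` are made up for illustration, and only the 0.001 second cut-off and the 8k read size come from the description above.

```python
import os
import time

SLOW_THRESHOLD = 0.001  # seconds; the cut-off used above for a "red" read
BLOCK_SIZE = 8192       # 8k, the read size mentioned above (an assumption for other files)


def timed_reads(path, offsets, block_size=BLOCK_SIZE):
    """Read block_size bytes at each offset and time each read.

    Returns a list of (time_since_start, offset, elapsed, is_slow) tuples,
    which is enough to plot time against file offset and colour by speed.
    """
    results = []
    fd = os.open(path, os.O_RDONLY)
    try:
        t0 = time.perf_counter()
        for offset in offsets:
            start = time.perf_counter()
            os.pread(fd, block_size, offset)  # positional read; Unix only
            elapsed = time.perf_counter() - start
            results.append((start - t0, offset, elapsed, elapsed > SLOW_THRESHOLD))
    finally:
        os.close(fd)
    return results
```

Note that this measures the whole read call from user space, so it cannot tell a dropped cache block apart from some other stall in the IO system; it only shows that a read was slow.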
This isn’t an isolated event, either. The next trace I ran was the same search, with checkatleast set to dbsize. For those unfamiliar with Xapian, this basically means that a bit more IO is done during the search. A similar, though less severe, event happened here: this time, there were 8 intervening reads between a fast read of the block and the slow read (again visible as a red line).
It may well be worth implementing a small disk cache inside xapian, if these events are widespread…
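As a rough idea of what such a cache might look like, here is a minimal sketch of a small least-recently-used block cache. It is written in Python for brevity rather than as Xapian code, and everything in it (the `BlockCache` class, the `read_block` callback, the default capacity) is hypothetical, not part of Xapian.

```python
from collections import OrderedDict


class BlockCache:
    """A small LRU cache of blocks, keyed by block number (illustrative only)."""

    def __init__(self, read_block, capacity=64):
        self._read_block = read_block  # hypothetical callback that does the real disk read
        self._capacity = capacity      # maximum number of blocks to keep
        self._blocks = OrderedDict()   # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, block_number):
        if block_number in self._blocks:
            # Cache hit: mark the block as most recently used.
            self._blocks.move_to_end(block_number)
            self.hits += 1
            return self._blocks[block_number]
        # Cache miss: fall through to the real read and remember the result.
        self.misses += 1
        data = self._read_block(block_number)
        self._blocks[block_number] = data
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)  # evict the least recently used block
        return data
```

In both traces above, the slow read came within 8 reads of a fast read of the same block, so even a very small capacity would probably have been enough to avoid those two events. A real implementation would also need to invalidate cached blocks when the database changes, which this sketch ignores.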