Searching with Xapian

The Xapian search engine, and associated topics

Linux disk cache not being ideal


It’s unfair to expect Linux’s disk caching or IO to be perfect, but to me these traces look like it failing in a fairly easy situation. There could be all sorts of other things going on, though: the traces were taken on a moderately busy machine.

Firstly, here’s a trace of a 5-term Xapian search repeated 100 times. All accesses in this trace are reads: the horizontal axis is time, the vertical axis is the offset in the file, green is a fast read, and red is a read taking over 0.001 seconds. Most of the 20-odd disk reads performed by the search are very fast, but a single read takes about 0.01 seconds. Looking at the trace by hand, I can see that the exact same 8k stretch of disk was read almost immediately before this read (there was 1 read intervening). So either the cache dropped that block just before it was needed, or something else locked up the IO system for a moment:

[Trace plot: 100_searches_top_10]
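For anyone wanting to gather similar numbers, here is a minimal sketch of timing individual block reads and flagging the slow ones. It is not the tool used to produce the plot above; the function name `timed_reads` and the constants are made up for illustration, and the only things taken from the post are the 8k read size and the 0.001 second cut-off. It assumes a Unix-like system where `os.pread` is available.

```python
import os
import time

SLOW_THRESHOLD = 0.001  # seconds; the cut-off used for "red" reads in the post
BLOCK_SIZE = 8192       # 8k, the size of the reads discussed in the post


def timed_reads(path, offsets, block_size=BLOCK_SIZE):
    """Read one block at each offset and record how long each read took.

    Returns a list of dicts in the order the reads were issued.
    (Hypothetical helper: not part of Xapian or of the original tracing.)
    """
    trace = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for offset in offsets:
            start = time.perf_counter()
            data = os.pread(fd, block_size, offset)
            elapsed = time.perf_counter() - start
            trace.append({
                "offset": offset,
                "bytes": len(data),
                "elapsed": elapsed,
                "slow": elapsed > SLOW_THRESHOLD,
            })
    finally:
        os.close(fd)
    return trace
```

Calling `timed_reads("some_file", [0, 8192, 0])` on any readable file would give three records; plotting `offset` against the time each read was issued, coloured by `slow`, gives something like the picture above.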

This isn’t an isolated event, either. The next trace I ran was the same search with checkatleast set to dbsize. For those unfamiliar with Xapian, this makes the matcher check at least that many documents, so here it considers every match rather than stopping early, which basically means that a bit more IO is done during the search. A similar, though less severe, event happened here: this time there were 8 intervening reads between a fast read of the block and the slow read (visible again as a red line).

[Trace plot: 100_searches_check_all]
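Counting the intervening reads by hand gets tedious, so here is a rough sketch of doing it automatically. It assumes the trace is available as a list of `(offset, elapsed_seconds)` pairs in the order the reads happened; that format and the name `intervening_reads` are assumptions for illustration, not what the tracing actually produced.

```python
def intervening_reads(trace, threshold=0.001):
    """For each slow read, count the reads since that offset was last read.

    `trace` is a list of (offset, elapsed_seconds) pairs in issue order.
    Returns a list of (offset, reads_in_between) for every slow read whose
    offset had already been read earlier in the trace.
    """
    last_seen = {}  # offset -> index of the most recent read at that offset
    findings = []
    for index, (offset, elapsed) in enumerate(trace):
        if elapsed > threshold and offset in last_seen:
            findings.append((offset, index - last_seen[offset] - 1))
        last_seen[offset] = index
    return findings


# A fast read of offset 0, one unrelated read, then a slow re-read of offset 0.
example = [(0, 0.0001), (8192, 0.0001), (0, 0.01)]
print(intervening_reads(example))  # prints [(0, 1)]
```

A small count here is what makes the slow read suspicious: the block was read a moment earlier, so it ought still to be in the cache.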

It may well be worth implementing a small disk cache inside Xapian, if these events turn out to be widespread…
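To make the idea concrete, here is a minimal sketch of what such a cache could look like: a small least-recently-used cache of fixed-size blocks sitting in front of `os.pread`. This is purely illustrative and is not Xapian code; the class name `BlockCache`, the default capacity, and the hit/miss counters are all assumptions, and it ignores writes and invalidation entirely.

```python
import os
from collections import OrderedDict


class BlockCache:
    """Tiny read-only LRU cache of file blocks (illustrative sketch only)."""

    def __init__(self, fd, block_size=8192, capacity=64):
        self.fd = fd
        self.block_size = block_size
        self.capacity = capacity      # maximum number of blocks kept in memory
        self.blocks = OrderedDict()   # block number -> bytes, oldest first
        self.hits = 0
        self.misses = 0

    def read_block(self, block_number):
        """Return the block's bytes, reading from disk only on a miss."""
        if block_number in self.blocks:
            self.blocks.move_to_end(block_number)  # mark as most recently used
            self.hits += 1
            return self.blocks[block_number]
        self.misses += 1
        data = os.pread(self.fd, self.block_size,
                        block_number * self.block_size)
        self.blocks[block_number] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)        # evict least recently used
        return data
```

With a cache like this, the re-read in the first trace (same block, 1 read intervening) would never reach the kernel at all, so it could not stall. Whether that is worth the extra memory and the duplication of the OS cache depends on how common the slow reads really are.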


Written by richardboulton

February 10, 2009 at 4:22 pm

Posted in Uncategorized

2 Responses


  1. Was this on atreus? Because disk usage there is /insane/. I can give you an old dual processor server with a few SCA drives to play with if you want 🙂

    James Aylett

    February 25, 2009 at 7:32 pm

  2. This was on a Lemur server, where disk usage is probably slightly less insane than on atreus, but still not terrific.

    I’d be interested in the old server in theory – in practice, my “urgent todo list” is over a month long at present…

    richardboulton

    February 25, 2009 at 9:37 pm

