I’ve been doing some work recently to finish off the “geospatial” branch of Xapian, which will add various features allowing Xapian to be used to do some geospatial searching: for example “find my nearest”, “only show me results within N miles” and “weight by a combination of closeness and relevance”. This work is nearly ready to be merged into trunk, so it should make the 1.1.0 release, but it relies on users having latitude-longitude coordinates available for their documents and searches. Thus, it isn’t a lot of use for users who’ve only got addresses or postcodes!
So, I’ve been researching the ways of converting addresses and postcodes to latitude-longitude coordinates, with a bias towards those systems which will work in the UK. Most countries seem to have fairly freely available postcode databases (or Zip code, or whatever they want to call them), but sadly the UK doesn’t. The Royal Mail sells the postcode database, and to get a license to use it for a website you tend to need to pay a few thousand pounds. There are also several resellers who sell it more cheaply, or combine it with software to do various lookups in the database, but these aren’t much use to small or non-commercial websites.
So, it looks like freely available postcode data is the way to go for small or non-commercial websites, or websites on a budget! Besides, rolling your own is much more fun…
A year ago, there didn’t seem to be much useful data around (or at least, I couldn’t find it), but there are now several useful sources of data:
- http://www.geonames.org/ – this site (as well as providing some web services) provides downloads of freely usable (Creative Commons Attribution 3.0 License) lists of location names, together with the all-important latitude-longitude data for each. This is immensely useful data! It also provides a list of 27,000 or so “outcodes”, i.e. the first half of postcodes. This allows the rough location of each postcode to be established, to “town” accuracy, which may be good enough for many applications.
- http://www.npemap.org.uk/ – this site uses scanned copies of out-of-copyright OS maps (the “New Popular Edition”, to be precise), and the freely available outcodes, to allow users to enter their postcode and point to it on an (old) map. This has been used to collect nearly 40,000 postcodes (though not all of them are full postcodes), with moderate accuracy.
- http://www.dracos.co.uk/play/locating-postboxes/ – this site started with a Freedom of Information request to get the list of all postboxes in the UK, and their associated postcodes. Unfortunately, the list didn’t have latitude-longitude coordinates for each postbox, so Matthew Somerville’s site allows users to mark on a map the location of postboxes that they know about, and associate them with entries on the list. This results in a list of postcodes associated with coordinates, which can be added to the list from npemap to get even more freely available postcodes.
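As a rough illustration of how these sources could be combined, here is a minimal Python sketch that looks up a full postcode first and falls back to the outcode for “town” accuracy. The table names, the `normalise` and `locate` helpers, and the coordinate values are all assumptions made up for this example (the coordinates are placeholders, not real data), and it assumes the input is either a complete postcode or a bare outcode.

```python
import re

# Hypothetical in-memory tables. In practice these would be loaded from the
# downloaded data files; the coordinates here are placeholders, not real data.
FULL_POSTCODES = {"CB2 1TN": (52.20, 0.12)}
OUTCODES = {"CB2": (52.19, 0.13)}


def normalise(postcode):
    """Upper-case the input and put a single space before the 3-character
    second half. Inputs too short to be a full postcode are returned compact."""
    compact = re.sub(r"\s+", "", postcode).upper()
    if len(compact) < 5:
        return compact
    return compact[:-3] + " " + compact[-3:]


def locate(postcode):
    """Return ((lat, lon), accuracy), preferring an exact postcode match and
    falling back to the outcode. Returns (None, None) if nothing matches."""
    key = normalise(postcode)
    if key in FULL_POSTCODES:
        return FULL_POSTCODES[key], "postcode"
    outcode = key.split(" ")[0]
    if outcode in OUTCODES:
        return OUTCODES[outcode], "outcode"
    return None, None
```

Returning the accuracy level alongside the coordinates lets the caller decide whether a town-level match is good enough, for instance for a “within N miles” filter.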
So far, I’ve built a quick toy index of the geonames name data, using Xappy, which performs remarkably well. I plan to add a little further parsing, so that I can handle postcodes and partial postcodes as well as possible.
One problem is that, given a partially entered postcode, it’s not always possible to tell what the outcode is. For example, if a user enters “CB22”, it’s not possible to tell if that’s the outcode “CB22”, or the outcode “CB2” followed by the first digit of the second half of the postcode.
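One way to cope with this ambiguity is to enumerate every plausible reading and let the search try each of them. The Python sketch below does that under some simplifying assumptions: the regular expressions only approximate the general shape of UK postcodes (letters, then a digit, then an optional digit or letter for the outcode; a digit and up to two letters for the second half), spaces typed by the user are ignored, and the `interpretations` function and its optional `known_outcodes` filter are hypothetical names invented for this example.

```python
import re

# Approximate shapes only; these do not check that the outcode really exists.
OUTCODE_RE = re.compile(r"[A-Z]{1,2}[0-9][0-9A-Z]?")
INCODE_PREFIX_RE = re.compile(r"([0-9][A-Z]{0,2})?")


def interpretations(text, known_outcodes=None):
    """List the (outcode, rest) pairs that a partial postcode could mean.

    If known_outcodes is given (for example, the set of outcodes from a
    downloaded list), readings whose outcode is not in it are dropped.
    """
    compact = re.sub(r"\s+", "", text).upper()
    results = []
    for split in range(1, len(compact) + 1):
        outcode, rest = compact[:split], compact[split:]
        if not OUTCODE_RE.fullmatch(outcode):
            continue
        if not INCODE_PREFIX_RE.fullmatch(rest):
            continue
        if known_outcodes is not None and outcode not in known_outcodes:
            continue
        results.append((outcode, rest))
    return results


print(interpretations("CB22"))    # [('CB2', '2'), ('CB22', '')]
print(interpretations("cb21tn"))  # [('CB2', '1TN')]
```

Filtering against a list of real outcodes will often reduce the candidates to one, but not for inputs like “CB22” where both readings are genuine outcodes; in that case the results for each reading would need to be combined or ranked.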