As part of the background learning and research I’ve been doing for work, I’ve been looking into various forms of spatially referenced data and ways of presenting it over the web. In particular, I’ve been developing a tool in Java that will collect geotagged tweets from Twitter over a period of time, filtered by things such as keywords, hashtags or geographic bounding boxes.
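To give a flavour of the filtering involved, here is a minimal Java sketch of the two simplest checks: a keyword match and a bounding-box test. It is not the tool's actual code; the class name `TweetFilter` and both method names are hypothetical, and it assumes each tweet has already been reduced to its text plus a latitude and longitude.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not taken from the actual tool.
final class TweetFilter {
    private TweetFilter() {}

    /**
     * Returns true if the text contains any of the keywords, ignoring case.
     * This is plain substring matching, so "snow" also matches "snowboard".
     */
    static boolean matchesKeyword(String text, List<String> keywords) {
        String lower = text.toLowerCase(Locale.ROOT);
        for (String keyword : keywords) {
            if (lower.contains(keyword.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns true if the point lies inside the box, edges included.
     * Assumes the box does not cross the antimeridian (west <= east).
     */
    static boolean insideBox(double lat, double lon,
                             double south, double west,
                             double north, double east) {
        return lat >= south && lat <= north && lon >= west && lon <= east;
    }
}
```

A tweet would be kept when, say, `matchesKeyword(text, keywords)` and `insideBox(lat, lon, 49.0, -11.0, 61.0, 2.0)` both hold. Those box coordinates are only a rough guess at a rectangle covering the UK and Ireland, not the values I actually used.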
Earlier in January, I collected a few days’ worth of tweets filtered on keywords related to snow, whilst the UK was experiencing its first snowfall of the year. The idea was to see how effective this might be as a means of gathering data on location-specific events. Over a weekend, around 20,000 snow-related, geotagged tweets were collected and plotted onto a UK map.
Immediately it was obvious that the map was pretty useless as far as representing snowfall in the UK was concerned. The hotspots fell firmly around large cities and other areas with large numbers of Twitter users. To produce a sensible map, the data would need to be normalised against a control sample: a random collection of tweets gathered over a long period that could be used to highlight, and therefore eliminate, the effect of Twitter hotspots around the UK.
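One simple way to do that kind of normalisation, sketched below in Java, is to compare each region's share of the snow tweets with its share of the control tweets. This is just one plausible approach under my own assumptions rather than a definitive method: the class `TweetNormaliser` and the method `relativeRate` are made-up names, and the inputs are assumed to be per-region tweet counts that have already been tallied.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only.
final class TweetNormaliser {
    private TweetNormaliser() {}

    /**
     * For each region in the control sample, divides the region's share of
     * event tweets by its share of control tweets. A value above 1 means the
     * region produced more event tweets than its usual Twitter activity
     * would suggest. Regions with no control tweets are skipped, because
     * there is no baseline to compare against.
     */
    static Map<String, Double> relativeRate(Map<String, Integer> eventCounts,
                                            Map<String, Integer> controlCounts) {
        long eventTotal = 0;
        for (int count : eventCounts.values()) {
            eventTotal += count;
        }
        long controlTotal = 0;
        for (int count : controlCounts.values()) {
            controlTotal += count;
        }

        Map<String, Double> result = new TreeMap<>();
        if (eventTotal == 0 || controlTotal == 0) {
            return result; // nothing to normalise
        }
        for (Map.Entry<String, Integer> entry : controlCounts.entrySet()) {
            int control = entry.getValue();
            if (control <= 0) {
                continue; // no baseline for this region
            }
            int event = eventCounts.getOrDefault(entry.getKey(), 0);
            double eventShare = (double) event / eventTotal;
            double controlShare = (double) control / controlTotal;
            result.put(entry.getKey(), eventShare / controlShare);
        }
        return result;
    }
}
```

With made-up numbers, a rural county contributing half of the snow tweets but only a tenth of the control tweets would score about 5, whereas a big city contributing the other half of the snow tweets and nine tenths of the control tweets would score well below 1. Regions with very few control tweets give noisy ratios, so a minimum sample size would probably be sensible in practice.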
The corrected snow map will follow shortly (edit: here it is), but for now, here’s a map based on my control sample showing Twitter population density for counties across the UK and Ireland (plus a few other regions that accidentally fell inside my bounding box). You can zoom and pan the map as normal and click on each county to get further information.