Neat project by some computer science folk at Cornell: they developed algorithms to mine 35 million photographs on Flickr to determine which cities are the most photographed in the world, and which landmarks, objects and locations feature most prominently in photographs within those cities. Abstract:
We investigate how to organize a large collection of geotagged photos, working with a dataset of about 35 million images collected from Flickr. Our approach combines content analysis based on text tags and image data with structural analysis based on geospatial data. We use the spatial distribution of where people take photos to define a relational structure between the photos that are taken at popular places. We then study the interplay between this structure and the content, using classification methods for predicting such locations from visual, textual and temporal features of the photos. We find that visual and temporal features improve the ability to estimate the location of a photo, compared to using just textual features. We illustrate using these techniques to organize a large photo collection, while also revealing various interesting properties about popular cities and landmarks at a global scale.
I really enjoy projects like this that combine analytical techniques from multiple disciplines, and of course as a database and datamining professional who deals with spatially-referenced data myself (har…) I’m partial to entertaining applications thereof. More on that in a sec; let’s cut to the chase of how the cities and landmarks sorted out:
So the John Hancock Center actually beats Sears Tower (yes, it’s still named Sears Tower, at least until the end of the month or so), which is pretty cool considering JHC is indeed the more visually elegant implementation of the Fazlur Khan structural system. The Art Institute is the only local museum to place. Actually, a whole lot of things jump out: Seattle as the 8th most-photographed city in the world? I suppose to a large extent these results paint a demographic profile of the average Flickr user: that is, a somewhat tech- and design-savvy American. This is a key point to mention in discussing all this, of course: the table has substantial built-in self-selection bias, since it only reflects people who have a Flickr account.
I’m particularly amused by Las Vegas’s defining photo-inspirers: 5 out of the 7 directly reference locations that aren’t Las Vegas, including all of the top 4.
For some quick geek chat: in total, they mined 60 million images, narrowing it down to about 35 million (that’s 2 terabytes of data) that could be accurately geocoded (that is, each photograph could be assigned to a precise point in space). Geolocation is performed via a combination of Flickr photo tags (“textual features”) and SIFT image processing (“visual features”). Density, or the popularity of each landmark and city, is then calculated via mean shift clustering. Care is also taken to ensure that multiple photos by the same photographer don’t bias the results, while those photo sequences still provide insight into the paths photographers take through cities in both time and space.
Cool.
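For the curious, here is roughly what the mean shift step described above looks like in miniature. This is my own toy sketch in Python, not the authors’ code: the function names, the flat (disc) kernel, the grid-cell trick for counting each photographer once per spot, and the sample coordinates are all assumptions made up for illustration, and it treats latitude/longitude as plain planar coordinates, which is only reasonable over short distances.

```python
import math

def dedupe_by_photographer(photos, cell=0.001):
    """Keep at most one photo per (photographer, grid cell).

    photos: iterable of (user, lat, lon). The grid size `cell` (in degrees)
    is an illustrative assumption, not a value taken from the study.
    """
    seen = set()
    points = []
    for user, lat, lon in photos:
        key = (user, round(lat / cell), round(lon / cell))
        if key not in seen:
            seen.add(key)
            points.append((lat, lon))
    return points

def mean_shift(points, bandwidth=0.005, max_iter=100, tol=1e-7):
    """Flat-kernel mean shift over 2D points.

    Each point is repeatedly moved to the mean of all points within
    `bandwidth` of it until it stops moving; points that settle on the
    same spot form one cluster. Returns [lat, lon, count] lists,
    most popular cluster first.
    """
    modes = []
    for x, y in points:
        for _ in range(max_iter):
            near = [(px, py) for px, py in points
                    if math.hypot(px - x, py - y) <= bandwidth]
            if not near:  # defensive guard against rounding edge cases
                break
            nx = sum(p[0] for p in near) / len(near)
            ny = sum(p[1] for p in near) / len(near)
            shift = math.hypot(nx - x, ny - y)
            x, y = nx, ny
            if shift < tol:
                break
        modes.append((x, y))

    # Merge points whose modes landed (almost) on top of each other.
    clusters = []
    for x, y in modes:
        for c in clusters:
            if math.hypot(c[0] - x, c[1] - y) <= bandwidth / 2:
                c[2] += 1
                break
        else:
            clusters.append([x, y, 1])
    return sorted(clusters, key=lambda c: -c[2])

# Made-up sample: one shutter-happy user at spot A, three users at spot B.
sample = [
    ("alice", 41.8989, -87.6229),
    ("alice", 41.8989, -87.6229),
    ("alice", 41.8989, -87.6229),
    ("bob",   41.8990, -87.6230),
    ("carol", 41.8788, -87.6359),
    ("dave",  41.8789, -87.6358),
    ("erin",  41.8787, -87.6360),
]
for lat, lon, count in mean_shift(dedupe_by_photographer(sample)):
    print(f"{lat:.4f}, {lon:.4f}: {count} photographers")
```

Without the per-photographer step, spot A would win on raw photo count (four photos to three); with it, spot B comes out ahead because three distinct people shot there. This brute-force version compares every point against every other point on each iteration, so it is fine for a handful of points and hopeless for 35 million.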
Anyway, I first saw this in the news story here at physorg, whose headline bizarrely emphasizes the study’s potential for making online travel books. Um, OK. For a little more depth, check out the actual study here (PDF).
Citation
Title: Mapping the World’s Photos
Authors: David Crandall, Lars Backstrom, Daniel Huttenlocher and Jon Kleinberg
Department of Computer Science
Cornell University



