Saturday, March 31, 2012

Location-Based Search

An increasing number of applications and services are location based, such as Yelp, Foursquare, and Foodspotting. In addition, mobile searches tend to have a location component, since users often use their mobile phone to search for information based on where they are. All these trends make location-based search more important. Formally speaking, a location-based search query specifies a spatial condition (such as downtown San Francisco) and a few keywords (such as "Ghirardelli"), and wants to retrieve documents or objects that satisfy both conditions (e.g., finding all stores mentioning Ghirardelli in downtown San Francisco.

Technical Challenges

Location-based search requires to deal with two types of data: keywords and geospatial data. For keywords, a classic index structure is inverted index, using which we can easily find documents that contain keywords. For geospatial data, there are many existing index structures, such as R-tree, R*-tree, quad-tree, and Hilbert curve. In order to answer location-based queries, we need integrate these two types of index structures to efficiently find answers that match both conditions.

As mentioned in our earlier posts, instant search and fuzzy search are also very important for location-based search, especially when users issue queries from their mobile devices. However, supporting this type of search is even more challenging, since we need to also consider how to use index structure to handle prefix condition (e.g., a prefix query "Ghira") and fuzzy condition (e.g., "Girardeli"with two typos).

Location-Based Search in Lucene

Although Lucene was initially designed and implemented to support traditional keyword search, it can be extended to support location-based search by using a technique called Geo Hash encoding. Grant Ingersoll wrote an excellent article on how to do it. Many popular search engines are taking such an approach or something similar.

One limitation of this approach is that we cannot easily extend it to enable more powerful capabilities such as instant search or fuzzy search. One metaphor is that a pair of basketball shoes can be used to play soccer, but they cannot give you a good performance, since they are not designed for that purpose!

Bimaple's Location-Based Search

At Bimaple we developed a novel technology that seamlessly integrates several index structures to enable location-based search with the powerful search capabilities. We have published a white paper to compare our solution with Lucene, and shown that ours is 10-100 times faster.

No comments:

Post a Comment