Location-based search requires to deal with two types of data: keywords and geospatial data. For keywords, a classic index structure is inverted index, using which we can easily find documents that contain keywords. For geospatial data, there are many existing index structures, such as R-tree, R*-tree, quad-tree, and Hilbert curve. In order to answer location-based queries, we need integrate these two types of index structures to efficiently find answers that match both conditions.
As mentioned in our earlier posts, instant search and fuzzy search are also very important for location-based search, especially when users issue queries from their mobile devices. However, supporting this type of search is even more challenging, since we need to also consider how to use index structure to handle prefix condition (e.g., a prefix query "Ghira") and fuzzy condition (e.g., "Girardeli"with two typos).
Location-Based Search in Lucene
Although Lucene was initially designed and implemented to support traditional keyword search, it can be extended to support location-based search by using a technique called Geo Hash encoding. Grant Ingersoll wrote an excellent article on how to do it. Many popular search engines are taking such an approach or something similar.
One limitation of this approach is that we cannot easily extend it to enable more powerful capabilities such as instant search or fuzzy search. One metaphor is that a pair of basketball shoes can be used to play soccer, but they cannot give you a good performance, since they are not designed for that purpose!
Bimaple's Location-Based Search
At Bimaple we developed a novel technology that seamlessly integrates several index structures to enable location-based search with the powerful search capabilities. We have published a white paper to compare our solution with Lucene, and shown that ours is 10-100 times faster.