LinkedIn has built a new search architecture called Galene, the result of a year-long effort to scale its search engine and to support its goal of gathering the world’s economic data into the first “economic graph.” Galene makes search twice as fast, and a new “instant results and suggestions” feature predicts what the user is going to type.
With more than 300 million registered users, LinkedIn concluded that the patchwork of technologies it had built to let users search for jobs, groups, and other content had become outdated and needed to be replaced.
The Pre-Galene Architecture:
LinkedIn built its early search engines on Lucene, an open-source library that supports building a search index, searching that index for matching entities, and ranking those entities by relevance score. A Lucene index has two components:
- Inverted Index: A mapping from search terms to the list of entities that contain them
- Forward Index: A mapping from entities to metadata about them
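To make the two index types concrete, here is a minimal Python sketch. It is a toy model, not Lucene's API: the documents, their metadata fields, and the "more connections ranks higher" scoring rule are all hypothetical. The inverted index maps each term to a sorted list of entity ids (a posting list). A query is answered by intersecting posting lists, and per-entity metadata is then looked up in the forward index to rank the matches.

```python
from collections import defaultdict

# Hypothetical toy entities: id -> searchable text plus metadata.
DOCS = {
    1: {"text": "software engineer search infrastructure", "title": "Engineer", "connections": 500},
    2: {"text": "product manager search", "title": "PM", "connections": 120},
    3: {"text": "software engineer mobile", "title": "Engineer", "connections": 80},
}


def build_indices(docs):
    """Build an inverted index (term -> sorted entity ids) and a
    forward index (entity id -> metadata about that entity)."""
    inverted = defaultdict(set)
    forward = {}
    for doc_id, doc in docs.items():
        for term in doc["text"].lower().split():
            inverted[term].add(doc_id)
        # The forward index keeps per-entity metadata (raw text omitted here).
        forward[doc_id] = {k: v for k, v in doc.items() if k != "text"}
    # Posting lists are stored sorted, as search engines typically do.
    return {term: sorted(ids) for term, ids in inverted.items()}, forward


def search(query, inverted, forward):
    """Return ids of entities containing every query term, ranked by a toy score."""
    terms = query.lower().split()
    if not terms:
        return []
    # Boolean AND: intersect the posting lists of all query terms.
    matches = set(inverted.get(terms[0], []))
    for term in terms[1:]:
        matches &= set(inverted.get(term, []))
    # Hypothetical relevance score read from the forward index.
    return sorted(matches, key=lambda d: forward[d]["connections"], reverse=True)
```

For example, after `inverted, forward = build_indices(DOCS)`, the call `search("software engineer", inverted, forward)` matches entities 1 and 3 through the inverted index, then orders them using metadata from the forward index.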
The pre-Galene architecture also included several other components, such as Bobo, Cleo, Krati, and Norbert, to address Lucene’s shortcomings. These components have also been open-sourced.
Shortcomings of the Pre-Galene Architecture:
- It was difficult to rebuild a complete index
- Live updates were inefficient
- Scoring in the pre-Galene stack was inflexible
- The previous system lacked support for many search requirements, such as offline relevance, query rewriting, reranking, blending, and experimentation
- It was difficult to manage many small, open-source components
The New Galene Architecture:
In the new Galene architecture, Lucene is retained as the indexing layer: it is used to build indices and queries and to retrieve matching entities from the index. The rest of Lucene’s functionality has been set aside, and Sensei, the Search Content Store, Zoie, Bobo, Cleo, Krati, and Norbert have all been dropped.
The diagram below shows the Galene search stack.
Galene in Action:
Here are the changes users can expect:
- Better Access: Previously, Instant Member Search was limited to first- and second-degree connections because of shortcomings in the older architecture. Galene extends typeahead search to all members.
- Better Relevance: Instant Member Search has an improved relevance algorithm. This includes offline static rank computation, personalization through factors such as connection degree, and approximate name matching.
- Faster and more efficient: The new system is twice as fast as the previous system and utilizes about a third of the hardware.
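The relevance factors listed above (an offline static rank, personalization by connection degree, and approximate name matching) can be combined in many ways. The Python sketch below shows one simple combination for a typeahead query. It is illustrative only and is not Galene's ranking algorithm. The member data, static-rank values, degree boosts, the 0.75 match threshold, and the use of `difflib` similarity for fuzzy matching are all hypothetical choices.

```python
import difflib
from dataclasses import dataclass


@dataclass
class Member:
    member_id: int
    name: str
    static_rank: float  # hypothetical score precomputed offline, in [0, 1]


# Hypothetical members and the searcher's connection degrees to them
# (a missing id means third-degree or beyond).
MEMBERS = [
    Member(1, "Jonathan Smith", 0.9),
    Member(2, "Jon Smyth", 0.4),
    Member(3, "Joanna Stone", 0.7),
    Member(4, "Maria Lopez", 0.8),
]
DEGREE = {2: 1, 3: 2}
DEGREE_BOOST = {1: 1.0, 2: 0.5}  # hypothetical personalization boosts


def name_match(prefix, name):
    """1.0 if any name token starts with the typed prefix; otherwise the best
    fuzzy similarity between the prefix and a same-length token prefix."""
    prefix = prefix.lower()
    best = 0.0
    for token in name.lower().split():
        if token.startswith(prefix):
            return 1.0
        ratio = difflib.SequenceMatcher(None, prefix, token[:len(prefix)]).ratio()
        best = max(best, ratio)
    return best


def typeahead(prefix, members, degree, k=3, min_match=0.75):
    """Return up to k member names, ranked by match + static rank + degree boost."""
    scored = []
    for m in members:
        match = name_match(prefix, m.name)
        if match < min_match:
            continue  # too dissimilar to suggest
        boost = DEGREE_BOOST.get(degree.get(m.member_id), 0.0)
        scored.append((match + m.static_rank + boost, m.name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]
```

With these assumed weights, typing "jo" ranks the first-degree connection "Jon Smyth" above "Jonathan Smith", even though the latter has a higher static rank. A misspelled prefix such as "jonatan" still surfaces "Jonathan Smith" through the fuzzy fallback. Keeping the static rank as a precomputed value means the per-keystroke work reduces to a cheap sum.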