Say Goodbye to Irrelevant Search Results: Cohere Rerank Is Here
Searching for information using traditional keyword-based search systems can be frustrating. You type in a phrase and get back a list of results that has little to do with what you are looking for. It's like trying to find a needle in a haystack.
In contrast, a semantic-based search system can contextualize the meaning of a user's query beyond keyword relevance, allowing it to return more relevant and accurate results.
But a complete migration to semantic-based search using embeddings is challenging for many companies. Their keyword-based search system has been in place for a long time, and it is often an important part of the company’s information architecture. Migrating to a vector database that supports embedding-based search is, in many cases, just not feasible.
The Cohere Rerank endpoint is designed to bridge this gap. What's more, Rerank delivers much higher quality results than embedding-based search, and it requires only a single-line code change in your application.
Introducing the Cohere Rerank Endpoint
We are excited to announce the availability of our Rerank endpoint, which acts as the last stage of a search flow, ranking the retrieved documents by their relevance to a user's query. This means that companies can keep an existing keyword-based (also called "lexical") or semantic search system for first-stage retrieval and integrate the Rerank endpoint for second-stage re-ranking.
This endpoint is powered by our large language model, which computes a relevance score between the query and each of the initial search results. Compared to embedding-based semantic search, it yields better search results, especially for complex and domain-specific queries.
When used with a keyword-based search engine such as Elasticsearch, OpenSearch, or Solr, the Rerank endpoint can be added to the end of an existing search workflow, allowing users to incorporate semantic relevance into their keyword-based search system without changing the existing infrastructure. This is an easy, low-complexity way to improve search results by introducing semantic search technology into a user's stack with a single line of code.
Boosting Search Quality for 100+ Languages with a Single Line of Code
Adding Rerank to your search stack is easy. Once you retrieve the initial results from your existing search engine, pass the initial query and list of results into the endpoint like so:
results = co.rerank(query=query, documents=documents, top_n=3, model="rerank-multilingual-v2.0")
Here is what the arguments represent:
- query: the user query text
- documents: the list of candidate results you want to rerank
- top_n: the number of reranked results to return
Here is a quick example. The following are six passages taken from Simple English Wikipedia. Given the query "What is the capital of the United States?", we want to retrieve the passage that is most relevant to it.
import cohere
# Get your cohere API key on: www.cohere.com
co = cohere.Client("{apiKey}")
# Example query and passages
query = "What is the capital of the United States?"
documents = [
"Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274.",
"The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.",
"Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.",
"Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. ",
"Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states.",
"North Dakota is a state in the United States. 672,591 people lived in North Dakota in the year 2010. The capital and seat of government is Bismarck."
]
results = co.rerank(query=query, documents=documents, top_n=3, model="rerank-multilingual-v2.0")
The Rerank endpoint computes a relevance score for the query and each document, and returns a sorted list from the most to the least relevant document.
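Continuing the example above, here is a minimal sketch of how the response can be inspected. It assumes each result exposes a relevance_score field alongside the index field used later in this post:
# Minimal sketch: print each reranked passage with its relevance score.
# Assumes each result exposes .index and .relevance_score.
for hit in results:
    print(f"{hit.relevance_score:.2f}\t{documents[hit.index]}")
For our example query, the Washington, D.C. passage should come out on top.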
The model works for 100+ languages and enables great search quality across languages.
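As a quick illustration of the cross-lingual behavior, here is a hedged sketch in which the query is phrased in German while the candidate passages above stay in English; everything else is reused from the previous example:
# Hypothetical cross-lingual example: a German query over the English
# passages defined above. No translation step is needed.
german_query = "Was ist die Hauptstadt der Vereinigten Staaten?"
cross_lingual_results = co.rerank(query=german_query, documents=documents,
                                  top_n=3, model="rerank-multilingual-v2.0")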
Combining Rerank with First Stage Retrieval
Computing the relevance score for a query and potentially millions of documents would be prohibitively slow. Hence, in most cases, you want to combine this with a first stage retrieval system that does pre-filtering to give you the top documents (e.g. 100 documents) to work with. Here you can either use lexical search (e.g., with Elasticsearch, OpenSearch, Solr, etc.) or embedding-based semantic search.
From the first stage retrieval system, you then pass the top 100 results to Rerank to return a final sorted list.
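Putting the two stages together, the overall pattern looks roughly like the sketch below. Here, first_stage_search is a hypothetical placeholder for whatever retrieval system you already have, lexical or embedding-based:
# Sketch of the two-stage pattern. first_stage_search is a hypothetical
# placeholder that returns a list of candidate document texts for the query.
def two_stage_search(query, top_n=3, candidates=100):
    docs = first_stage_search(query, limit=candidates)  # stage 1: fast pre-filtering
    hits = co.rerank(query=query, documents=docs,
                     top_n=top_n, model="rerank-multilingual-v2.0")
    return [docs[hit.index] for hit in hits]  # stage 2: semantic re-ranking
The next section walks through a concrete version of this pattern with Elasticsearch as the first stage.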
End-to-End Example: Improving Lexical Search with Rerank
The following is an end-to-end example of how we can improve lexical search by adding Rerank to the workflow. Our example will search Simple English Wikipedia, which consists of about 500,000 paragraphs. Passing all of these documents to Rerank for each query would be too slow, hence we use lexical search with Elasticsearch as our first-stage retrieval system to find the top 100 results for a given query.
If you already have a search system in place, you can skip the setup of Elasticsearch and directly apply co.rerank() to initial results from your search system.
First, we install the necessary requirements:
pip install cohere datasets elasticsearch==8.6.2
And start a local Elasticsearch container using Docker (security is disabled here so the indexing script below can connect over plain HTTP):
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -e "xpack.security.enabled=false" elasticsearch:8.6.2
The following script indexes Simple English Wikipedia and compares lexical search with the results after reranking:
from elasticsearch import Elasticsearch, helpers
import cohere
from datasets import load_dataset
# Get your cohere API key on: www.cohere.com
co = cohere.Client("{apiKey}")
# Connect to elastic
es = Elasticsearch("http://localhost:9200")
# If the ES index does not exist yet, load the Simple English Wikipedia dataset and index it
index = "wikipedia"
if not es.indices.exists(index=index):
    print("Load dataset")
    data = load_dataset("Cohere/wikipedia-22-12", "simple", split='train', streaming=True)
    # Convert each row into a bulk-indexing action for Elasticsearch
    all_docs = map(lambda row: {"_index": index, "_id": row['id'], "_source": {"text": row['text']}}, data)
    print("Start indexing docs. This might take a few minutes.")
    helpers.bulk(es, all_docs)
# Traditional lexical search with ES
query = "Cats lifespan"
# Retrieve top-100 documents from ES lexical search
resp = es.search(index=index, size=100, query={'query_string': {'query': query}})
docs = [hit['_source']['text'] for hit in resp['hits']['hits']]
print("Elasticsearch Lexical Search results:")
for doc in docs[0:3]:
    print(doc)
    print("-----")
# Re-Rank them with cohere
rerank_hits = co.rerank(query=query, documents=docs, top_n=3, model='rerank-multilingual-v2.0')
print("\n===========")
print("ReRank results:")
for hit in rerank_hits:
    print(docs[hit.index])
    print("-----")
For the given query, “Cats lifespan,” lexical search finds the following top three hits:
- The average lifespan of the eclectus parrots in captivity is not known, because these birds are not kept in captivity in big numbers until the 1980s. Some sources say that the lifespan is 30 years. The longest lifespan officially recorded is 28.5 years, but a life of 40.8 years has also been reported.
- Hypergiants are very hard to find and they have a short lifespan because of their size. While the Sun has a lifespan of around 10 billion years, hypergiants will only exist for a few million years.
- The average rated life of a CFL is 8 to 15 times longer that of incandescents. CFLs typically have a rated lifespan of 6,000 to 15,000 hours, whereas incandescent lamps are usually manufactured to have a lifespan of 750 hours or 1,000 hours.
The first result talks about the lifespan of parrots, the second about hypergiant stars, and the third about compact fluorescent light bulbs (CFLs), but sadly none of them talks about the lifespan of cats.
The typical challenge with lexical search is that the relevant results are not displayed at the top of the search results list, but somewhere further down among the top 100 results. And since users are not willing to skim through dozens of results, this is where Rerank helps. Adding the following line of code dramatically improved the search quality for our query:
rerank_hits = co.rerank(query=query, documents=docs,
model='rerank-multilingual-v2.0', top_n=3)
The top three results now all provide relevant information about the lifespan of cats.
- Reliable information on the lifespans of house cats is hard to find. However, research has been done to get an estimate (an educated guess) on how long cats usually live. Cats usually live for 13 to 20 years. Sometimes cats can live for 22 to 30 years but there are claims of cats dying at ages of more than 30 years old.
- The "Guinness World Record" for the oldest cat was for a cat named Creme Puff, who was 38 years old. Female cats seem to live longer than male cats. Neutered cats live longer than cats that have not been neutered. Mixed breed cats also appear to live longer than purebred cats. Researchers have also found that cats that weigh more have shorter lifespans.
- A Munchkin cat are loving and friendly. These cats want to be around humans. They love hugs and love to be pet. Munchkin cats get along with other cats Munchkins get along with dogs. These cats make great indoor cats and can hunt mice. Munchkin cats live 12 to 14 years and come in all types of colors and patterns. Munchkin cat eyes come in any color.
Evaluation — Strong Improvements on Search Quality
How well does Rerank work, and how does it compare to embedding-based semantic search, which often requires migration to a vector database?
To answer this, we performed an evaluation on three diverse datasets:
- MIRACL is a recent search evaluation dataset created by the University of Waterloo and released in October 2022. It contains 700k relevance judgments for 77k queries in 18 languages. Not all of these languages are supported by Elasticsearch, hence we limited our evaluation to 15 languages: Arabic, Bengali, Chinese, English, Finnish, French, German, Hindi, Indonesian, Japanese, Korean, Persian, Russian, Spanish, and Thai.
- TREC Deep Learning is an annual contest organized by NIST with web search queries on a heterogeneous web corpus. We combined the datasets from 2019 and 2020, which together evaluate 97 queries on a collection of 8.8M documents.
- Natural Questions was created by Google and contains actual user questions from Google Search, annotated with relevant paragraphs from Wikipedia. The test set contains 3,452 questions over a collection of 2.7M documents.
As an evaluation metric, we used Accuracy@3: the number of search queries that have a (highly) relevant search result among the top three search results. We found that this metric correlates well with the perceived search quality from users while still being easy to interpret.
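To make the metric concrete, here is a small illustrative sketch of how Accuracy@3 can be computed. The inputs are hypothetical: for each query, a set of document IDs judged (highly) relevant and the ranked list of IDs returned by the system:
# Illustrative sketch of Accuracy@3: the fraction of queries with at least
# one (highly) relevant document among the top three results.
def accuracy_at_3(queries):
    hits = 0
    for relevant_ids, ranked_ids in queries:  # hypothetical per-query data
        if any(doc_id in relevant_ids for doc_id in ranked_ids[:3]):
            hits += 1
    return hits / len(queries)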
As the results show, lexical search can find relevant search results, on average, for about 44% of the search queries. Hence, for the majority of search queries, the user doesn’t get any relevant information in the top three results. Embedding-based semantic search can boost this to 65%, while Rerank achieves the best performance: for about 72% of search queries, we were able to find and show the most relevant hit among the first three results.
Note: Rerank can also be used with embedding-based semantic search, which will yield even better results.
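A rough sketch of that combination is shown below. It uses Cohere's embed endpoint with a brute-force in-memory nearest-neighbor search as the first stage; the embedding model name and the helper logic are illustrative assumptions, and a production setup would typically use a vector database instead:
# Illustrative sketch: embedding-based first-stage retrieval followed by Rerank.
import numpy as np
# documents: a list of corpus texts, assumed small enough to embed in one call
doc_embs = np.array(co.embed(texts=documents, model="embed-multilingual-v2.0").embeddings)
def semantic_then_rerank(query, top_n=3, candidates=100):
    q_emb = np.array(co.embed(texts=[query], model="embed-multilingual-v2.0").embeddings[0])
    # Stage 1: top candidates by dot-product similarity (brute force, for illustration)
    top_idx = np.argsort(-(doc_embs @ q_emb))[:candidates]
    candidate_docs = [documents[i] for i in top_idx]
    # Stage 2: Rerank the candidates for the final ordering
    hits = co.rerank(query=query, documents=candidate_docs,
                     top_n=top_n, model="rerank-multilingual-v2.0")
    return [candidate_docs[hit.index] for hit in hits]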
Start Building with Rerank
To recap, here are the key benefits of Rerank:
- Practical Approach: Augment existing search systems instead of replacing them.
- Performance: Achieve state-of-the-art search performance and improve existing embedding-based search systems, especially for complex, domain-specific queries.
- Simplicity: Add a single line of code to implement the capability, with all the complexities abstracted away.
- Federated Search: Merge and sort results from different search systems into a single ranking (see the sketch below).
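For federated search, a rough sketch might look like this, assuming lexical_results and semantic_results are lists of result texts already retrieved from two different systems:
# Hypothetical federated search sketch: merge results from two existing
# systems, drop duplicates, and let Rerank produce one unified ranking.
merged = list(dict.fromkeys(lexical_results + semantic_results))  # dedupe, keep order
federated = co.rerank(query=query, documents=merged,
                      top_n=10, model="rerank-multilingual-v2.0")
unified_ranking = [merged[hit.index] for hit in federated]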
If you are a developer looking to integrate Rerank into your application or workflow, check out our API documentation, which provides detailed instructions and code samples, or visit our Rerank product page. If you are looking for inspiration on applications built on top of this endpoint, take a look at this Wikipedia Search demo project.
It is important to note that Rerank is not a replacement for a search engine, but rather a supplementary tool for sorting search results in the most effective way possible for the user.
That said, if you are considering Rerank as a single-stage search or ranking engine, that is still possible when the number of documents is small. The endpoint supports reranking of up to 1,000 documents, so it can be used directly on knowledge bases of this size.
Give Us Feedback on Rerank
Our Rerank endpoint is still in beta and we welcome feedback from our users to enhance and refine its capabilities. Whether it's a suggestion for a new feature, or an issue you've encountered, we value your input and want to make sure we're meeting your needs.
So, don't hesitate to share your feedback with us through our Discord community or by clicking on the chat bubble icon located in the bottom left-hand corner of the Cohere Playground when you log in. Alternatively, you can reach us at support@cohere.com.