Enterprise Search and Retrieval Demystified: A Guide for RAG Users

Cohere customers' most pressing questions about search and retrieval for RAG use cases answered.


Over the past year, to help drive better search and retrieval results for our enterprise customers, we launched state-of-the-art Embed and Rerank models, along with modeling and API improvements. Our newest model, Command R, integrates with external APIs and tools to turn complex tasks like retrieval-augmented generation (RAG) into a seamless experience, simplifying deployment and enhancing accuracy like never before.

Below we answer some of the top questions from our customers who are rolling out RAG solutions and aiming to optimize their search and retrieval systems.

Why is search and retrieval so important for RAG solutions?

No matter how good your generative model is, its output is only as good as the information your search system retrieves.

RAG improves the results of generative models by connecting them to sources of information, such as proprietary datastores, and augmenting the original prompt with contextually relevant information. RAG is composed of two parts: the retrieval mechanism and the augmented-generation mechanism. For best results, you need a strong search and retrieval engine. You want to surface the most contextually relevant and impactful information from potentially competing sources of truth with varying levels of quality, and you want to do it in an efficient, fast, and responsible way. This requires careful consideration and implementation, and a solution that can scale with large volumes of data and growing usage.
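To make the two parts concrete, here is a minimal sketch of a RAG loop in Python. The `search` and `generate` functions are hypothetical stand-ins for whatever retrieval system and generative model you use; they are not specific Cohere API calls.

```python
# Minimal RAG sketch: retrieve context, augment the prompt, then generate.
# `search` and `generate` are stubbed placeholders for your own retrieval
# system and generative model.

def search(query: str, top_k: int = 5) -> list[str]:
    """First-stage retrieval: return the top_k most relevant passages.
    Stubbed here; replace with keyword, embeddings, or hybrid search."""
    corpus = ["<passage about the query topic>", "<another relevant passage>"]
    return corpus[:top_k]

def generate(prompt: str) -> str:
    """Call your generative model with the augmented prompt (stubbed)."""
    return "<model answer grounded in the retrieved context>"

def rag_answer(query: str) -> str:
    # 1. Retrieval: find contextually relevant passages.
    passages = search(query, top_k=5)

    # 2. Augmented generation: add the retrieved context to the prompt.
    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return generate(prompt)

print(rag_answer("What is our refund policy?"))
```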

What strategies should I be thinking about to help improve search results?

We recommend adding a reranker to any search system for better results. 

A reranker model significantly boosts search effectiveness for businesses by refining initial search results to prioritize relevance and accuracy. It dynamically assesses and rearranges search outputs based on a deeper understanding of query intent and content context. This not only makes search experiences better for users by delivering precisely what they're looking for, but it also cuts down on time spent sifting through irrelevant results. A reranker can be a game-changer in providing superior search experiences and driving better engagement.
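As a concrete sketch, here is how reranking a handful of candidate passages might look with Cohere's Python SDK. The model name and response fields shown are assumptions based on current SDK conventions; check the documentation for your SDK version.

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # standard Cohere Python client

query = "What is our parental leave policy?"
documents = [
    "Employees are eligible for 16 weeks of paid parental leave.",
    "The cafeteria is open from 8am to 6pm on weekdays.",
    "Parental leave requests should be submitted through the HR portal.",
]

# Rerank the candidates against the query and keep the top 2.
# The model name "rerank-english-v3.0" is an assumption; use whichever
# Rerank model is available in your account.
response = co.rerank(
    model="rerank-english-v3.0",
    query=query,
    documents=documents,
    top_n=2,
)

for result in response.results:
    print(result.index, result.relevance_score)
```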

When should I use embeddings search versus just a reranker?

For optimal RAG performance, we recommend combining embeddings search with a reranker.

We’ve already explained why adding a reranker to any search system significantly improves initial results. But a reranker alone has limitations: it can only process a few hundred documents at once. You will always want some kind of first-stage retrieval that takes you from billions of documents down to hundreds of candidates. A reranker can then narrow those hundreds down to a few high-quality answers.

When building AI solutions with RAG, particularly when you are working with noisy and multilingual data sources, we recommend using semantic search with embeddings as that first-stage retrieval. Embeddings search provides another layer of context and quality to your RAG pipeline that ultimately improves the overall performance of your AI application. Embeddings also offer an easier way to implement multilingual and multi-hop search solutions at scale. Combining embeddings search with a reranker provides the best available search quality for RAG solutions.
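As an illustration of this two-stage setup, here is a minimal sketch using Cohere's Python SDK: embeddings for first-stage retrieval, then Rerank over the shortlisted candidates. The model names, input_type values, and response fields are assumptions based on current SDK conventions, and at real scale the first stage would run against a vector database or ANN index rather than brute-force similarity in NumPy.

```python
import cohere
import numpy as np

co = cohere.Client("YOUR_API_KEY")

# In practice this corpus lives in a vector database; a short list keeps
# the sketch self-contained.
documents = [
    "Employees are eligible for 16 weeks of paid parental leave.",
    "The cafeteria is open from 8am to 6pm on weekdays.",
    "Parental leave requests should be submitted through the HR portal.",
]

# First stage: embed the corpus once (model name is an assumption).
doc_embs = np.array(
    co.embed(texts=documents, model="embed-english-v3.0",
             input_type="search_document").embeddings
)

def first_stage(query: str, top_k: int = 100) -> list[int]:
    """Embeddings search: shortlist candidate documents by cosine similarity."""
    q_emb = np.array(
        co.embed(texts=[query], model="embed-english-v3.0",
                 input_type="search_query").embeddings[0]
    )
    sims = doc_embs @ q_emb / (
        np.linalg.norm(doc_embs, axis=1) * np.linalg.norm(q_emb)
    )
    return list(np.argsort(-sims)[:top_k])

def second_stage(query: str, candidate_ids: list[int], top_n: int = 3) -> list[str]:
    """Rerank only the shortlisted candidates to get the final few answers."""
    candidates = [documents[i] for i in candidate_ids]
    reranked = co.rerank(model="rerank-english-v3.0", query=query,
                         documents=candidates, top_n=top_n)
    return [candidates[r.index] for r in reranked.results]

query = "How much parental leave do we get?"
print(second_stage(query, first_stage(query)))
```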

When should I consider fine-tuning a reranker model?

Fine-tune a reranker if your use case involves intricate terminology, specialized context, and domain-specific knowledge requirements.

Fine-tuning a language model should be considered when you want to tailor results to the domain-specific language of your particular industry or use case. The same is true of reranker models: in some instances, a fine-tuned Rerank model can double the accuracy of a generic reranker. Industries like legal services, healthcare, biomedical, technology, and financial services demand a deep understanding of specific jargon and intricate concepts. These domains often necessitate fine-tuning or pre-training on custom data to ensure that the models capture the nuances and expertise essential for accurate comprehension. Cohere’s fine-tuning capability for Rerank is one of the solutions our customers love most.
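To give a feel for what reranker fine-tuning data looks like, here is an illustrative example of training pairs for a legal-services domain, written out as JSONL: each query is paired with passages labeled relevant and with hard negatives. The field names here are assumptions for illustration only; consult Cohere's fine-tuning documentation for the exact schema your setup expects.

```python
import json

# Illustrative reranker fine-tuning examples for a legal-services domain.
# Field names ("query", "relevant_passages", "hard_negatives") are assumed,
# not the guaranteed Cohere schema.
training_examples = [
    {
        "query": "Is a verbal agreement enforceable for a commercial lease?",
        "relevant_passages": [
            "Under the statute of frauds, leases longer than one year must "
            "be in writing to be enforceable."
        ],
        "hard_negatives": [
            "A commercial lease typically includes clauses covering rent "
            "escalation and maintenance responsibilities."
        ],
    },
]

# Write the examples to a JSONL file, one example per line.
with open("rerank_finetune.jsonl", "w") as f:
    for example in training_examples:
        f.write(json.dumps(example) + "\n")
```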

How should I evaluate the quality of my search and retrieval system?

The most important metric to evaluate is how relevant the search results are to the query.

You can start with a simple, informal, human “smell test” of the search results, but it’s best to get numeric results. There are many established metrics for measuring search systems, such as accuracy@n, recall@n, and nDCG. These metrics use a labeled dataset prepared in advance: you give the system a query and check whether the results it returns are relevant. The easiest one is accuracy@n. Say we choose n to be 10. If we give a query to the system, how many of the top 10 results it returns are relevant? If 7 out of 10, that’s 70% accuracy for this query. Repeat this over a large number of queries, and the average measures the system’s search quality. Just remember that the vital ingredient of search evaluation is having queries with labeled results. For more details on how to evaluate embeddings search, check out the book Hands-On Large Language Models by Cohere’s Jay Alammar (Director, Engineering Fellow, NLP).
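To make the accuracy@n calculation concrete, here is a small, self-contained Python sketch; the document IDs and relevance labels are made up for illustration.

```python
def accuracy_at_n(results: list[str], relevant: set[str], n: int = 10) -> float:
    """Fraction of the top-n results that are labeled relevant for one query."""
    top_n = results[:n]
    return sum(doc in relevant for doc in top_n) / n

def mean_accuracy_at_n(runs: list[tuple[list[str], set[str]]], n: int = 10) -> float:
    """Average accuracy@n over many labeled queries scores the whole system."""
    return sum(accuracy_at_n(results, relevant, n) for results, relevant in runs) / len(runs)

# Example from the text: 7 of the top 10 results are relevant -> 70% accuracy.
results = [f"doc{i}" for i in range(10)]
relevant = {"doc0", "doc1", "doc2", "doc3", "doc4", "doc5", "doc6"}
print(accuracy_at_n(results, relevant, n=10))  # 0.7
```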


More questions? Get in touch.