Recently, we added fine-tuning to our Rerank endpoint to improve search systems for complex domains. Complex domains challenge AI models with intricate terminology, dense context, and domain-specific knowledge requirements. They include legal documents, medical research papers, scientific literature, technical manuals, developer documentation, code, financial reports, and other fields that demand a deep understanding of specialized jargon and intricate concepts. Models serving these domains often require fine-tuning or pre-training on custom data to capture the nuances and expertise essential for accurate comprehension.
Rerank is a semantic relevance scoring and ranking system that improves search results by evaluating the contextual relationship between queries and passages. The new fine-tuning capability makes these results even more relevant across complex domains. Fine-tuning for Rerank has been our top developer request!
Comparing Generative and Rerank Models for Search Relevance
Generative models such as OpenAI’s GPT-4 can outperform traditional embedding models in high-precision relevance ranking (i.e., selecting the most relevant document from a set of relevant documents) because users can add context through additional information in prompts and preambles. However, this performance comes at the expense of cost, speed, and determinism. Running a large model like GPT-4 requires significant computational resources, leading to higher infrastructure and operational expenses than smaller models. GPT-4 must also process every document in full, resulting in slower response times. Moreover, the non-deterministic nature of GPT-4 adds a layer of unpredictability to its responses, posing challenges in scenarios where predictability is crucial.
When the domain is well-defined, rerankers are a better choice. Reranking is faster and more cost-effective than large generative models like GPT-4. To give an idea of the cost, let’s assume an example query (50 tokens) searches across 100 documents (450 tokens each), requiring 45,050 tokens per search. At the current GPT-4 Turbo pricing of $0.01 per thousand input tokens, each search would cost $0.4505. Comparatively, using Cohere’s Rerank model for the same example would cost $0.001 per search.
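The arithmetic behind this comparison can be checked directly. The prices below are the ones quoted above and may change:

```python
# Back-of-the-envelope cost estimate for the example above:
# one 50-token query searched across 100 documents of 450 tokens each.
query_tokens = 50
num_docs = 100
tokens_per_doc = 450

total_tokens = query_tokens + num_docs * tokens_per_doc
print(total_tokens)  # 45050

# GPT-4 Turbo input pricing quoted in the text: $0.01 per 1,000 tokens.
gpt4_turbo_price_per_1k = 0.01
gpt4_cost_per_search = total_tokens / 1000 * gpt4_turbo_price_per_1k
print(round(gpt4_cost_per_search, 4))  # 0.4505
```

At roughly $0.45 versus $0.001 per search, the reranker is about 450x cheaper for this workload.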
Rerank models also stand out due to their deterministic nature, balancing efficiency and predictability. Fine-tuning rerank models allows for domain adaptation and boosts performance beyond using generative models for ranking.
Performance Gains for Complex Domains
To understand the importance of domain-specific training, we analyzed the legal domain. The language used in legal documents such as court rulings, contracts, and statutes significantly differs from everyday dialogue.
We evaluated the performance of Rerank with fine-tuning in the legal domain using the CaseHOLD benchmark. CaseHOLD is a multiple-choice Q&A task in which a legal decision cites another decision as precedent, and the model must identify the correct holding statement of the cited case. It's a challenging task that demands specialized legal expertise to solve.
In this experiment, we compared the performance difference between the Rerank model with and without fine-tuning. We fine-tuned Rerank only on the CaseHOLD training data; no other legal corpus was used. We also varied the number of fine-tuning examples used to see if there was a performance difference.
The results showed that even with limited domain-specific data, Cohere's fine-tuned Rerank model essentially doubled the accuracy of the base Rerank model in this domain. Fine-tuned Rerank thus offers an efficient alternative, matching the performance of large generative models without the need for extensive computational resources.
Developers can fine-tune Rerank models programmatically through an SDK or Cohere’s new fine-tuning interface. There is no cost to train fine-tuned Rerank models. Learn more about fine-tuning for Rerank in our developer documentation.
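Once a fine-tuned model has been trained, querying it looks the same as querying the base Rerank model. Here is a minimal sketch using the Cohere Python SDK; the model ID, query, and documents are hypothetical placeholders, and you would substitute the ID returned when your fine-tuning job completes:

```python
# Minimal sketch: scoring documents with a fine-tuned Rerank model.
# Assumes the `cohere` Python SDK is installed and COHERE_API_KEY is set.
try:
    import cohere
except ImportError:
    cohere = None  # SDK not installed; the function below will raise.

def rerank_with_finetuned_model(api_key, query, documents,
                                model_id="your-finetuned-model-id"):
    """Score `documents` against `query` using a fine-tuned Rerank model.

    `model_id` is a hypothetical placeholder for the ID of your
    fine-tuned model. Returns (document_index, relevance_score) pairs,
    sorted by the API from most to least relevant.
    """
    if cohere is None:
        raise RuntimeError("Install the SDK first: pip install cohere")
    co = cohere.Client(api_key)
    response = co.rerank(
        model=model_id,
        query=query,
        documents=documents,
        top_n=len(documents),
    )
    return [(r.index, r.relevance_score) for r in response.results]
```

In a legal-search application like the CaseHOLD setting above, `documents` would be candidate holding statements and `query` the citing context; the top-ranked index is the model's answer.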