We are excited to announce that our Chat API with RAG is now available in a public beta. With this new capability, developers can integrate user inputs, data sources, and model generations to build powerful product experiences and mitigate hallucinations by producing grounded and verifiable generations. The API is powered by Command, Cohere’s flagship generative large language model.
To bring this to life, we are also opening access to the Coral Showcase, a demo environment that previews how these capabilities can come together in a product experience.
As developers, we understand the significance of delivering smooth and natural interactions between our technology and users. With the Chat API, you now have the power to integrate Command's chat capability directly into your applications, enabling a seamless and dynamic conversational experience. Whether you're building a knowledge assistant or customer support system, the Chat API makes creating reliable conversational AI products simpler. As with other Cohere endpoints, developers have substantial control over the output, such as choosing the model, adjusting the temperature, and utilizing the chat history.
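The controls mentioned above (model choice, temperature, chat history) translate directly into arguments on a chat call. Here's a minimal sketch using Cohere's Python SDK; the exact parameter names and history format are assumptions based on the beta documentation and may differ by SDK version:

```python
# Sketch of a Chat API request carrying conversational context.
# Field names (model, temperature, chat_history) are assumed from
# the public beta docs and may vary across SDK versions.

def build_chat_request(message, history, model="command", temperature=0.3):
    """Assemble keyword arguments for a chat call, including prior
    turns so the model can respond with conversational context."""
    return {
        "message": message,
        "model": model,              # which Command model to use
        "temperature": temperature,  # lower = more deterministic output
        "chat_history": history,     # prior turns, oldest first
    }

history = [
    {"role": "USER", "message": "What does RAG stand for?"},
    {"role": "CHATBOT", "message": "Retrieval-augmented generation."},
]
request = build_chat_request("How does it reduce hallucinations?", history)

# With an API key set, the call itself would look roughly like:
#   import cohere
#   co = cohere.Client(os.environ["COHERE_API_KEY"])
#   response = co.chat(**request)
#   print(response.text)
```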
Cohere's Generate, Summarize, and now Chat APIs provide developers with multiple options to create generative AI products and features. The Generate API is the choice for content creation; the Summarize API is for condensing and pulling insights out of longer blocks of text; and, now, the Chat API is the choice for developing more complex conversational capabilities with RAG built-in.
Using RAG and Citations to Build User Trust
Generative models are at the forefront of revolutionizing the capabilities of AI. However, these models have their limitations. Trust is a primary concern, as AI-generated content can sometimes be unreliable or biased. Issues like hallucinations, where the AI may generate plausible-sounding but incorrect information, pose challenges for real-world and business use. Additionally, access to specific and current data can be limited, impacting the model's ability to provide up-to-date information.
Retrieval-augmented generation, or RAG, can help generative AI models build trust with users. RAG systems improve the relevance and accuracy of generative AI responses by incorporating information from data sources that were not part of pre-trained models.
The Chat API is RAG-enabled, meaning developers can inform model generations with information from external data sources. This represents a leap forward for generative AI accuracy, verifiability, and timeliness. For our public beta, developers can connect Command to:
- Web search
- Plain text documents
Model responses can be generated from information retrieved from the web or your documents, making those responses more relevant, accurate, up-to-date, and verifiable. For example, a developer building a market research assistant can equip their chatbot with web search to access the latest news about trends and competitors in their space. By enabling RAG directly in the Chat API, it's easier than ever before to build products with this capability.
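A web-grounded request along the lines of the market research example might look like the sketch below. The connector id `"web-search"` and the response fields mentioned in the comments are assumptions based on the beta documentation:

```python
# Sketch: ground a chat response in fresh web results rather than
# only the model's training data. The "web-search" connector id is
# assumed from the public beta docs.

def market_research_request(question):
    """Build chat arguments that ask the model to retrieve from the
    web before generating its answer."""
    return {
        "message": question,
        "connectors": [{"id": "web-search"}],  # retrieve, then generate
    }

req = market_research_request("What are the latest trends in vector databases?")

# With a client configured, the call would look roughly like:
#   response = co.chat(**req)
# where response.documents would list the retrieved pages and
# response.text the grounded answer.
```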
To help users understand the basis of generated chat responses, developers can configure the API to include citations from the data sources used. Citations provide a critical benefit by delivering the generated content with verifiable references, enhancing the credibility and trustworthiness of the presented information, and allowing users to explore responses for a deeper understanding.
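Citations arrive as character spans pointing into the generated text, each tied to one or more source documents. A small sketch of turning those spans into inline footnote-style markers; the response shape (`start`, `end`, `document_ids`) is an assumption based on the beta documentation:

```python
# Render citation spans as inline [n] markers so users can trace
# each claim back to its source document.

def render_with_citations(text, citations, doc_index):
    """Insert [n] after each cited span, where n is the source
    document's number in doc_index (a mapping of doc id -> number)."""
    out, cursor = [], 0
    for cite in sorted(citations, key=lambda c: c["start"]):
        out.append(text[cursor:cite["end"]])       # text up to span end
        refs = sorted(doc_index[d] for d in cite["document_ids"])
        out.append("".join(f"[{n}]" for n in refs))  # footnote markers
        cursor = cite["end"]
    out.append(text[cursor:])                      # trailing text
    return "".join(out)

text = "Cohere was founded in 2019."
citations = [{"start": 22, "end": 26, "document_ids": ["doc_0"]}]
doc_index = {"doc_0": 1}
print(render_with_citations(text, citations, doc_index))
# -> Cohere was founded in 2019[1].
```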
We train Command specifically to perform well on RAG tasks. This means you can expect high levels of performance from Cohere's model. As part of optimizing for accuracy on RAG tasks, we've focused on each step of the process, including query generation, search, reranking relevant results, and generation with citations.
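The pipeline steps above — query generation, search, reranking, and generation with citations — can be sketched generically as a function composition. The stage functions below are stand-in stubs to show the data flow, not Cohere internals:

```python
# Generic sketch of a RAG pass: the four stages named above, wired
# together. Each stage here is a caller-supplied stub.

def rag_answer(question, generate_queries, search, rerank, generate, top_k=3):
    """Run one retrieval-augmented generation pass."""
    queries = generate_queries(question)                # 1. question -> search queries
    hits = [doc for q in queries for doc in search(q)]  # 2. retrieve candidates
    best = rerank(question, hits)[:top_k]               # 3. keep most relevant
    return generate(question, best)                     # 4. grounded generation

# Toy stubs to demonstrate the flow end to end:
answer = rag_answer(
    "capital of France?",
    generate_queries=lambda q: [q],
    search=lambda q: [{"snippet": "Paris is the capital of France."}],
    rerank=lambda q, docs: docs,
    generate=lambda q, docs: docs[0]["snippet"],
)
print(answer)  # -> Paris is the capital of France.
```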
By leveraging a retrieval mechanism, you can connect an AI model like Command to information from public sources like the web or private ones like documents, knowledge bases, or enterprise systems. This helps AI models generate content that is more contextual. It's crucial for use cases like content generation, knowledge assistants, and customer support.
Because RAG draws on up-to-date information from retrievable sources, models can provide real-time insights and updates, making them valuable in applications where current information matters. It also reduces the need for frequent model retraining: traditional generative models often require extensive retraining to stay current with evolving information, which is time-consuming and resource-intensive.
Right now, enterprises are struggling with inaccurate generations and model hallucinations. By connecting your AI model with relevant data sources, you enable the model to provide generations that are more likely to be accurate.
The Chat API with RAG can be used as an end-to-end system or as modular components. For developers interested in using the modular components, there are three modes available:
- Document mode: This mode specifies the documents you want the model to use. This is useful when you know that specific documents have the information that you need.
- Query-generation mode: This is for scenarios where you'd rather get the search queries the model recommends than its actual replies. In these cases, the model-generated queries are the output.
- Connector mode: This mode specifies where the model should look for information. We are releasing this capability with a native web search connector available for use.
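The three modes above map onto different arguments of the same chat call. A sketch of selecting the right arguments per mode; the parameter names (`documents`, `search_queries_only`, `connectors`) are assumptions based on the beta documentation:

```python
# Sketch: one chat endpoint, three modes, selected by argument.
# Argument names are assumed from the public beta docs.

def chat_kwargs(message, mode, documents=None, connector_id="web-search"):
    """Return chat arguments for document, query-generation, or
    connector mode."""
    kwargs = {"message": message}
    if mode == "document":
        # Ground the answer in documents you supply directly.
        kwargs["documents"] = documents or []
    elif mode == "query-generation":
        # Ask only for the search queries the model would run.
        kwargs["search_queries_only"] = True
    elif mode == "connector":
        # Let a registered connector (here, web search) do retrieval.
        kwargs["connectors"] = [{"id": connector_id}]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return kwargs

docs = [{"title": "Q3 report", "snippet": "Revenue grew 12% quarter over quarter."}]
doc_mode = chat_kwargs("Summarize our Q3 results.", "document", documents=docs)
```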
Ultimately, we will provide a broader connector ecosystem, making it easy to connect to enterprise data sources. For this public beta launch, developers can configure their own connectors.
More information can be found in the developer documentation.
Developers can access the Chat public beta now with an API key, included with a Cohere account. For those who want to see the Chat API with RAG in action without writing any code, these capabilities are on display in the Coral Showcase.