AI-powered chatbots can play roles as diverse as a hotel concierge, an IT support agent, and an internal finance assistant. While general language models serve a broad range of purposes, there are instances where customization and refinement become crucial. At Cohere, we recently released the ability to fine-tune our Chat endpoint for developers looking to shape their chatbots to meet specific style and knowledge requirements.
Tailoring Chatbots for Specific Roles
With fine-tuning, developers can customize a bot's behavior to enhance user engagement and satisfaction by making interactions more natural and contextually appropriate. For instance, a chatbot serving as a hotel concierge should be proficient in hospitality tasks while exhibiting a polite and helpful demeanor. An IT customer support bot needs to be technical and solution-oriented to deal with frustrated customers. Similarly, an internal finance assistant bot should be able to communicate through financial formulas when needed.
Without fine-tuning, general language models may be able to engage users to some degree. Fine-tuning, however, moves the chatbot beyond generic language understanding, so that it delivers valuable insights and information tailored to the user's inquiries.
Example: Chat Fine-tuning on Financial Data
To measure the quantitative impact of fine-tuning, we ran an experiment using the Chat endpoint. The focus was on ConvFinQA, a domain-specific question-answering dataset of financial queries designed to evaluate model performance in the financial domain. The dataset comprises a large collection of questions and answers regarding trends, patterns, and interpretations of financial data.
In our experiment, we evaluated the response quality of fine-tuned models in areas where we expected out-of-the-box models to perform poorly. To answer financial questions, we fine-tuned the model so that chat responses either return the requested result or present a formula for calculating it. We ran this experiment on Command Light, Cohere’s 6B parameter generative model.
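To make the two response styles concrete, here is a minimal sketch of how training conversations of this kind might be serialized, one JSON object per line. The `messages` schema with `System`, `User`, and `Chatbot` roles is an assumption about the chat fine-tuning data format and should be checked against the current Cohere documentation. The helper names and the financial figures are hypothetical and are not taken from ConvFinQA.

```python
import json


def make_example(question, answer, preamble=None):
    """Build one training conversation (assumed schema, see lead-in)."""
    messages = []
    if preamble:
        messages.append({"role": "System", "content": preamble})
    messages.append({"role": "User", "content": question})
    messages.append({"role": "Chatbot", "content": answer})
    return {"messages": messages}


def to_jsonl(examples):
    """Serialize conversations as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(example) for example in examples)


# Hypothetical preamble and figures, invented for illustration only.
PREAMBLE = "You are an internal finance assistant."

examples = [
    # Style 1: return the requested result directly.
    make_example(
        "Revenue was 120 in 2021 and 150 in 2022. What was the percentage change?",
        "25%",
        preamble=PREAMBLE,
    ),
    # Style 2: present a formula showing how to calculate the result.
    make_example(
        "Revenue was 120 in 2021 and 150 in 2022. How do I compute the percentage change?",
        "(150 - 120) / 120",
        preamble=PREAMBLE,
    ),
]

if __name__ == "__main__":
    print(to_jsonl(examples))
```

Keeping both styles in the training set is one way to teach the model when to answer with a number and when to answer with a formula. The right mix depends on how users phrase their questions.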
The queries in ConvFinQA use specialized financial language not commonly found in everyday conversations. After fine-tuning with a curated dataset, the reported accuracy was 68.1%. This result highlights the potential for tailored chatbots to excel in scenarios where generic models lack domain-specific context. In additional experiments across other domains and industries, we generally saw at least a 40% gain in accuracy.
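The scoring procedure behind the accuracy figure is not described here, so the following is only a sketch of one plausible way to score numeric answers: parse both the model's answer and the reference as numbers, then compare them within a small tolerance. The function names, the tolerance, and the choice to treat "25%" as 0.25 are all assumptions made for illustration.

```python
import math


def parse_number(text):
    """Parse strings such as '1,200', '$3.5' or '25%' into a float.

    Returns None when the text is not numeric. Percentages are converted
    to fractions (an assumption of this sketch), so '25%' becomes 0.25.
    """
    cleaned = text.strip().replace(",", "").replace("$", "")
    is_percent = cleaned.endswith("%")
    if is_percent:
        cleaned = cleaned[:-1]
    try:
        value = float(cleaned)
    except ValueError:
        return None
    return value / 100 if is_percent else value


def is_correct(prediction, reference, rel_tol=1e-3):
    """Compare numerically when both sides parse, else by normalized text."""
    predicted = parse_number(prediction)
    expected = parse_number(reference)
    if predicted is None or expected is None:
        return prediction.strip().lower() == reference.strip().lower()
    return math.isclose(predicted, expected, rel_tol=rel_tol, abs_tol=1e-9)


def accuracy(predictions, references):
    """Fraction of predictions that match their reference answers."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    matches = sum(is_correct(p, r) for p, r in zip(predictions, references))
    return matches / len(references)
```

Running the same function over the base model and the fine-tuned model on an identical held-out set gives two directly comparable numbers. When reporting the difference, it helps to state whether a gain is absolute (percentage points) or relative to the baseline.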