Context by Cohere
The Developer's Guide to Fine-Tuning Cohere Chat

How to fine-tune a chatbot to improve its performance at a specific task.

Cohere’s large language models (LLMs) have been trained to be useful in practical business applications. Using the Chat endpoint, you can leverage a pre-trained LLM to build a chatbot that performs tasks ranging from summarization to copywriting to question answering.  

Depending on the use case you have in mind, you might want to amend how the chatbot generates its output. For instance, if your task uses highly technical data or you want to change the chatbot’s style of voice or output format, you often need to perform another round of training on additional data to ensure the best performance. This extra training is referred to as fine-tuning.

Fine-tuning is also recommended when you want to incorporate your company's unique knowledge base. For example, if you are aiming to use a model to draft responses to customer-support inquiries, fine-tuning on old conversations with customers can improve the quality of the output.

In this article, we’ll show you how to fine-tune a chatbot on custom conversational data to improve its performance at a specific task. We’ll use Cohere’s Python SDK for the code examples, and this article comes with a companion Jupyter notebook.

Fine-Tuning with Cohere

Cohere offers two methods of creating fine-tuned models: via the fine-tuning dashboard and the Python SDK.  

The fine-tuning dashboard is a no-code option that allows you to get started quickly.

The fine-tuning dashboard makes it easy to manage and run fine-tuning projects

The Python SDK allows you to kick off fine-tuning jobs programmatically. This is useful if, for example, you plan to run multiple fine-tuning jobs on a regular schedule.  

Step-by-Step Guide with the Python SDK

Below, we’ll walk through the steps involved in training a custom chatbot using the Python SDK. You can follow along in the notebook.

Step 1: Prepare and Validate the Dataset

We will work with the CoEdIT dataset [1] of text-editing examples (Raheja et al.). In each example, the user asks a writing assistant to rewrite text to suit a specific task (editing for fluency, coherence, clarity, or style) and receives a response. Below, you can see some examples from the raw dataset.

{"_id": "57241", "task": "coherence", "src": "Make the text more coherent: It lasted for 60 minutes. It featured the three men taking questions from a studio audience.", "tgt": "Lasting for 60 minutes, it featured the three men taking questions from a studio audience."}

{"_id": "69028", "task": "clarity", "src": "Make the sentence clearer: URLe Lilanga (1934 27 June 2005) was a Tanzanian painter and sculptor, active from the late 1970s and until the early years of the 21st century.", "tgt": "URLe Lilanga (1934 27 June 2005) was a Tanzanian painter and sculptor, active from the late 1970s and until the early 21st century."}

We will use the src and tgt fields from each example, which correspond to the user’s prompt and the writing assistant’s response, respectively.  Instead of using the full dataset, we will use a subset focused on making text coherent: 927 total conversations.

To format the dataset for the Python SDK, we create a .jsonl file in which each line is a JSON object representing one conversation as a series of messages:

  • A System message at the beginning, acting as the preamble that guides the whole conversation
  • Multiple pairs of User and Chatbot messages, representing the conversation that takes place between a human user and a chatbot

For more details on best practices for formatting your dataset, check out the documentation. Here is a preview of the prepared dataset:

{'messages': 
 [{'role': 'System',
   'content': 'You are a writing assistant that helps the user write coherent text.'
  },
  {'role': 'User',
   'content': 'Make the text more coherent: It lasted for 60 minutes. It featured the three men taking questions from a studio audience.'
  },
  {'role': 'Chatbot',
   'content': 'Lasting for 60 minutes, it featured the three men taking questions from a studio audience.'
  }
 ]
}
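Producing this format from the raw src and tgt records is mechanical. Here is a minimal sketch; the helper names, file handling, and preamble wording are our own choices, not part of the SDK:

```python
import json

PREAMBLE = "You are a writing assistant that helps the user write coherent text."

def to_chat_example(raw):
    """Convert one raw CoEdIT record (with src/tgt fields) into the chat message format."""
    return {"messages": [
        {"role": "System", "content": PREAMBLE},
        {"role": "User", "content": raw["src"]},
        {"role": "Chatbot", "content": raw["tgt"]},
    ]}

def write_jsonl(records, path):
    """Write one JSON object per line (.jsonl)."""
    with open(path, "w") as f:
        for raw in records:
            f.write(json.dumps(to_chat_example(raw)) + "\n")
```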

The next step is to create a Dataset object using the create_dataset() method of the Cohere client. It takes four parameters:

  • name: The custom dataset name of your choice
  • data: Use the path to the .jsonl file you just created
  • eval_data (optional): Path to .jsonl file with evaluation data; if not specified, the data will be auto-split with 80% going to training and 20% to evaluation
  • dataset_type: Use "chat-finetune-input" for the Chat endpoint

import cohere

# instantiate the Cohere client
co = cohere.Client("YOUR_API_KEY")

# create a Dataset object
dataset = co.create_dataset(name="coedit_coherence",
                            data=open("coedit_coherence_train.jsonl", "rb"),
                            dataset_type="chat-finetune-input")
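If you would rather control the train/evaluation split yourself instead of relying on the automatic 80/20 split, you can divide the .jsonl file beforehand and pass the second file as eval_data. A minimal sketch (the function and file names are our own):

```python
import random

def split_jsonl(path, train_path, eval_path, eval_fraction=0.2, seed=42):
    """Shuffle conversations and write separate train/eval .jsonl files."""
    with open(path) as f:
        lines = f.readlines()
    random.Random(seed).shuffle(lines)
    n_eval = int(len(lines) * eval_fraction)
    with open(eval_path, "w") as f:
        f.writelines(lines[:n_eval])
    with open(train_path, "w") as f:
        f.writelines(lines[n_eval:])
```

You would then pass data=open(train_path, "rb") and eval_data=open(eval_path, "rb") to create_dataset().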

Whenever a Dataset object is created, the data is automatically validated against the rules for the specified dataset_type. Next, we check whether dataset validation succeeded.

print(dataset.await_validation()) 

The code above prints information about the Dataset object that is organized in several fields, including validation_status, validation_error, and validation_warnings.

A dataset that has errored during validation will show the validation_status as failed. You can review the validation_error field to learn more about the source of the error. For instance, in the sample output below, the uploaded dataset is missing the “Chatbot” speaker. To resolve this, you would need to upload a new version of the dataset in which “Chatbot” appears in each conversation. For a complete list of the errors you might encounter when validating your dataset, check out the documentation.

cohere.Dataset {
	id: coedit-coherence
	name: coedit_coherence
	dataset_type: chat-finetune-input
	validation_status: failed
	created_at: …
	updated_at: …
	download_urls: …
	validation_error: 'missing Chatbot in example'
	validation_warnings: None
}

A successfully validated dataset will show the validation_status as validated.  If the Dataset object passes validation, it can be used to fine-tune a model.  

cohere.Dataset {
	id: coedit-coherence
	name: coedit_coherence
	dataset_type: chat-finetune-input
	validation_status: validated
	created_at: …
	updated_at: …
	download_urls: …
	validation_error: None
	validation_warnings: ['dataset has 0 valid eval rows. dataset will be auto-split']
}
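In a script, you may want to gate the fine-tuning step on these fields. Here is a minimal sketch; ready_for_finetuning is a hypothetical helper operating on a plain dict of the fields shown above, not part of the Cohere SDK:

```python
def ready_for_finetuning(info):
    """Raise if validation failed; surface warnings and return True otherwise.

    `info` is a dict-like view of the Dataset validation fields shown above.
    """
    if info["validation_status"] != "validated":
        raise RuntimeError(f"Dataset validation failed: {info['validation_error']}")
    for warning in info.get("validation_warnings") or []:
        print(f"Validation warning: {warning}")  # e.g. the auto-split notice
    return True
```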

Step 2: Fine-Tune the Model

We kick off a fine-tuning job using the create_custom_model() method of the Cohere client.  It takes four parameters:

  • name: The custom model name of your choice
  • dataset: The Dataset object from the previous step
  • model_type: Use CHAT for the Chat endpoint
  • hyperparameters (optional): Hyperparameters to be used during training

We use HyperParametersInput() to provide custom values for the hyperparameters for training. This gives flexibility to developers who want to further optimize the custom model creation process. If not defined, the training job will use default values. You can define the following hyperparameters:

  • train_epochs: The maximum number of training epochs
  • learning_rate: The initial learning rate to be used during training
  • train_batch_size: The batch size during training
  • early_stopping_patience: Stop training if the loss metric does not improve beyond the early_stopping_threshold for this many rounds of evaluation
  • early_stopping_threshold: How much the loss must improve to prevent early stopping

from cohere.responses.custom_model import HyperParametersInput

# define custom hyperparameters (optional)
hp = HyperParametersInput(
    early_stopping_patience=6,      # default: 6
    early_stopping_threshold=0.01,  # default: 0.01
    train_batch_size=16,            # default: 16
    train_epochs=1,                 # default: 1
    learning_rate=0.01              # default: 0.01
)

# start fine-tuning using the dataset
co.create_custom_model(
    name="coedit-coherence",
    dataset=dataset,
    model_type="CHAT",
    hyperparameters=hp
)

You can view the status of the model by calling the get_custom_model_by_name() method, which returns a CustomModel object. 

# get the CustomModel object
ft = co.get_custom_model_by_name('coedit-coherence')
# print the status
print(ft)

A model that has finished fine-tuning will show the status as READY.  The CustomModel object will also include the model ID, which you will need in the next step when evaluating the model.
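Fine-tuning can take a while, so in an automated pipeline you may want to poll until the status reaches READY. A minimal sketch, where get_status stands in for a call such as a lookup via get_custom_model_by_name() (the polling helper itself is ours, not part of the SDK):

```python
import time

def wait_until_ready(get_status, poll_interval=30, timeout=3600):
    """Poll a status-returning callable until it reports READY."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status == "READY":
            return status
        time.sleep(poll_interval)
    raise TimeoutError("Fine-tuning did not finish within the timeout")
```

Assuming the CustomModel object exposes the status shown above as an attribute, usage would look like wait_until_ready(lambda: co.get_custom_model_by_name('coedit-coherence').status).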

Step 3: Evaluate the Fine-Tuned Model

With Test Data

Once the model has completed the fine-tuning process, it’s time to evaluate its performance. 

To use the fine-tuned model, use co.chat() and set model to the corresponding model ID of the CustomModel object.

In the following code, we supply the same message to both the pre-trained and fine-tuned models for comparison.

user_message = "Make the text coherent: Pimelodella kronei is a species of three-barbeled catfish endemic to Brazil. Discovered by the German naturalist Sigismund Ernst Richard Krone, Pimelodella kronei was the first troglobitic species described in Brazil, but several others have been described later."

# Desired response: Pimelodella kronei is a species of three-barbeled catfish endemic to Brazil. Discovered by the German naturalist Sigismund Ernst Richard Krone, it was the first troglobitic fish described in Brazil, but several others have been described later.

preamble = "You are a writing assistant that helps the user write coherent text."

# Get default model response
response_pretrained = co.chat(
    message=user_message,
    preamble_override=preamble,
)

# Get fine-tuned model response
response_finetuned = co.chat(
    message=user_message,
    model=ft.model_id,
    preamble_override=preamble,
)

print(f"Default response: {response_pretrained.text}", "\n-----")
print(f"Fine-tuned response: {response_finetuned.text}")

For this example, the output appears as follows:

Default response: The three-barbeled catfish, Pimelodella kronei, is exclusive to Brazil. It was Sigismund Ernst Richard Krone, a German naturalist, who first discovered this remarkable species. Notably, P. kronei was also the initial troglobitic species to be identified in Brazil, though other such species have since been uncovered. 

Would you like to know more about the discovery of this species or its natural habitat? 
-----
Fine-tuned response: Pimelodella kronei, a species of three-barbeled catfish endemic to Brazil, was discovered by the German naturalist Sigismund Ernst Richard Krone. It was the first troglobitic species described in Brazil, but several others have been described later.

In this example, both models provide reasonable answers that improve on the user’s original text. However, the fine-tuned model’s response better matches the style of the fine-tuning data in two ways. First, it is more succinct: the fine-tuned model’s response and the desired response are both 36 words, whereas the pre-trained model’s response is 10 words (about 28%) longer. Second, the pre-trained model encourages the conversation to continue by posing a question after addressing the prompt, while the fine-tuned model simply addresses the prompt without further comment.
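The length comparison is easy to reproduce with a quick whitespace-based word count (the response strings below are copied from the outputs shown, with the pre-trained model's follow-up question excluded):

```python
finetuned = ("Pimelodella kronei, a species of three-barbeled catfish endemic to "
             "Brazil, was discovered by the German naturalist Sigismund Ernst "
             "Richard Krone. It was the first troglobitic species described in "
             "Brazil, but several others have been described later.")
default = ("The three-barbeled catfish, Pimelodella kronei, is exclusive to "
           "Brazil. It was Sigismund Ernst Richard Krone, a German naturalist, "
           "who first discovered this remarkable species. Notably, P. kronei was "
           "also the initial troglobitic species to be identified in Brazil, "
           "though other such species have since been uncovered.")

# count words by splitting on whitespace
print(len(finetuned.split()))  # 36
print(len(default.split()))    # 46
```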

This is just one example, but you can compare responses to more prompts in the notebook.

In the Chat Context

We have demonstrated that the fine-tuned model can provide good answers to individual questions. But it is also a competent participant in longer, multi-turn conversations.

To see this, we will borrow from the code in How to Build a Chatbot with Cohere Chat to build a simple chat interface. The only change we need to make is to supply the model ID when using co.chat() to ensure that we are chatting with the model we just fine-tuned.  

import uuid

# Create a conversation ID
conversation_id = str(uuid.uuid4())

print('Starting the chat. Type "quit" to end.\n')

while True:

    # User message
    message = input("User: ")

    # Typing "quit" ends the conversation
    if message.lower() == 'quit':
        print("Ending chat.")
        break

    # Chatbot response
    response = co.chat(message=message,
                       model=ft.model_id,
                       stream=True,
                       conversation_id=conversation_id,
                       return_chat_history=True)

    print("Chatbot: ", end='')

    for event in response:
        if event.event_type == "text-generation":
            print(event.text, end='')
    print("\n", "-" * 50)

We can use the code above to have a conversation with the fine-tuned model. 

Starting the chat. Type "quit" to end.

User: Hello
Chatbot: Hello, how can I help you today?
--------------------------------------------------

User: I'm fine.  Can I ask you for help with some tasks?
Chatbot: I am happy to help you with any tasks that you need help with.  I can also provide information about any topic that you would like to know more about.
--------------------------------------------------

User: Make this more coherent: Manuel now has to decide-will he let his best friend be happy with her Prince Charming. Or will he fight for the love that has kept him alive for the last 16 years?
Chatbot: Manuel now has to decide-will he let his best friend be happy with her Prince Charming, or will he fight for the love that has kept him alive for the last 16 years?
--------------------------------------------------

User: Help me with this one - She left Benaras. Conditions back home were bad.
Chatbot: She left Benaras because conditions back home were bad.
--------------------------------------------------

User: What's a good time to visit London
Chatbot: A good time to visit London is in the spring or fall, when the weather is mild and the city is not too crowded.  The best time to visit London depends on what you want to see and do.  If you want to see the sights and do some shopping, a good time to visit London is in the spring, when the weather is mild and the city is not too crowded.  If you want to see the sights and do some sightseeing, a good time to visit London is in the fall, when the weather is mild and the city is not too crowded.
--------------------------------------------------

User: Could you help with this please: Make the text coherent: Critically the album has not been as well received as other Browne recordings. It remains his only album to date to reach number 1 on the Billboard chart. 
Chatbot: Critically the album has not been as well received as other Browne recordings, but it remains his only album to date to reach number 1 on the Billboard chart.
--------------------------------------------------

User: quit
Ending chat.

Note that the fine-tuned model is still able to respond to prompts like “Hello”, “I’m fine. Can I ask you for help with some tasks?”, and “What’s a good time to visit London”, instead of strictly following the fine-tuning objective of editing text.

The model also did a good job with context switching: it can hold a conversation as the user moves from friendly greetings, to a request for writing help, to travel planning, and finally back to writing assistance. It can also infer when the user is asking for help with making a text coherent, even if that is not explicitly stated (e.g., “Help me with this one”) or the request is buried slightly (e.g., “Could you help with this please”).

If you’re interested in further improving model performance, explore refining data quality or iterating on hyperparameters as described in the documentation.

Conclusion

In this article, you learned how to fine-tune a model for the Chat endpoint on a custom dataset. You saw how fine-tuning lets you tailor a chatbot to a specific use case and give it a particular style. Along the way, you learned how to prepare the fine-tuning data, start a fine-tuning job, and interpret the results, as described in the documentation.

Get started by creating a Cohere account now.


[1] Vipul Raheja, Dhruv Kumar, Ryan Koo, and Dongyeop Kang. “CoEdIT: Text Editing by Task-Specific Instruction Tuning.” May 2023, arXiv:2305.098.
