How to Build a RAG-Powered Chatbot with Connectors

Part 4 of the LLM University module on Chat with Retrieval-Augmented Generation.


In the previous chapter of the Chat with Retrieval-Augmented Generation (RAG) module, we learned how to build a RAG-powered chatbot with the Chat endpoint. In particular, we built the chatbot using the endpoint’s document mode. Document mode provides developers with the flexibility to customize each component of a RAG stack.

There is another way to build RAG systems with the Chat endpoint, which is through the connector mode. Connector mode simplifies the development of RAG systems by abstracting away some of the complexities, as you’ll learn in this chapter.

The connector mode is part of a bigger concept of connectors. Connectors are independent REST APIs that can be used in a RAG workflow to provide secure, real-time access to private data.

In enterprises, data lives in many different places. The ability of enterprises to realize the full value of RAG rests on their ability to bring these data sources together. Cohere’s build-your-own connectors framework enables developers to develop a connector to any datastore that offers an accompanying search API.

We’ll explore connectors over two chapters:

  • In this chapter, you’ll learn how to use the endpoint in connector mode. We’ll convert what we built in the previous chapter (in document mode) into a connector. The goal of this chapter is to build an intuition of how connectors work by creating one from scratch.
  • In the next chapter, you’ll learn about using quickstart connectors – over 80 pre-built connectors that you can use to connect to popular enterprise datastores. You’ll also learn how to use multiple connectors.


What We'll Build

As a prerequisite, we highly recommend reading the previous chapter to get the context of what we have implemented, which becomes the basis of what we’ll create in this chapter. We also recommend pre-reading the connector documentation to get an overview of how it works.

Let’s now look at the high-level implementation plan of our demo project. In the previous chapter, we built a RAG system composed of three main components: documents, chatbot, and app, as shown in the following system diagram. In this chapter, we’ll turn the documents component into a connector.

Recall that in document mode, we implemented the following steps.

For each user-chatbot interaction in document mode:

  • Step 1: Get the user message
  • Step 2: Call the Chat endpoint in query-generation mode
  • If at least one query is generated:
    • Step 3: Retrieve and rerank relevant documents
    • Step 4: Call the Chat endpoint in document mode to generate a grounded response with citations
  • If no query is generated:
    • Step 4: Call the Chat endpoint in normal mode to generate a direct response
Chat with RAG in document mode

In connector mode, most of the implementation is taken care of by the endpoint, including deciding whether to retrieve information, generating queries, retrieving documents, chunking and reranking documents (post-retrieval), and generating the response. This simplifies our implementation to the following two steps.

For each user-chatbot interaction in connector mode:

  • Step 1: Get the user message
  • Step 2: Call the Chat endpoint in connector mode to generate a response (this can be either a grounded response with citations or a direct response)
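
To make the contrast between the two modes concrete, here is a minimal sketch of one chat turn in each mode, written in plain Python. The helper callables (generate_queries, retrieve_and_rerank, chat) and the fake_chat stub are hypothetical stand-ins introduced for illustration only; they are not Cohere SDK functions, and the real calls appear later in this chapter.

```python
from typing import Callable, List


def document_mode_turn(
    message: str,
    generate_queries: Callable[[str], List[str]],
    retrieve_and_rerank: Callable[[List[str]], List[dict]],
    chat: Callable[..., str],
) -> str:
    """One turn in document mode: the application orchestrates every step."""
    queries = generate_queries(message)            # Step 2: query generation
    if queries:
        documents = retrieve_and_rerank(queries)   # Step 3: retrieve and rerank
        return chat(message, documents=documents)  # Step 4: grounded response
    return chat(message)                           # Step 4: direct response


def connector_mode_turn(message: str, chat: Callable[..., str], connectors: List[dict]) -> str:
    """One turn in connector mode: a single call, the endpoint does the rest."""
    return chat(message, connectors=connectors)    # Step 2


# Hypothetical stub so the sketch runs on its own (not a real endpoint)
def fake_chat(message, documents=None, connectors=None):
    if documents:
        return f"grounded({len(documents)} docs)"
    if connectors:
        return f"via {connectors[0]['id']}"
    return "direct"


print(document_mode_turn("hello", lambda m: [], lambda q: [], fake_chat))
print(connector_mode_turn("what is RAG?", fake_chat, [{"id": "demo"}]))
```

The application-side branching in document_mode_turn is exactly what disappears in connector mode.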

Note: The post-retrieval chunking and reranking steps are crucial because we want to ensure that not only relevant information, but also the right amount of it goes to the prompt. In this chapter’s example, these are handled by our application. In the next chapter’s example, we’ll see the alternative scenario of having the endpoint handle them.

Chat with RAG in connector mode

Step-by-Step Guide

Let’s now go through the steps of building a RAG-powered chatbot in connector mode.

We’ll use Cohere’s Python SDK for the code examples.

Create a Connector

First, we create a connector. This includes converting the existing components into an API, adding authentication, deploying the API, and registering the API as a connector with Cohere.

To help developers build their own connectors, the quickstart connectors repository contains a connector template. The repository contains over 80 pre-built connectors to connect to popular enterprise datastores. You’ll learn how to use one of them (Google Drive) in the next chapter, but in this chapter, we’ll build our own using the connector template.


Clone the repository and go to the _template_ directory. Copy it out to a fresh working directory so we can start updating it.

We’ll rename the template to demo (it can be any name you want). This part is important because we’ll use the same name to create our connector API key, which the application will check for during authentication.

Next, we’ll add the packages we need to the project: cohere, hnswlib, and unstructured.

$ poetry add cohere hnswlib unstructured

Create API Endpoints

Now we come to the core part of the connector, which is creating the API endpoints. 

The template uses Flask as the API framework and Poetry for packaging and dependency management. The base implementation lives in the provider directory, which contains two files:

  • Responsible for setting up the Flask application. It loads environment variables, creates a Flask app instance, and configures the app with the API endpoints.
  • Defines the logic for the API endpoints, which we’ll create next.
Creating an API for the connector

For Processing Documents

The connector template comes with a search endpoint, which is used for retrieving documents. We’ll set up that endpoint next, but we need a way to process these documents in the first place.

For this, let’s create a new endpoint called process.

First, we take the documents component from the previous chapter in its entirety and store it in a module that we’ll call documents. This includes all the methods: load, embed, index, and retrieve.

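class Documents:
    # Method bodies are unchanged from the previous chapter and omitted here
    def load(self): ...
    def embed(self): ...
    def index(self): ...
    def retrieve(self, query): ...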

Next, we create a process endpoint that will use the module we just created. The endpoint takes a list of data sources in the form of web URLs, processes them as documents, and stores them for retrieval purposes later.


from provider.documents import Documents


demo_store = {}

def process(body):
    demo_store["docs"] = Documents(body["sources"])

    return {"message": "Documents processed successfully"}, 200

For Searching Documents

As mentioned, the connector template already comes with a search endpoint. The endpoint takes a list of queries and performs document retrieval based on these queries. This will be the endpoint that acts as the connector, which we’ll register with Cohere later.

Let’s update the search endpoint. 

First, it contains a sample data array that shows how the retrieval results should be formatted. Let’s remove that.

Next, we add the code for retrieving the documents processed by the process endpoint.



def search(body):
    logger.debug(f'Search request: {body["query"]}')

    try:
        # Look up the documents stored by the process endpoint
        docs = demo_store["docs"]
        data = docs.retrieve(body["query"])
    except KeyError:
        # The process endpoint has not been called yet
        return {"error": "No documents processed yet"}, 404

    return {"results": data}, 200, {"X-Connector-Id": app.config.get("APP_ID")}
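
The interplay between the two endpoints can be tried out locally without Flask. The following is a minimal sketch, assuming a hypothetical FakeDocuments class that matches on title keywords in place of the real Documents class (which embeds and indexes the pages); the logger and response header from the template are left out for brevity.

```python
from typing import Dict, List

demo_store = {}


class FakeDocuments:
    """Hypothetical stand-in for Documents: keyword match instead of embeddings."""

    def __init__(self, sources: List[Dict[str, str]]):
        self.sources = sources

    def retrieve(self, query: str) -> List[Dict[str, str]]:
        words = query.lower().split()
        return [
            {"title": s["title"], "url": s["url"], "text": s["title"]}
            for s in self.sources
            if any(w in s["title"].lower() for w in words)
        ]


def process(body):
    # Store the processed documents so that search can find them later
    demo_store["docs"] = FakeDocuments(body["sources"])
    return {"message": "Documents processed successfully"}, 200


def search(body):
    try:
        data = demo_store["docs"].retrieve(body["query"])
    except KeyError:
        # Nothing has been processed yet
        return {"error": "No documents processed yet"}, 404
    return {"results": data}, 200


print(search({"query": "embeddings"}))   # called before process
print(process({"sources": [{"title": "Text Embeddings", "url": ""}]}))
print(search({"query": "embeddings"}))   # called after process
```

Calling search before process yields the 404 branch, which is why the documents must be processed before the chatbot is used.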

Create API Specification

Next, we create an API specification file, which allows the application to set up the API routes and validate requests and responses as per the OpenAPI specification.

For this, go back to the quickstart connectors repository and copy the .openapi directory to your working directory. It contains api.yaml , which is an OpenAPI specification file that describes the API endpoints, request bodies, responses, and security schemes.

It already contains the specification for the search endpoint. Now we need to add the specification for the process endpoint.

# api.yaml

openapi: 3.0.3
info:
  title: Search Connector API
  version: 0.0.1



Add Authentication

Implementing authentication and authorization methods to protect your web services is crucial. It ensures that only legitimate users can access your service, protects data from being accessed or modified by unauthorized individuals, and prevents service abuse.

Cohere supports a few authentication/authorization options: Service-to-service authentication, OAuth 2.0, and Pass-through. Find more details about how to implement these in the documentation.

The connector template already comes with an implementation of service-to-service authentication. The remaining thing we need to do is to create a unique connector API key. We can generate this using libraries like secrets.

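import secrets
print(secrets.token_urlsafe(32))  # one option: a random URL-safe key (32 bytes is an arbitrary choice)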

In service-to-service authentication, Cohere sends requests to our connector with this connector API key. Later, we’ll register this key with Cohere, but for now, we’ll create the key and store it as an environment variable. For this, locate the .env-template file and rename it to .env. Then, rename the TEMPLATE_CONNECTOR_API_KEY to  DEMO_CONNECTOR_API_KEY (because we used the name “demo” earlier) and add the API key we created earlier.

# .env
DEMO_CONNECTOR_API_KEY=YOUR_DEMO_CONNECTOR_API_KEY

Finally, when registering the API as a connector, we include this connector API key. As a result, Cohere will pass the key in every search request to the connector. We’ll see how to register a connector shortly.
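
To illustrate what the service-to-service check amounts to on the connector side, here is a minimal sketch of bearer-token verification. The function name is_authorized is hypothetical, and the template’s own implementation may differ; the sketch only assumes that the key arrives in an Authorization header of the form "Bearer <key>", as in the registration request shown later.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_key: str) -> bool:
    """Return True if the header carries the expected connector API key."""
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # Constant-time comparison avoids leaking key contents through timing
    return hmac.compare_digest(token.encode(), expected_key.encode())


print(is_authorized("Bearer my-secret-key", "my-secret-key"))
print(is_authorized("Bearer wrong-key", "my-secret-key"))
```

In the deployed API, expected_key would be read from the DEMO_CONNECTOR_API_KEY environment variable rather than passed as a literal.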

Deploy API

Next, we deploy the API as a web service that can be accessed via the Internet.

There are many options available, including cloud service providers, Platform as a Service (PaaS) providers, and more. We’ll not cover deploying the API in this article, but if you are looking for an example, refer to this Render template for deploying a Poetry/Flask application or a guide on Docker deployment.

To make the API compatible as a connector, ensure the following:

  • Expose an endpoint that will return the retrieval results, which in our case is the search endpoint.
  • Make the endpoint URL accessible as a web service.
  • This endpoint must return a list of dictionaries called results.
  • Each dictionary item can contain any number of fields, with the minimum being the text field.
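
The response-format requirements in the list above can be checked before registering the connector. The following is a minimal sketch of such a check; the function name validate_search_response is hypothetical, and treating the text field as a string is an assumption rather than something the requirements state.

```python
from typing import Any, List


def validate_search_response(payload: Any) -> List[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    if not isinstance(payload, dict) or "results" not in payload:
        return ['response must be an object with a "results" key']
    results = payload["results"]
    if not isinstance(results, list):
        return ['"results" must be a list']
    problems = []
    for i, item in enumerate(results):
        if not isinstance(item, dict):
            problems.append(f"results[{i}] must be a dictionary")
        elif not isinstance(item.get("text"), str):
            # Assumption: the required text field holds a string
            problems.append(f'results[{i}] is missing a string "text" field')
    return problems


good = {"results": [{"text": "Sentence embeddings ...", "title": "Text Embeddings"}]}
bad = {"results": [{"title": "No text here"}]}
print(validate_search_response(good))
print(validate_search_response(bad))
```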

Recall that we deployed two endpoints: search and process. The process endpoint is not going to be part of the connector. We are using it solely for the purposes of processing documents, not retrieving them.

Register Connector

Next, we register the API that we deployed as a connector with Cohere.

Registering the connector with Cohere

We do that by sending a POST request to the API. We’ll need to provide the following information:

  • The Cohere API key
  • The name we want to call this connector
  • The URL of the API’s search endpoint
  • The connector API key

Here’s an example request:

curl --request POST \
  --url '' \
  --header 'Authorization: Bearer {Cohere API key}' \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "demo-conn",
    "url": "{URL of the search endpoint}",
    "service_auth": {
       "type": "bearer",
       "token": "{Connector API Key}"
    }
  }'

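The same registration request can be assembled in Python. This is a minimal sketch using only the standard library; it builds the request without sending it. The function name build_registration_request is hypothetical, the api_url value is a placeholder (take the real registration URL from Cohere’s documentation), and the name and url body fields are assumed to mirror the fields of the same name in the registration response.

```python
import json
import urllib.request


def build_registration_request(api_url, cohere_api_key, name, search_url, connector_api_key):
    """Build (but do not send) the POST request that registers a connector."""
    body = {
        "name": name,          # The name we want to call this connector
        "url": search_url,     # The URL of the API's search endpoint
        "service_auth": {"type": "bearer", "token": connector_api_key},
    }
    return urllib.request.Request(
        api_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {cohere_api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All values below are placeholders
request = build_registration_request(
    api_url="https://example.invalid/connectors",
    cohere_api_key="COHERE_KEY",
    name="demo-conn",
    search_url="https://example.invalid/search",
    connector_api_key="CONNECTOR_KEY",
)
print(request.get_method(), json.loads(request.data)["name"])
# Sending it would be: urllib.request.urlopen(request)
```
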
And here’s an example response (note the id field, which we’ll need to use later):

{
    "connector": {
        "id": "demo-conn-e5y5ps",
        "organization_id": "org-id",
        "name": "demo-conn",
        "url": "<>",
        "created_at": "2023-11-21T13:31:17.215587431Z",
        "updated_at": "2023-11-21T13:31:17.215587591Z",
        "auth_type": "service_auth",
        "active": True,
        "continue_on_failure": False
    }
}

And with that, we have successfully registered the API as a connector.

Use the Connector

Now that the connector is registered and live, we turn to the other two components that we built in the previous chapter: chatbot and app.

Using the connector

First, we import the necessary libraries and set up the Cohere client.

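import cohere
import os
import uuid
from typing import List

# Read the Cohere API key from the environment (assumes COHERE_API_KEY is set)
co = cohere.Client(os.environ["COHERE_API_KEY"])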

Modify Chatbot and App Components

In connector mode, the implementation of a chatbot becomes much simpler. As mentioned earlier, the endpoint handles the tasks of deciding whether to retrieve information, generating queries, retrieving documents, chunking and reranking documents (post-retrieval), and generating the response.

The key modification is changing the Chat endpoint to use connectors instead. For this, we will pass the list of connectors during instance creation. Then, we use the connectors parameter in the Chat endpoint call. The syntax for using a connector is {"id": "connector-name"}.

The chatbot component in connector mode is now simplified to just this:

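class Chatbot:
    def __init__(self, connectors: List[str]):
        self.conversation_id = str(uuid.uuid4())
        self.connectors = [{"id": c} for c in connectors]

    def generate_response(self, message: str):
        # Call the Chat endpoint in connector mode and stream the response
        response = co.chat(
            message=message,
            connectors=self.connectors,
            conversation_id=self.conversation_id,
            stream=True,
        )

        for event in response:
            yield event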

Next, we reuse the App class from the previous chapter. This class is responsible for the conversational interface with the user, from getting the user message to displaying the response with citations.

class App:
    def __init__(self, chatbot: Chatbot):
        self.chatbot = chatbot

    def run(self):
        while True:
            # Get the user message
            message = input("User: ")

            # Typing "quit" ends the conversation
            if message.lower() == "quit":
                print("Ending chat.")
                break
            else:
                print(f"User: {message}")

            # Get the chatbot response
            response = self.chatbot.generate_response(message)

            # Print the chatbot response
            citations_flag = False
            for event in response:
                stream_type = type(event).__name__

                # Text
                if stream_type == "StreamTextGeneration":
                    print(event.text, end="")

                # Citations
                if stream_type == "StreamCitationGeneration":
                    if not citations_flag:
                        print("\n")
                        citations_flag = True
                    print(event.citations[0])

                # Documents
                if citations_flag:
                    if stream_type == "StreamingChat":
                        print()
                        documents = [{'id': doc['id'],
                                    'text': doc['text'][:50] + '...',
                                    'title': doc['title'],
                                    'url': doc['url']}
                                    for doc in event.documents]
                        for doc in documents:
                            print(doc)


Define and Process Documents

Before we can use the chatbot, we need to process the documents. We’ll use the same set of documents as before, which are chapters from a module from Cohere’s LLM University. The difference is that this time we use the process endpoint of our API to process the documents.

sources = [
    {
        "title": "Text Embeddings",
        "url": ""},
    {
        "title": "Similarity Between Words and Sentences",
        "url": ""},
    {
        "title": "The Attention Mechanism",
        "url": ""},
    {
        "title": "Transformer Models",
        "url": ""}
]

import requests

DEMO_CONNECTOR_API_KEY = "YOUR_DEMO_CONNECTOR_API_KEY" # Replace with your connector API key

headers = {
    "Authorization": f"Bearer {DEMO_CONNECTOR_API_KEY}"
}

# Send the sources to the process endpoint of our API
response = requests.post("", # Replace with your API's URL
                         json={"sources": sources},
                         headers=headers)

Run Chatbot

We can now run the chatbot app! We supply the connector ID during the chatbot instance creation, and then create the app instance based on this.

# Define connectors
connectors = ["demo-conn-e5y5ps"]

# Create an instance of the Chatbot class by supplying the connectors
chatbot = Chatbot(connectors)

# Create an instance of the App class with the Chatbot instance
app = App(chatbot)

# Run the chatbot
app.run()

Here’s an example conversation:

User: What are sentence embeddings
Sentence embeddings are the building blocks of language models. They associate each sentence with a vector (list of numbers) in a way that similar sentences are assigned similar vectors. These vectors are composed of numbers and carry important properties of the sentence. The embeddings act as a form of translation between languages as well, as they provide a relatable vector for similar sentences in different languages.

{'start': 69, 'end': 124, 'text': 'associate each sentence with a vector (list of numbers)', 'document_ids': ['demo-conn-e5y5ps_0', 'demo-conn-e5y5ps_1', 'demo-conn-e5y5ps_2']}
{'start': 139, 'end': 186, 'text': 'similar sentences are assigned similar vectors.', 'document_ids': ['demo-conn-e5y5ps_0', 'demo-conn-e5y5ps_1']}
{'start': 235, 'end': 272, 'text': 'important properties of the sentence.', 'document_ids': ['demo-conn-e5y5ps_1', 'demo-conn-e5y5ps_2']}

{'id': 'demo-conn-e5y5ps_0', 'text': 'In the previous chapter, we learned that sentence ...', 'title': 'Similarity Between Words and Sentences', 'url': ''}
{'id': 'demo-conn-e5y5ps_1', 'text': 'This is where sentence embeddings come into play. ...', 'title': 'Text Embeddings', 'url': ''}
{'id': 'demo-conn-e5y5ps_2', 'text': 'Sentence embeddings are even more powerful, as the...', 'title': 'Similarity Between Words and Sentences', 'url': ''}


User: How is it different from word embeddings
The primary distinction between word embeddings and sentence embeddings is that the latter assigns a vector to every sentence whereas the former does the same thing but for individual words. 

Both embeddings are similar in the sense that they associate vectors in a way that similar items (words or sentences) are mapped to similar vectors. Word embeddings are a subset of sentence embeddings.

{'start': 91, 'end': 125, 'text': 'assigns a vector to every sentence', 'document_ids': ['demo-conn-e5y5ps_0', 'demo-conn-e5y5ps_1']}
{'start': 165, 'end': 190, 'text': 'but for individual words.', 'document_ids': ['demo-conn-e5y5ps_0']}
{'start': 244, 'end': 261, 'text': 'associate vectors', 'document_ids': ['demo-conn-e5y5ps_0', 'demo-conn-e5y5ps_1']}
{'start': 315, 'end': 341, 'text': 'mapped to similar vectors.', 'document_ids': ['demo-conn-e5y5ps_0', 'demo-conn-e5y5ps_1']}
{'start': 342, 'end': 394, 'text': 'Word embeddings are a subset of sentence embeddings.', 'document_ids': ['demo-conn-e5y5ps_1']}

{'id': 'demo-conn-e5y5ps_0', 'text': 'In the previous chapters, you learned about word a...', 'title': 'The Attention Mechanism', 'url': ''}
{'id': 'demo-conn-e5y5ps_1', 'text': 'This is where sentence embeddings come into play. ...', 'title': 'Text Embeddings', 'url': ''}


Ending chat.
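
Each citation in the conversation above carries start and end character offsets into the response text, along with the IDs of the supporting documents. As a minimal sketch of how an application might use these offsets, the hypothetical helper cited_spans below pairs each cited span with its document IDs; the response text and citations are copied from the first turn of the example (truncated to its first two sentences).

```python
from typing import Dict, List, Tuple


def cited_spans(text: str, citations: List[Dict]) -> List[Tuple[str, List[str]]]:
    """Pair each cited span of the response text with its supporting document IDs."""
    spans = []
    for c in citations:
        span = text[c["start"]:c["end"]]
        # Sanity check: the offsets should reproduce the citation's own text field
        if span != c["text"]:
            raise ValueError(f"offsets {c['start']}-{c['end']} do not match citation text")
        spans.append((span, c["document_ids"]))
    return spans


response_text = (
    "Sentence embeddings are the building blocks of language models. "
    "They associate each sentence with a vector (list of numbers) in a way "
    "that similar sentences are assigned similar vectors."
)
citations = [
    {"start": 69, "end": 124,
     "text": "associate each sentence with a vector (list of numbers)",
     "document_ids": ["demo-conn-e5y5ps_0", "demo-conn-e5y5ps_1", "demo-conn-e5y5ps_2"]},
    {"start": 139, "end": 186,
     "text": "similar sentences are assigned similar vectors.",
     "document_ids": ["demo-conn-e5y5ps_0", "demo-conn-e5y5ps_1"]},
]

for span, doc_ids in cited_spans(response_text, citations):
    print(f"{span!r} <- {doc_ids}")
```

The document IDs can then be looked up in the documents list printed after the citations to show the user the title and URL behind each claim.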


In this chapter, you learned how to build a RAG-powered chatbot with the Chat endpoint in connector mode. Connectors make it simpler to build RAG systems by abstracting away some of the implementation steps. They also make it easy to connect to multiple datastores at scale, which is typically what is required in an enterprise setting.

In the next chapter, we’ll continue our exploration of connectors by learning about quickstart connectors – a list of over 80 ready connectors to connect to various data sources. We’ll also learn about managed connectors, and in particular, the web search connector.


About Cohere’s LLM University

Our comprehensive NLP curriculum aims to equip you with the skills to develop your own AI applications. We cater to learners from all backgrounds, covering everything from the basics to the most advanced topics in large language models (LLMs). Plus, you'll have the opportunity to work on hands-on exercises, allowing you to build and deploy your very own solutions. Take a course today. 

This LLMU course consists of the following chapters:

  1. Foundations of Chat and RAG
  2. Using Cohere Chat
  3. Using Cohere Chat with RAG in document mode
  4. Using Cohere Chat with RAG in connector mode
  5. Using quickstart connectors
  6. Fine-tuning models for Cohere Chat (coming soon)