How to Build RAG Applications With Connectors

Part 3 of the LLM University module on Retrieval-Augmented Generation.


In this chapter, you’ll learn about connectors and how to build RAG applications using the web search connector.

We’ll use Cohere’s Python SDK for the code examples. Follow along in this notebook.


In the previous chapter, we built a chatbot using the Chat endpoint’s document mode. Document mode gives developers the flexibility to customize each component of a RAG stack.

There is another way to build RAG systems with the Chat endpoint: connector mode. Connector mode simplifies the development of RAG systems by abstracting away some of the complexities.

We’ll explore connectors over the next three chapters:

  • In this chapter (Chapter 3), we’ll discuss how to get started quickly with connectors using the web search connector. It’s a connector managed by Cohere, and because of that, we can focus on connector usage instead of implementation.
  • In Chapter 4, we’ll implement one of Cohere’s quickstart connectors, a collection of over 80 pre-built connectors that you can use to connect to popular enterprise datastores.
  • In Chapter 5, we’ll see how to use connectors at scale, specifically on multiple datastores and long documents.

What Are Connectors?

Connectors are independent REST APIs that can be used in a RAG workflow to provide secure, real-time access to private data.

In enterprises, data lives in many different places. The ability of enterprises to realize the full value of RAG rests on their ability to bring these data sources together. Cohere’s build-your-own connectors framework enables developers to develop a connector to any datastore that offers an accompanying search API.

Cohere’s connectors framework simplifies connecting RAG systems to datastores

At a high level, here’s what connectors do: when the Chat endpoint calls a connector, it sends a query to that connector’s search endpoint. The connector then returns the list of documents it deems most relevant to the query.

The build-your-own connectors framework allows developers to build any logic behind a connector. For example, you can define the retrieval implementation—whether it’s running a semantic similarity search over a vector database, searching over an existing full-text search engine, or utilizing the existing search APIs of platforms like Google Drive or Notion.
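To make the retrieval logic behind a connector concrete, here is a minimal sketch of what a connector’s search function might do. The in-memory datastore, the keyword-overlap scoring, and the field names are all illustrative assumptions; a real connector would typically wrap a vector database, a full-text search engine, or a platform’s own search API, and expose this logic behind a REST search endpoint (covered in the next chapter).

```python
# A toy in-memory "datastore" (illustrative only; a real connector
# would query an actual search backend).
DOCS = [
    {"title": "Onboarding guide", "text": "How new hires set up accounts and tools."},
    {"title": "Expense policy", "text": "Rules for submitting travel expenses."},
    {"title": "Security basics", "text": "Password and access-control requirements."},
]

def search(query: str, top_k: int = 2) -> dict:
    """Return the documents most relevant to the query.

    Scores by naive keyword overlap for illustration; a real connector
    might run a semantic similarity search or call an existing search API.
    """
    query_words = set(query.lower().split())
    scored = []
    for doc in DOCS:
        doc_words = set((doc["title"] + " " + doc["text"]).lower().split())
        overlap = len(query_words & doc_words)
        if overlap:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # A connector returns its hits as a list of documents with free-form fields.
    return {"results": [doc for _, doc in scored[:top_k]]}

print(search("travel expenses"))
# → {'results': [{'title': 'Expense policy', 'text': 'Rules for submitting travel expenses.'}]}
```

Whatever the backend, the contract is the same: a query string goes in, and a ranked list of documents comes out for the endpoint to ground its response on.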

Additionally, in connector mode, most of the RAG building blocks are taken care of by the endpoint. This includes deciding whether to retrieve information, generating queries, retrieving documents, chunking and reranking documents (post-retrieval), and generating the response.

Recall that in the previous chapter (document mode), we implemented the following steps.

  • Step 1: Get the user message
  • Step 2: Call the Chat endpoint in query-generation mode
  • If at least one query is generated:
    • Step 3: Retrieve and rerank relevant documents
    • Step 4: Call the Chat endpoint in document mode to generate a grounded response with citations
  • If no query is generated:
    • Step 4: Call the Chat endpoint in normal mode to generate a direct response

In connector mode, this is simplified to the following two steps.

  • Step 1: Get the user message
  • Step 2: Call the Chat endpoint in connector mode to generate a response (this can be either a grounded response with citations or a direct response)

Step-by-Step Guide

Below is a diagram that provides an overview of what we’ll build. We’ll build a RAG chatbot that can search the web, retrieve relevant results to a user query, and generate grounded responses to the query.

An overview of what we'll build


First, let’s install the cohere library, import it along with the other modules we’ll need, and then create a Cohere client using an API key.

pip install cohere

import uuid
from typing import List

import cohere
from cohere import ChatConnector

co = cohere.Client("COHERE_API_KEY")

Create the Chatbot Component

The change from document mode to connector mode requires just one change to the Chat endpoint call: swapping the documents parameter for the connectors parameter.

Here’s how it looks with the web search connector. We supply the connector ID, web-search, as an argument to the connectors parameter.

response = co.chat_stream(message="What is LLM university",
                          connectors=[ChatConnector(id="web-search")])

This single call is enough to get a full RAG-enabled response: the response text, the citations, and the source documents, which in this case are snippets of the most relevant information available on the web for a given user message.

But in order to run this in a multi-turn chatbot scenario, we need to build the chatbot component. The good news is that we can adapt the chatbot we built in the previous chapter.

There are a few changes to make, including:

  • Remove the query generation logic (done by the endpoint)
  • Remove the retrieval logic (done by the endpoint)
  • Change the Chatbot initialization to use connectors instead
  • Use the connectors parameter instead of documents in the Chat endpoint call
class Chatbot:
    def __init__(self, connectors: List[str]):
        """Initializes an instance of the Chatbot class."""
        self.conversation_id = str(uuid.uuid4())
        self.connectors = [ChatConnector(id=connector) for connector in connectors]

    def run(self):
        """Runs the chatbot application."""
        while True:
            # Get the user message
            message = input("User: ")

            # Typing "quit" ends the conversation
            if message.lower() == "quit":
                print("Ending chat.")
                break
            # else:                         # Uncomment for Google Colab to avoid printing the same thing twice
            #     print(f"User: {message}") # Uncomment for Google Colab to avoid printing the same thing twice

            # Generate response
            response = co.chat_stream(
                message=message,
                connectors=self.connectors,
                conversation_id=self.conversation_id,
            )

            # Collect the citations and source documents as they stream in
            citations = []
            cited_documents = []

            # Display response
            print("Chatbot:")
            for event in response:
                if event.event_type == "text-generation":
                    print(event.text, end="")
                elif event.event_type == "citation-generation":
                    citations.extend(event.citations)
                elif event.event_type == "search-results":
                    cited_documents = event.documents

            # Display citations and source documents
            if citations:
                print("\n\nCITATIONS:")
                for citation in citations:
                    print(citation)

                print("\nDOCUMENTS:")
                for document in cited_documents:
                    print({'id': document['id'],
                           'snippet': document['snippet'][:50] + '...',
                           'title': document['title'],
                           'url': document['url']})

            print("\n")

Run the Chatbot

And that’s about it. We are now ready to run the chatbot.

First, we define the connector to use, which is web-search. Next, we create an instance of the Chatbot class using the connector, and then we run the chatbot.

# Define the connector
connectors = ["web-search"]

# Create an instance of the Chatbot class
chatbot = Chatbot(connectors)

# Run the chatbot
chatbot.run()

And we get the same type of response as we’ve seen in the previous chapter – the text response followed by the citations and source documents used.

User: What is Cohere's LLM University

LLM University, offered by Cohere, is a set of comprehensive learning resources for anyone interested in Natural Language Processing (NLP), from beginners to advanced learners. The curriculum aims to provide a solid foundation in NLP and equips learners with the skills needed to develop their own AI applications.
The course covers various topics, including semantic search, generation, classification, embeddings, and other NLP techniques. Learners can explore these concepts through hands-on exercises and practical code examples.
Join the Discord community to connect with other learners and access the latest updates!

start=27 end=33 text='Cohere' document_ids=['web-search_0', 'web-search_1']
start=47 end=79 text='comprehensive learning resources' document_ids=['web-search_1']
start=105 end=132 text='Natural Language Processing' document_ids=['web-search_0', 'web-search_1']
start=133 end=138 text='(NLP)' document_ids=['web-search_0', 'web-search_1']
start=145 end=176 text='beginners to advanced learners.' document_ids=['web-search_0', 'web-search_1']
start=181 end=191 text='curriculum' document_ids=['web-search_0', 'web-search_1']
start=210 end=233 text='solid foundation in NLP' document_ids=['web-search_0', 'web-search_1']
start=263 end=314 text='skills needed to develop their own AI applications.' document_ids=['web-search_0', 'web-search_1']
start=359 end=374 text='semantic search' document_ids=['web-search_0', 'web-search_1']
start=376 end=386 text='generation' document_ids=['web-search_0', 'web-search_1']
start=388 end=402 text='classification' document_ids=['web-search_0', 'web-search_1']
start=404 end=414 text='embeddings' document_ids=['web-search_0', 'web-search_1']
start=420 end=441 text='other NLP techniques.' document_ids=['web-search_0', 'web-search_1']
start=486 end=504 text='hands-on exercises' document_ids=['web-search_0', 'web-search_1']
start=509 end=533 text='practical code examples.' document_ids=['web-search_0', 'web-search_1']
start=543 end=560 text='Discord community' document_ids=['web-search_0', 'web-search_1']

{'id': 'web-search_0', 'snippet': 'Guides and ConceptsAPI ReferenceRelease NotesAppli...', 'title': 'LLM University (LLMU) | Cohere', 'url': ''}
{'id': 'web-search_1', 'snippet': 'Introducing LLM University — Your Go-To Learning R...', 'title': 'Introducing LLM University — Your Go-To Learning Resource for NLP🎓', 'url': ''}
{'id': 'web-search_2', 'snippet': 'Skip to main content\n\nMadras High Court Reads Down...', 'title': 'LawBeat | Madras High Court Reads Down University Admission Rule Mandating 2-Yr LLM for PhD Admission', 'url': ''}
{'id': 'web-search_3', 'snippet': 'Take your legal expertise to the next level with a...', 'title': 'LLM Program', 'url': ''}
{'id': 'web-search_4', 'snippet': "The People's Network\n\nSign In with Facebook\n\nBy cl...", 'title': 'Revolutionizing AI: University of Michigan and Apple Team Up to Boost LLM Efficiency', 'url': ''}
{'id': 'web-search_5', 'snippet': 'Ministers urged to tackle “damaging” trial delays ...', 'title': 'LLM Master of Laws (General) Degree | University of Law', 'url': ''}
{'id': 'web-search_6', 'snippet': "The People's Network\n\nSign In with Facebook\n\nBy cl...", 'title': "Tsinghua University's Ouroboros Framework: Revolutionizing LLM Inference Speed by 2.8x", 'url': ''}
{'id': 'web-search_7', 'snippet': 'Skip to main content\n\nSupport the Law School\n\nCons...', 'title': 'LLM & Graduate Programs • Graduate Admissions • Penn Carey Law', 'url': ''}
{'id': 'web-search_8', 'snippet': 'Skip to navigation | Skip to main content | Skip t...', 'title': 'LLM Law (2024 entry) | The University of Manchester', 'url': ''}


User: quit
Ending chat.
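Each citation printed above is a character span over the response text, together with the IDs of the documents that support it. As an illustration, a small helper (hypothetical, not part of the SDK, and using plain dicts in place of the SDK’s citation objects) can render such spans inline:

```python
def annotate_citations(text: str, citations: list) -> str:
    """Insert bracketed source markers after each cited span.

    `citations` is a list of dicts with `start`, `end`, and `document_ids`
    keys, mirroring the citation fields printed in the chat output above.
    """
    # Work backwards so earlier offsets stay valid as markers are inserted.
    annotated = text
    for c in sorted(citations, key=lambda c: c["end"], reverse=True):
        marker = " [" + ", ".join(c["document_ids"]) + "]"
        annotated = annotated[:c["end"]] + marker + annotated[c["end"]:]
    return annotated

text = "LLM University is offered by Cohere."
citations = [
    {"start": 0, "end": 14, "document_ids": ["web-search_1"]},
    {"start": 29, "end": 35, "document_ids": ["web-search_0"]},
]
print(annotate_citations(text, citations))
# → LLM University [web-search_1] is offered by Cohere [web-search_0].
```

This makes it easy to surface the grounding of each claim directly in a chat UI.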


In this chapter, you learned about the concept of connectors and how to build a RAG-powered chatbot using connectors. In particular, we used the web search connector, which is a Cohere-managed connector that you can use immediately.

Continue to the next chapter to learn how to connect RAG applications to datastores by leveraging Cohere’s pre-built quickstart connectors.

About Cohere’s LLM University

Our comprehensive curriculum aims to equip you with the skills to develop your own AI applications. We cater to learners from all backgrounds, covering everything from the basics to the most advanced topics in large language models (LLMs). Plus, you'll have the opportunity to work on hands-on exercises, allowing you to build and deploy your very own solutions. Take a course today.

This LLMU module consists of the following chapters:

  1. Introduction to RAG
  2. RAG with Chat, Embed, and Rerank
  3. RAG with Connectors (this chapter)
  4. RAG with Quickstart Connectors
  5. RAG over Large-Scale Data