Context by Cohere
How to Build RAG Applications With Quickstart Connectors

Part 4 of the LLM University module on Retrieval-Augmented Generation.


In this chapter, you’ll learn how to connect RAG applications to datastores by leveraging Cohere’s pre-built quickstart connectors.

We’ll use Cohere’s Python SDK for the code examples. Follow along in this notebook. Note: To run the notebook, you must first deploy your own Google Drive connector as a web-based REST API (we’ll cover the steps in this chapter – here is the implementation code).


In the previous chapter, you learned about the concept of connectors and how to build a RAG-powered chatbot using connectors. In particular, we used the web search connector, a Cohere-managed connector.

In this chapter, you’ll learn how to build your own connector using one of 80+ pre-built quickstart connectors. We’ll use it to connect a chatbot to Google Drive, enabling the chatbot to answer a user’s questions by searching the documents stored there via the Google Drive API.

What Are Quickstart Connectors?

Cohere’s build-your-own connectors framework enables developers to build a connector to any datastore that offers an accompanying search API. However, connecting to multiple data sources still requires effort to build the connectors. Developers will need to learn about the API of each platform, whether it is Google Drive, Slack, Notion, or GitHub, before they can build the connector. As the number of data sources increases, this becomes a bottleneck.

Quickstart connectors solve this problem. Developers can use pre-built implementations of over 80 connectors immediately without having to build them themselves. They can either use these quickstart connectors directly or adapt them to their organization’s needs.

Cohere’s quickstart connectors are open-sourced and available in our connectors repository. Here are a few examples:

  • Vector databases: Weaviate, Qdrant, Pinecone, Milvus
  • Full-text search engines: Elastic, OpenSearch, Vespa, Solr
  • Collaboration: Slack, Linear, Asana, Jira, Trello, Miro
  • File management: Google Drive, Dropbox, Box
  • Data warehouse: Snowflake, Amazon Redshift
  • Content management: Readme, WordPress, Medium
  • and many more

Step-by-Step Guide

To illustrate how quickstart connectors work, let’s build an example RAG-powered chatbot and connect the Cohere Chat endpoint to Google Drive.

An overview of what we'll build

First, let’s install and import the cohere library, and then create a Cohere client using an API key.

pip install cohere
import cohere
co = cohere.Client("COHERE_API_KEY")

Build and Deploy the Connector

We’ll need to first build a Google Drive connector and deploy it as a web-based REST API.

Enable Google Drive Access

First, we need to enable access to Google Drive. The Google Drive connector supports two types of authentication: Service Account and OAuth. In this example, we’ll use the Service Account option.

The steps are as follows:

  1. Create a project in Google Cloud Console.
  2. Create a service account and activate the Google Drive API in the Google Cloud Console.
  3. Create a service account key and download the credentials file as JSON. We’ll need to use this later. The credentials file should look like this:
{
  "type": "service_account",
  "project_id": "{project id}",
  "private_key_id": "{private_key_id}",
  "private_key": "{private_key}",
  "client_email": "{client_email}",
  "client_id": "{client_id}",
  "auth_uri": "{auth_uri}",
  "token_uri": "{token_uri}",
  "auth_provider_x509_cert_url": "{auth_provider_x509_cert_url}",
  "client_x509_cert_url": "{client_x509_cert_url}",
  "universe_domain": "{universe_domain}"
}

Once you are done with these steps, go to any Google Drive account and share the folder(s) you want to search with the service account email address. You can find the email address in Google Cloud Console > Service Accounts > Your project. As an example, we'll use the contents from LLM University: Prompt Engineering which explains the techniques of prompting LLMs. It consists of five web pages, which we’ll download and store as five documents in Google Drive.

Configure the Connector

Next, we clone the quickstart connector repository and choose the Google Drive connector from the list of options. Here’s a quick look at what’s inside the Google Drive connector:

  • app: This module handles the API endpoints and authentication. It defines the search function, which is the endpoint for the search API. It also handles the extraction of the access token from the request headers.
  • async_download: This module is responsible for downloading files from Google Drive asynchronously.
  • provider: This module interacts with the Google Drive API. It defines the search function, which performs a search on Google Drive using the provided query and access token. It also processes the search results, extracting the necessary information and downloading the file contents using the async_download module.
  • api.yaml file: This OpenAPI specification describes the API endpoints, request bodies, responses, and security schemes.

We then create a .env file to define the environment variables (use the .env-template file provided).

Finally, we define the GDRIVE_SERVICE_ACCOUNT_INFO environment variable. For this, we convert the service account key we created earlier to a string via json.dumps(credentials) and use that string as the variable’s value.
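As a sketch of this step (with placeholder values standing in for the real key file, which you would load with json.load instead), the conversion amounts to:

```python
import json

# Hypothetical placeholder values; in practice, load the JSON key file you
# downloaded from the Google Cloud Console, e.g.:
#     credentials = json.load(open("credentials.json"))
credentials = {
    "type": "service_account",
    "project_id": "my-project",
    "client_email": "connector@my-project.iam.gserviceaccount.com",
}

# The environment variable expects the whole key as a single JSON string.
GDRIVE_SERVICE_ACCOUNT_INFO = json.dumps(credentials)
print(GDRIVE_SERVICE_ACCOUNT_INFO)
```

The resulting one-line string goes into the .env file as the value of GDRIVE_SERVICE_ACCOUNT_INFO.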

Add Authentication

Our next step involves adding another environment variable, GDRIVE_CONNECTOR_API_KEY. This is required for connector authentication purposes.

Implementing authentication and authorization methods to protect your web services is crucial. It ensures that only legitimate users can access your service, protects data from being accessed or modified by unauthorized individuals, and prevents service abuse.

Cohere supports a few authentication/authorization options: service-to-service authentication, OAuth 2.0, and pass-through. Find more details about how to implement these in the documentation.

In service-to-service authentication, Cohere sends requests to our connector with this connector API key. You can generate an API key using libraries like secrets.

import secrets

# Generate a random, URL-safe API key for the connector
connector_api_key = secrets.token_urlsafe(32)

Finally, when registering the API as a connector, we include this connector API key. As a result, Cohere will pass the key in every search request to the connector. We’ll see how to register a connector shortly.
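To illustrate the connector’s side of service-to-service authentication, here is a minimal, hypothetical helper (not the quickstart code itself) showing how a connector might validate the bearer token that Cohere includes with each search request:

```python
import hmac

def is_authorized(auth_header: str, connector_api_key: str) -> bool:
    # Cohere sends a header of the form "Authorization: Bearer <key>".
    prefix = "Bearer "
    if not auth_header.startswith(prefix):
        return False
    # A constant-time comparison avoids leaking key bytes via timing.
    return hmac.compare_digest(auth_header[len(prefix):], connector_api_key)
```

A request whose header does not carry the expected key would be rejected with a 401 before any search is performed.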

Deploy the Connector

Next, we deploy the connector API as a web service that can be accessed via the Internet. There are many options available, including cloud service providers, platform-as-a-service (PaaS) providers, and more. We won’t cover deploying the API in this article, but if you are looking for an example, refer to this Render template for deploying a Poetry/Flask application or this guide on Docker deployment.

To make the API compatible as a connector, ensure the following:

  • Expose an endpoint that will return the retrieval results, which in our case is the search endpoint
  • This endpoint must return a list of dictionaries called results
  • Each dictionary item can contain any number of fields, with the minimum being the text field

With the quickstart connectors, these have already been implemented.
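As a rough illustration of that contract (a hypothetical checker, not part of the quickstart code), a search response can be validated like this:

```python
def is_valid_connector_response(payload: dict) -> bool:
    # The response must carry a "results" list...
    results = payload.get("results")
    if not isinstance(results, list):
        return False
    # ...where every item is a dictionary with at least a "text" field.
    return all(
        isinstance(item, dict) and isinstance(item.get("text"), str)
        for item in results
    )

# A minimal valid response: one result with only the required field.
print(is_valid_connector_response({"results": [{"text": "hello"}]}))  # True
```

Extra fields such as title and url are allowed and, as we’ll see below, are passed through to the Chat endpoint alongside text.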

Test the Connector

Now, we can test if the API is working by making a curl request to the search endpoint.

curl --request POST \
		--url '{connector URL}/search' \
		--header 'Authorization: Bearer YOUR_CONNECTOR_API_KEY' \
		--header 'Content-Type: application/json' \
		--data '{"query": "word embeddings"}'
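The same request can be built in Python with the standard library (a hypothetical sketch; the URL and key are placeholders for your own deployment):

```python
import json
import urllib.request

def build_search_request(url: str, connector_api_key: str, query: str) -> urllib.request.Request:
    # Mirrors the curl call above: a POST with a JSON body and bearer token.
    data = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={
            "Authorization": f"Bearer {connector_api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Once the connector is deployed, send it with urllib.request.urlopen(req).
req = build_search_request("https://example.com/search",
                           "YOUR_CONNECTOR_API_KEY",
                           "word embeddings")
print(req.get_header("Authorization"))
```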

A successful response will return data containing some metadata followed by the fields that a connector is configured to provide (in this case, text, title, and url).

{
  "results": [
    {
      "editedBy": "Meor Amer",
      "id": "10x9mJOnEr62hg1IFxgAtD1aIFS4NXJ2l5Lt-UhJXLVg",
      "mimeType": "application/",
      "modifiedTime": "2023-12-01T07:49:27.196Z",
      "text": "\ufeffEvaluating Outputs\r\nIn this chapter, you'll learn about the different techniques for evaluating LLM outputs.Introduction\r\nLarge language models (LLMs) offer exciting new ways to build applications that leverage natural language as the interface ...",
      "title": "Evaluating Outputs",
      "url": ""
    },
    {
      "editedBy": "Meor Amer",
      "id": "1wngAfCJY1IgD6H__4AkQXFfymKUpSeJL13TItbigdyA",
      "mimeType": "application/",
      "modifiedTime": "2023-12-01T07:50:02.204Z",
      "text": "\ufeffValidating Outputs\r\nIn this chapter ...",
      "title": "Validating Outputs",
      "url": ""
    },
    ...
  ]
}

Register the Connector

Next, we register the Google Drive connector as a connector with Cohere. We do this by sending a POST request to the Cohere API. We’ll need to provide the following information:

  • The Cohere API key
  • The name we want to call this connector
  • The URL of the connector API’s search endpoint
  • The connector API key

Here’s an example request:

curl --request POST \
  --url 'https://api.cohere.ai/v1/connectors' \
  --header 'Authorization: Bearer {Cohere API key}' \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "{connector name}",
    "url": "{connector URL}/search",
    "service_auth": {
       "type": "bearer",
       "token": "{Connector API Key}"
    }
  }'
And with that, we have successfully registered the API as a connector.

Use the Connector

Our new connector is now ready to use. To create a chatbot, we can reuse the same exact code from the previous chapter.

import uuid
from typing import List

from cohere import ChatConnector

class Chatbot:
    def __init__(self, connectors: List[str]):
        """Initializes an instance of the Chatbot class."""
        self.conversation_id = str(uuid.uuid4())
        self.connectors = [ChatConnector(id=connector) for connector in connectors]

    def run(self):
        """Runs the chatbot application."""
        while True:
            # Get the user message
            message = input("User: ")

            # Typing "quit" ends the conversation
            if message.lower() == "quit":
                print("Ending chat.")
                break
            # else:                         # Uncomment for Google Colab to avoid printing the same thing twice
            #     print(f"User: {message}") # Uncomment for Google Colab to avoid printing the same thing twice

            # Generate response
            response = co.chat_stream(
                message=message,
                conversation_id=self.conversation_id,
                connectors=self.connectors,
            )

            # Collect the chatbot response, citations, and documents
            citations = []
            cited_documents = []

            # Display response
            for event in response:
                if event.event_type == "text-generation":
                    print(event.text, end="")
                elif event.event_type == "citation-generation":
                    citations.extend(event.citations)
                elif event.event_type == "search-results":
                    cited_documents = event.documents

            # Display citations and source documents
            if citations:
                print("\n\nCITATIONS:")
                for citation in citations:
                    print(citation)

                print("\nDOCUMENTS:")
                for document in cited_documents:
                    print({'id': document['id'],
                           'text': document['text'][:50] + '...'})

            print("\n")


And when running the chatbot, we define the connector we have created.

# Define the connector
connectors = ["demo-conn-gdrive-6bfrp6"]

# Create an instance of the Chatbot class
chatbot = Chatbot(connectors)

# Run the chatbot
chatbot.run()

Here’s an example conversation:

User: What is prompt engineering

Prompt engineering pertains to the practice of constructing prompts to elicit desired responses from large language models (LLMs). Prompts can be constructed in various ways, such as adding specific details, providing instructions, or incorporating output format requirements. Different types of prompts are suited to different use cases. For instance, sequential prompting is common, especially when a task involves multiple subtasks.

Various techniques can be applied when constructing prompts. One example is prompt chaining, which involves running several prompts in a sequence or parallel to accomplish a goal. Additionally, the structure of prompts can be engineered to comply with specific requirements, ensuring that LLM outputs are safe, ethical and privacy-preserving.

Prompt engineering also covers the evaluation of LLM outputs. Evaluations are essential to ensure the quality and accuracy of the outputs, which can be probabilistic and vary for the same prompt. Techniques for evaluating LLMs include real user feedback, human evaluation, LLM-generated evaluation and word-level metrics.

start=47 end=67 text='constructing prompts' document_ids=['demo-conn-gdrive-6bfrp6_0', 'demo-conn-gdrive-6bfrp6_4']
start=71 end=95 text='elicit desired responses' document_ids=['demo-conn-gdrive-6bfrp6_0', 'demo-conn-gdrive-6bfrp6_4']
start=101 end=122 text='large language models' document_ids=['demo-conn-gdrive-6bfrp6_0', 'demo-conn-gdrive-6bfrp6_3', 'demo-conn-gdrive-6bfrp6_4']
start=123 end=129 text='(LLMs)' document_ids=['demo-conn-gdrive-6bfrp6_0', 'demo-conn-gdrive-6bfrp6_3', 'demo-conn-gdrive-6bfrp6_4']
start=190 end=206 text='specific details' document_ids=['demo-conn-gdrive-6bfrp6_0', 'demo-conn-gdrive-6bfrp6_2']
start=208 end=230 text='providing instructions' document_ids=['demo-conn-gdrive-6bfrp6_0']
start=249 end=276 text='output format requirements.' document_ids=['demo-conn-gdrive-6bfrp6_0', 'demo-conn-gdrive-6bfrp6_2']
start=277 end=296 text='Different use cases' document_ids=['demo-conn-gdrive-6bfrp6_2']
start=317 end=344 text='different types of prompts.' document_ids=['demo-conn-gdrive-6bfrp6_2']
start=359 end=379 text='sequential prompting' document_ids=['demo-conn-gdrive-6bfrp6_1']
start=423 end=441 text='multiple subtasks.' document_ids=['demo-conn-gdrive-6bfrp6_1']
start=451 end=461 text='techniques' document_ids=['demo-conn-gdrive-6bfrp6_0', 'demo-conn-gdrive-6bfrp6_1', 'demo-conn-gdrive-6bfrp6_2', 'demo-conn-gdrive-6bfrp6_3', 'demo-conn-gdrive-6bfrp6_4']
start=519 end=534 text='prompt chaining' document_ids=['demo-conn-gdrive-6bfrp6_1']
start=580 end=588 text='sequence' document_ids=['demo-conn-gdrive-6bfrp6_1']
start=592 end=600 text='parallel' document_ids=['demo-conn-gdrive-6bfrp6_1']
start=604 end=622 text='accomplish a goal.' document_ids=['demo-conn-gdrive-6bfrp6_1']
start=683 end=716 text='comply with specific requirements' document_ids=['demo-conn-gdrive-6bfrp6_4']
start=748 end=785 text='safe, ethical and privacy-preserving.' document_ids=['demo-conn-gdrive-6bfrp6_4']
start=822 end=832 text='evaluation' document_ids=['demo-conn-gdrive-6bfrp6_3']
start=878 end=896 text='ensure the quality' document_ids=['demo-conn-gdrive-6bfrp6_3']
start=901 end=909 text='accuracy' document_ids=['demo-conn-gdrive-6bfrp6_0']
start=939 end=952 text='probabilistic' document_ids=['demo-conn-gdrive-6bfrp6_3', 'demo-conn-gdrive-6bfrp6_4']
start=957 end=982 text='vary for the same prompt.' document_ids=['demo-conn-gdrive-6bfrp6_3']
start=1022 end=1040 text='real user feedback' document_ids=['demo-conn-gdrive-6bfrp6_3']
start=1042 end=1058 text='human evaluation' document_ids=['demo-conn-gdrive-6bfrp6_3']
start=1060 end=1084 text='LLM-generated evaluation' document_ids=['demo-conn-gdrive-6bfrp6_3']
start=1089 end=1108 text='word-level metrics.' document_ids=['demo-conn-gdrive-6bfrp6_3']

{'id': 'demo-conn-gdrive-6bfrp6_0', 'text': "\ufeffConstructing Prompts\r\nIn this chapter, you'll lea..."}
{'id': 'demo-conn-gdrive-6bfrp6_1', 'text': "\ufeffChaining Prompts\r\nIn this chapter, you'll learn a..."}
{'id': 'demo-conn-gdrive-6bfrp6_2', 'text': "\ufeffUse Case Patterns\r\nIn this chapter, you'll learn ..."}
{'id': 'demo-conn-gdrive-6bfrp6_3', 'text': "\ufeffEvaluating Outputs\r\nIn this chapter, you'll learn..."}
{'id': 'demo-conn-gdrive-6bfrp6_4', 'text': "\ufeffValidating Outputs\r\nIn this chapter, you'll learn..."}


User: quit
Ending chat.


In this chapter, you learned how to build your own connector for Google Drive, one of 80+ pre-built quickstart connectors available.

Continue to the next chapter to learn how to build RAG applications over multiple datastores and long documents.

About Cohere’s LLM University

Our comprehensive curriculum aims to equip you with the skills to develop your own AI applications. We cater to learners from all backgrounds, covering everything from the basics to the most advanced topics in large language models (LLMs). Plus, you'll have the opportunity to work on hands-on exercises, allowing you to build and deploy your very own solutions. Take a course today.

This LLMU module consists of the following chapters:

  1. Introduction to RAG
  2. RAG with Chat, Embed, and Rerank
  3. RAG with Connectors
  4. RAG with Quickstart Connectors (this chapter)
  5. RAG over Large-Scale Data