How to Use Quickstart Connectors


Part 5 of the LLM University module on Chat with Retrieval-Augmented Generation.


In the previous chapter of the Chat with Retrieval-Augmented Generation (RAG) module, we created a connector to use with the Chat endpoint. Connectors simplify building production-ready RAG applications by making it easy to connect to multiple datastores, and to do so at scale. The build-your-own-connectors framework lets developers build a connector to any datastore that offers an accompanying search API.

However, connecting to multiple data sources still requires effort to build the connectors. Developers need to learn the API of each platform, whether it is Google Drive, Slack, Notion, or GitHub, before they can build a connector. As the number of data sources grows, this becomes a bottleneck.

Quickstart connectors solve this problem. In this chapter, you’ll learn how to use quickstart connectors.


Quickstart Connectors

What Are Quickstart Connectors?

Connectors are independent REST APIs that can be used in a RAG workflow to provide secure, real-time access to private data. They receive a natural language search query and respond with a set of documents.

Quickstart connectors are pre-built implementations of over 80 connectors. Developers can use them immediately without having to build them themselves. They can either use these quickstart connectors directly or adapt them to their organization’s needs.

List of Quickstart Connectors

The quickstart connectors are open-sourced and are available in the connectors repository. Here are a few examples:

  • Vector databases: Weaviate, Qdrant, Pinecone, Milvus
  • Full-text search engines: Elastic, Opensearch, Vespa, Solr
  • Collaboration: Slack, Linear, Asana, Jira, Trello, Miro
  • File management: Google Drive, Dropbox, Box
  • Data warehouse: Snowflake, Amazon Redshift
  • Content management: Readme, Wordpress, Medium
  • and many more

Step-by-Step Guide

In this section, we’ll use Google Drive as an example quickstart connector to use with the Chat endpoint. 


We’ll use Cohere’s Python SDK for the code examples.

As a prerequisite, we highly recommend reading the previous chapter for context on what we have implemented so far in this module, as it forms the basis of what we’ll build in this chapter.

Enable Google Drive Access

First, we need to give our API access to Google Drive. The Google Drive connector supports two types of authentication: Service Account and OAuth. In this example, we’ll use the Service Account option.

The steps are as follows:

  1. Create a project in Google Cloud Console.
  2. Create a service account and activate the Google Drive API in the Google Cloud Console.
  3. Create a service account key and download the credentials file as JSON. We’ll need to use this later. The credentials file should look like this:
  {
    "type": "service_account",
    "project_id": "{project id}",
    "private_key_id": "{private_key_id}",
    "private_key": "{private_key}",
    "client_email": "{client_email}",
    "client_id": "{client_id}",
    "auth_uri": "{auth_uri}",
    "token_uri": "{token_uri}",
    "auth_provider_x509_cert_url": "{auth_provider_x509_cert_url}",
    "client_x509_cert_url": "{client_x509_cert_url}",
    "universe_domain": "{universe_domain}"
  }

  4. Share the folder(s) you want to search with the service account email address. You can find the email address in the Google Cloud Console > Service Accounts > Your project. As an example, we'll use the contents from a module from Cohere’s LLM University: Prompt Engineering.

Configure API

First, clone the quickstart connector repository and choose the Google Drive connector from the list of options. Here’s a quick look at what’s inside the Google Drive connector:

  • app: This module handles the API endpoints and authentication. It defines the search function, which is the endpoint for the search API. It also handles the extraction of the access token from the request headers.
  • async_download: This module is responsible for downloading files from Google Drive asynchronously.
  • provider: This module interacts with the Google Drive API. It defines the search function which performs a search on Google Drive using the provided query and access token. It also processes the search results, extracting the necessary information and downloading the file contents using the async_download module.
  • api.yaml file: an OpenAPI specification that describes the API endpoints, request bodies, responses, and security schemes.

Next, we’ll create a .env file to define the environment variables (use the .env-template file provided). We need to define two environment variables:

  • GDRIVE_SERVICE_ACCOUNT_INFO: Convert the service account key we created earlier to a string through json.dumps(credentials) and use this string as the environment variable.
  • GDRIVE_CONNECTOR_API_KEY: This is required for connector authentication purposes. Refer to the previous chapter on how to create a connector API key, such as using the secrets library.
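As a minimal sketch, the two values can be prepared in Python like this. The placeholder key contents are illustrative; substitute the fields of the real key file you downloaded:

```python
import json
import secrets

# Contents of the service account key downloaded from Google Cloud Console
# (placeholder values shown; use your real key file's fields).
credentials = {
    "type": "service_account",
    "project_id": "{project id}",
    "private_key_id": "{private_key_id}",
    "private_key": "{private_key}",
    "client_email": "{client_email}",
}

# GDRIVE_SERVICE_ACCOUNT_INFO: the key serialized into a single string.
gdrive_service_account_info = json.dumps(credentials)

# GDRIVE_CONNECTOR_API_KEY: a random secret that Cohere will present
# when calling the connector's search endpoint.
gdrive_connector_api_key = secrets.token_urlsafe(32)
```

Paste the two resulting strings into the corresponding entries of the .env file.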

Deploy API

Next, we deploy the API with our platform of choice and make the search endpoint of the API accessible as a web service. We covered API deployment in the previous chapter.

Register Connector

Next, we register the API as a connector with Cohere. This follows the same steps as the previous chapter. An example POST request is as follows.

curl --request POST \
  --url '' \
  --header 'Authorization: Bearer {Cohere API key}' \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "{Connector name}",
    "url": "{Connector URL}",
    "service_auth": {
       "type": "bearer",
       "token": "{Connector API Key}"
    }
  }'

The response will provide us with the connector ID in the id field, which we’ll need for the next step.
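The same request body can be sketched in Python. This only constructs the payload and headers; the connector name and URL are placeholders, and the body would be POSTed to the same endpoint as the curl command above:

```python
import json

# Placeholders: substitute your own values.
COHERE_API_KEY = "{Cohere API key}"
CONNECTOR_API_KEY = "{Connector API Key}"

# Same body as the curl request: the connector's name, the public URL of
# its deployed search endpoint, and the bearer token Cohere should present.
payload = {
    "name": "{Connector name}",
    "url": "{Connector URL}",
    "service_auth": {
        "type": "bearer",
        "token": CONNECTOR_API_KEY,
    },
}
headers = {
    "Authorization": f"Bearer {COHERE_API_KEY}",
    "Content-Type": "application/json",
}
body = json.dumps(payload)
# POST `body` with `headers` to the connector-registration endpoint;
# the JSON response contains the new connector's ID in its `id` field.
```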

Use Connector

By now, you will notice that the heavy lifting of creating a connector is already taken care of by the quickstart code. We didn’t have to implement the endpoint, authorization step, and Google Drive API call, among others.

Using the connector is also the same as in the previous chapter. The only change is replacing the connector ID with the one we just created.

import uuid
from typing import List

import cohere

co = cohere.Client("COHERE_API_KEY")  # Your Cohere API key

class Chatbot:
    def __init__(self, connectors: List[str]):
        self.conversation_id = str(uuid.uuid4())
        self.connectors = [{"id": c} for c in connectors]

    def generate_response(self, message: str):
        response = co.chat(
            message=message,
            connectors=self.connectors,
            conversation_id=self.conversation_id,
            stream=True,
        )

        for event in response:
            yield event
        yield response

class App:
    def __init__(self, chatbot: Chatbot):
        self.chatbot = chatbot
    # The chat loop (reading user input and printing the streamed events)
    # is unchanged from the previous chapter.

connectors = ["demo-conn-gdrive-1x2p4k"]
chatbot = Chatbot(connectors)
app = App(chatbot)

Here’s an example conversation:

User: What is prompt engineering
Prompt engineering is a technique used on language models, like Command models, where an instruction is given to generate a response. A prompt can be as simple as a single line of instruction, but the more specific the prompt, the more accurate the response tends to be. Each additional component incorporated into a prompt provides a different means to enhance the quality of the response.

[{'start': 42, 'end': 57, 'text': 'language models', 'document_ids': ['demo-conn-gdrive-1x2p4k_3:0']}]
[{'start': 64, 'end': 78, 'text': 'Command models', 'document_ids': ['demo-conn-gdrive-1x2p4k_0:3']}]
[{'start': 89, 'end': 133, 'text': 'instruction is given to generate a response.', 'document_ids': ['demo-conn-gdrive-1x2p4k_0:3', 'demo-conn-gdrive-1x2p4k_0:33']}]
[{'start': 165, 'end': 191, 'text': 'single line of instruction', 'document_ids': ['demo-conn-gdrive-1x2p4k_0:3', 'demo-conn-gdrive-1x2p4k_0:33']}]
[{'start': 201, 'end': 257, 'text': 'more specific the prompt, the more accurate the response', 'document_ids': ['demo-conn-gdrive-1x2p4k_0:33']}]
[{'start': 335, 'end': 390, 'text': 'different means to enhance the quality of the response.', 'document_ids': ['demo-conn-gdrive-1x2p4k_0:33']}]

Chaining Prompts (demo-conn-gdrive-1x2p4k_1:15). URL:
Evaluating Outputs (demo-conn-gdrive-1x2p4k_4:15). URL:
Validating Outputs (demo-conn-gdrive-1x2p4k_3:0). URL:
Constructing Prompts (demo-conn-gdrive-1x2p4k_0:3). URL:
Constructing Prompts (demo-conn-gdrive-1x2p4k_0:33). URL:

Web Search Connector

In addition to the quickstart connectors, Cohere offers a managed web search connector. Developers can use it immediately, without any additional configuration or deployment.

We can use the web search connector by passing the web-search ID directly to the connectors parameter. There is no deployment step required.

connectors = ["web-search"]
chatbot = Chatbot(connectors)
app = App(chatbot)

The web search connector returns a list of documents from the web search results, each containing title, url, and snippet fields, which the endpoint uses to generate a grounded response and cite its sources.

Here’s an example conversation:

User: What is LLM University
LLM University (LLMU) is an online learning resource provided by Cohere which teaches natural language processing (NLP) using large language models. Its curriculum covers various topics from the basics of LLMs to advanced subjects like generative AI, with plenty of practical code examples to help solidify your knowledge. The courses are geared towards anyone interested in language processing, from beginners to enthusiasts looking to build apps using language AI.

[{'start': 15, 'end': 21, 'text': '(LLMU)', 'document_ids': ['web-search_1:0', 'web-search_0:4', 'web-search_1:2', 'web-search_1:3', 'web-search_0:3']}]
[{'start': 28, 'end': 71, 'text': 'online learning resource provided by Cohere', 'document_ids': ['web-search_1:0']}]
[{'start': 86, 'end': 119, 'text': 'natural language processing (NLP)', 'document_ids': ['web-search_1:0']}]
[{'start': 126, 'end': 148, 'text': 'large language models.', 'document_ids': ['web-search_1:0']}]
[{'start': 195, 'end': 249, 'text': 'basics of LLMs to advanced subjects like generative AI', 'document_ids': ['web-search_1:0', 'web-search_1:2']}]
[{'start': 266, 'end': 322, 'text': 'practical code examples to help solidify your knowledge.', 'document_ids': ['web-search_1:2']}]
[{'start': 339, 'end': 394, 'text': 'geared towards anyone interested in language processing', 'document_ids': ['web-search_1:2']}]
[{'start': 401, 'end': 410, 'text': 'beginners', 'document_ids': ['web-search_1:2']}]
[{'start': 414, 'end': 466, 'text': 'enthusiasts looking to build apps using language AI.', 'document_ids': ['web-search_1:2']}]

Introducing LLM University — Your Go-To Learning Resource for NLP🎓 (web-search_1:0). URL:
LLM University (LLMU) | Cohere (web-search_0:4). URL:
Introducing LLM University — Your Go-To Learning Resource for NLP🎓 (web-search_1:2). URL:
Introducing LLM University — Your Go-To Learning Resource for NLP🎓 (web-search_1:3). URL:
LLM University (LLMU) | Cohere (web-search_0:3). URL:

Using Multiple Connectors

In an enterprise setting, data is distributed across multiple platforms and datastores. The real value of connectors comes from being able to use several of them at the same time. This maximizes the RAG system’s potential as an intelligent knowledge assistant, giving it access to various data sources and letting it synthesize information across all of them.

Let’s now look at an example of using two connectors: the Google Drive and web search connectors.

connectors = ["demo-conn-gdrive-1x2p4k", "web-search"]
chatbot = Chatbot(connectors)
app = App(chatbot)

The endpoint synthesizes information from all these data sources to generate the response. Here’s an example conversation:

User: What is chain of thought prompting
Chain of thought prompting is a technique used to help LLMs (Large Language Models) perform complex reasoning by breaking down problems into logical, bite-sized chunks. This method encourages LLMs to produce intermediate reasoning steps before delivering a final answer to a multi-step problem. The idea is that a model-generated chain of thought would mimic an intuitive thought process when working through a multi-step reasoning problem. 

This concept was introduced by Wei et al. in 2023, and has been found to be particularly useful in improving LLMs' performance at complex arithmetic, commonsense, and symbolic reasoning tasks.

[{'start': 55, 'end': 83, 'text': 'LLMs (Large Language Models)', 'document_ids': ['web-search_9:2']}]
[{'start': 92, 'end': 109, 'text': 'complex reasoning', 'document_ids': ['web-search_7:2', 'web-search_9:2', 'web-search_8:7', 'web-search_8:1']}]
[{'start': 113, 'end': 168, 'text': 'breaking down problems into logical, bite-sized chunks.', 'document_ids': ['web-search_7:2', 'web-search_9:2']}]
[{'start': 208, 'end': 269, 'text': 'intermediate reasoning steps before delivering a final answer', 'document_ids': ['web-search_3:2', 'demo-conn-gdrive-1x2p4k_11:20', 'web-search_7:2', 'demo-conn-gdrive-1x2p4k_10:6', 'web-search_8:7', 'web-search_8:1']}]
[{'start': 275, 'end': 294, 'text': 'multi-step problem.', 'document_ids': ['web-search_3:2', 'web-search_8:7']}]
[{'start': 314, 'end': 387, 'text': 'model-generated chain of thought would mimic an intuitive thought process', 'document_ids': ['web-search_3:2', 'web-search_8:7']}]
[{'start': 474, 'end': 484, 'text': 'Wei et al.', 'document_ids': ['demo-conn-gdrive-1x2p4k_11:20', 'web-search_9:2', 'demo-conn-gdrive-1x2p4k_10:6']}]
[{'start': 488, 'end': 492, 'text': '2023', 'document_ids': ['demo-conn-gdrive-1x2p4k_10:6']}]
[{'start': 573, 'end': 635, 'text': 'complex arithmetic, commonsense, and symbolic reasoning tasks.', 'document_ids': ['web-search_7:2', 'web-search_9:2']}]

Language Models Perform Reasoning via Chain of Thought – Google Research Blog (web-search_3:2). URL:
Constructing Prompts (demo-conn-gdrive-1x2p4k_11:20). URL:
Let’s Think Step by Step: Advanced Reasoning in Business with Chain-of-Thought Prompting | by Jerry Cuomo | Aug, 2023 | Medium (web-search_7:2). URL:
Constructing Prompts (demo-conn-gdrive-1x2p4k_11:30). URL:
Chain-of-Thought Prompting: Helping LLMs Learn by Example | Deepgram (web-search_9:2). URL:
Chaining Prompts (demo-conn-gdrive-1x2p4k_10:6). URL:
Master Prompting Concepts: Chain of Thought Prompting (web-search_8:7). URL:
Chaining Prompts (demo-conn-gdrive-1x2p4k_10:7). URL:
Master Prompting Concepts: Chain of Thought Prompting (web-search_8:1). URL:
Chaining Prompts (demo-conn-gdrive-1x2p4k_10:0). URL:

Automated Chunking and Reranking

With all these documents coming from various connectors, you may be asking a couple of questions:

  • How do we handle long documents? Connecting to multiple connectors means dealing with various APIs, each with its own way of providing documents. Some may return a complete document spanning tens or hundreds of pages. This causes two problems. First, stuffing a long document into an LLM prompt can exceed the model’s context limit, resulting in an error. Second, even within the limit, the response quality suffers because the prompt carries a lot of irrelevant information from the long document instead of the specific chunks that are most relevant.
  • How do we handle results from multiple connectors and queries? For a single connector, the retrieval and reranking implementation is within the developer’s control. With multiple connectors, it is not, because the documents are aggregated at the Chat endpoint. As the number of connectors grows, this becomes a bigger problem: we have no control over the relevancy of the documents sent to the LLM prompt, and the context limit can again be exceeded. Furthermore, if more than one query is generated, the number of retrieved documents multiplies accordingly.

The Chat endpoint solves these problems with its automated chunking and reranking process. Let’s see how it’s done.

Note that for this to happen, the prompt_truncation parameter must be set to AUTO (the default) rather than OFF.


With every document sent by the connectors, the first step is to split it into smaller chunks. Each chunk is between 100 and 400 words, and sentences are kept intact where possible.

Chunking the retrieved documents

Going back to the example responses, notice that some document IDs look like this: web-search_5:2. The ID contains not just the document number (5 in this example) but also a chunk number separated by a colon (2 in this example). If we concatenate web-search_5:0, web-search_5:1, web-search_5:2, and so on, we get back the original document.
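As a rough illustration of the chunking idea (a sketch only, not Cohere’s actual implementation), a greedy sentence-preserving chunker might look like this. The real process targets chunks of 100 to 400 words; the sketch below enforces only the upper bound:

```python
import re

def chunk_document(text: str, max_words: int = 400) -> list[str]:
    """Illustrative sketch: greedily pack whole sentences into
    chunks of at most max_words words, keeping sentences intact."""
    # Naive sentence split: terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = len(sentence.split())
        # Start a new chunk if this sentence would push past the limit.
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

For example, a document of five 5-word sentences chunked with max_words=10 yields three chunks, each a run of whole sentences.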


The Chat endpoint then uses the Rerank endpoint to take all the chunked documents from all connectors and rerank them based on contextual relevance to the query.

Reranking the chunked documents

Reranking happens independently for each query and connector. For example, suppose a user asks the question, “What is AI and how can enterprises use it,” resulting in two queries generated by the endpoint: “What is AI” and “How can enterprises use AI”. Also, suppose there are two connectors: “web search” and “notion.”

This means that there will be four lists of chunked documents (two queries for two connectors), each to be reranked separately.

The reranking step takes the top 20 chunks from each list and drops the rest.


The reranked documents from the different lists are then interleaved into one list.

Interleaving the reranked chunks

With our example above, suppose these are the four lists of reranked documents:

  • Web search results (“What is AI”): web_ai_1, web_ai_2, web_ai_3
  • Notion search results (“What is AI”): notion_ai_1, notion_ai_2, notion_ai_3
  • Web search results (“How can enterprises use AI”): web_enterprise_1, web_enterprise_2, web_enterprise_3
  • Notion search results (“How can enterprises use AI”): notion_enterprise_1, notion_enterprise_2, notion_enterprise_3

The documents will be interleaved in a list in this order:

  • Documents: web_ai_1, notion_ai_1, web_enterprise_1, notion_enterprise_1, web_ai_2, notion_ai_2, web_enterprise_2, notion_enterprise_2, web_ai_3, notion_ai_3, web_enterprise_3, notion_enterprise_3

This list is what gets sent to the LLM prompt.
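The interleaving above is a round-robin merge of the reranked lists. A minimal sketch, assuming the lists have equal length (zip truncates to the shortest list otherwise):

```python
from itertools import chain

# The four reranked lists from the example above.
web_ai = ["web_ai_1", "web_ai_2", "web_ai_3"]
notion_ai = ["notion_ai_1", "notion_ai_2", "notion_ai_3"]
web_enterprise = ["web_enterprise_1", "web_enterprise_2", "web_enterprise_3"]
notion_enterprise = ["notion_enterprise_1", "notion_enterprise_2", "notion_enterprise_3"]

# Round-robin merge: the top chunk from each list, then the second, ...
interleaved = list(chain.from_iterable(
    zip(web_ai, notion_ai, web_enterprise, notion_enterprise)
))
```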

Prompt Building

Recall that we enabled prompt truncation by setting the prompt_truncation parameter to AUTO. Prompt truncation means that some elements from chat_history and documents may be dropped in order to construct a prompt that fits within the model’s context length limit.

Documents and chat history are added iteratively until the prompt reaches the model’s context limit. The resulting prompt is then passed to the Command model for response generation.
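As a sketch of the idea (not Cohere’s actual implementation), greedy packing under a budget might look like this, using word counts as a stand-in for tokens; all names here are hypothetical:

```python
def build_prompt_inputs(documents, chat_history, limit_words=1000):
    """Illustrative sketch of AUTO prompt truncation: greedily keep
    documents (most relevant first) and chat-history turns (most
    recent first) until a word budget is exhausted."""
    kept_docs, kept_history = [], []
    used = 0
    # Documents arrive ranked by relevance, so take them in order.
    for doc in documents:
        words = len(doc.split())
        if used + words > limit_words:
            break
        kept_docs.append(doc)
        used += words
    # Keep the most recent history turns that still fit.
    for turn in reversed(chat_history):
        words = len(turn.split())
        if used + words > limit_words:
            break
        kept_history.append(turn)
        used += words
    kept_history.reverse()
    return kept_docs, kept_history
```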


In this chapter, you learned how to deploy and use a quickstart connector, using the Google Drive connector as an example. You also learned about the web search connector, which is Cohere’s managed connector that developers can use to connect to web search engine results. Finally, we looked at how to use multiple connectors for connecting to multiple datastores in a RAG application.

Get started by creating a Cohere account now.

About Cohere’s LLM University

Our comprehensive NLP curriculum aims to equip you with the skills to develop your own AI applications. We cater to learners from all backgrounds, covering everything from the basics to the most advanced topics in large language models (LLMs). Plus, you'll have the opportunity to work on hands-on exercises, allowing you to build and deploy your very own solutions. Take a course today. 

This LLMU course consists of the following chapters:

  1. Foundations of Chat and RAG
  2. Using Cohere Chat
  3. Using Cohere Chat with RAG in document mode
  4. Using Cohere Chat with RAG in connector mode
  5. Using quickstart connectors
  6. Fine-tuning models for Cohere Chat (coming soon)