Context by Cohere
Cohere Embed: New Features for More Efficient and Accurate Enterprise Search

Cohere Embed now supports compressed embeddings and asynchronous computation of embeddings on Cohere's servers.


Cohere Embed is a key pillar of our enterprise search solution and powers our industry-leading retrieval-augmented generation (RAG) capabilities. It integrates seamlessly with our Command R and Rerank models, offering exceptional performance for RAG applications in enterprise settings.

Embed is our leading text representation language model. It lets you create semantic text embeddings, mapping text to a high-dimensional vector space based on its meaning. We continue to improve it and are excited to announce our newest features, ensuring that we deliver the most advanced RAG solutions available.

Our Embed v3 model now supports compressed embeddings, which reduce both query latency and the vector database storage needed for embedded data. Compressed embeddings need up to 32x less memory than our standard embeddings while maintaining search quality. In addition, the more you compress, the faster your search runs.

We are also introducing an updated version of our Embed Jobs feature. We aim to enable enterprises with large datasets to get the most out of their document libraries. With the new endpoint, developers can upload a dataset and asynchronously compute the embeddings on Cohere’s servers. Embed Jobs reduces the number of errors compared to embedding through a live API, enabling a faster and more reliable embedding process.

Compressed Embeddings

When working with a search system, storing embeddings can carry a significant cost because each value is stored as float32, the common data type, as seen in the table below. Compressed embeddings reduce storage costs without sacrificing the integrity of the data. Embed now supports the following compressed embedding types: uint8, int8, ubinary, and binary.
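To make the idea concrete, here is a minimal sketch of how a float32 vector could be reduced to int8 and to a packed binary form. It illustrates the general technique under simple assumptions (a fixed value range of [-1, 1] for int8, a sign threshold at zero for binary) and is not a description of how Embed computes its compressed embeddings. The function names are hypothetical.

```python
def quantize_int8(vec, lo=-1.0, hi=1.0):
    """Map each float in [lo, hi] to an integer in [-128, 127].

    The [lo, hi] range is an assumption made for this sketch.
    """
    scale = 255.0 / (hi - lo)
    out = []
    for x in vec:
        x = min(max(x, lo), hi)  # clip values outside the assumed range
        out.append(int(round((x - lo) * scale)) - 128)
    return out


def quantize_binary(vec):
    """Keep one bit per dimension (1 if the value is positive), packed into bytes."""
    if len(vec) % 8 != 0:
        raise ValueError("dimension must be a multiple of 8 to pack into bytes")
    packed = bytearray()
    for i in range(0, len(vec), 8):
        byte = 0
        for x in vec[i:i + 8]:
            byte = (byte << 1) | (1 if x > 0 else 0)
        packed.append(byte)
    return bytes(packed)


def hamming_distance(a, b):
    """Count the differing bits between two packed binary vectors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))
```

Binary vectors can be compared with a bit-level distance such as Hamming distance, which is one reason heavier compression tends to make search faster as well as smaller.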

In our evaluations, we compare the vector database cost and the MIRACL benchmark score of Embed v3 at different compression levels against OpenAI’s three available embedding models. We find that Embed v3 outperforms OpenAI's models across the board in quality and maintains high performance even with compression, leading to significant expected savings in vector database storage.

The new compressed embeddings feature represents another step forward in our efforts to help enterprises use large amounts of their data to their full potential. Compressed embeddings reduce storage costs, a crucial advantage for enterprises. Compression can reduce storage requirements by up to 96%, providing substantial cost savings and enhanced data accessibility.
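The relationship between the compression factor and the storage savings can be checked with simple arithmetic. The sketch below assumes 1024-dimensional vectors and a hypothetical corpus of one million documents; both numbers are illustrative assumptions, and the ratios do not depend on them. Only the bits per dimension for each type matter.

```python
# Bits used per embedding dimension for each type.
BITS_PER_DIM = {"float32": 32, "int8": 8, "uint8": 8, "binary": 1, "ubinary": 1}

DIMS = 1024             # assumed embedding dimension
NUM_VECTORS = 1_000_000  # hypothetical corpus size


def storage_bytes(num_vectors, dims, bits_per_dim):
    """Raw storage for the vectors alone, ignoring index overhead."""
    return num_vectors * dims * bits_per_dim // 8


baseline = storage_bytes(NUM_VECTORS, DIMS, BITS_PER_DIM["float32"])

for name, bits in BITS_PER_DIM.items():
    size = storage_bytes(NUM_VECTORS, DIMS, bits)
    factor = baseline / size
    saved = 100 * (1 - size / baseline)
    print(f"{name:8s} {size / 1e9:6.3f} GB  {factor:4.0f}x smaller  {saved:.1f}% saved")
```

Going from 32 bits to 1 bit per dimension is a 32x reduction, which works out to 1 - 1/32 = 96.875% less raw vector storage. That is where the 96% figure comes from.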

Learn more about the implementation process and how to utilize it here.

Embed Jobs

At its core, Embed Jobs is a helper function that makes it easier to encode datasets and streamlines the creation of embeddings. You can now bulk upload your dataset, compute the embeddings on Cohere’s servers, and store them back in your vector database.

The updated Embed Jobs endpoint generates embeddings asynchronously. It also includes data staging and validation, so once a job is launched, there is no need to manage partial completion caused by user errors or API downtime.
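The pattern described here (validate the whole dataset up front, then embed it in the background and retry failed batches) can be sketched locally. This is a toy illustration only: it does not call the Cohere API, `fake_embed` is a hypothetical stand-in for a remote embedding call, and the batch size and retry count are arbitrary assumptions.

```python
import asyncio


def validate(records):
    """Staging step: reject the whole dataset up front if any record is unusable."""
    bad = [i for i, r in enumerate(records) if not isinstance(r, str) or not r.strip()]
    if bad:
        raise ValueError(f"invalid records at positions {bad}")
    return list(records)


async def fake_embed(batch):
    """Hypothetical placeholder for a remote call; returns one toy vector per text."""
    await asyncio.sleep(0)
    return [[float(len(text))] for text in batch]


async def run_job(records, batch_size=2, max_retries=3):
    """Validate first, then embed batch by batch, retrying a batch on connection errors."""
    staged = validate(records)
    results = []
    for start in range(0, len(staged), batch_size):
        batch = staged[start:start + batch_size]
        for attempt in range(max_retries):
            try:
                results.extend(await fake_embed(batch))
                break
            except ConnectionError:
                if attempt == max_retries - 1:
                    raise
    return results


embeddings = asyncio.run(run_job(["first doc", "second doc", "third"]))
```

Because validation happens before any embedding work starts, a malformed record fails the job immediately rather than leaving a half-embedded dataset behind.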

Once you have computed the embeddings on Cohere’s servers, you can download them and use them to enable semantic search for document classification, information retrieval, or fraud detection, among other use cases.
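As a rough illustration of what semantic search over downloaded embeddings involves, the sketch below ranks documents by cosine similarity to a query vector. The three-dimensional vectors and document names are made up for readability; real embeddings have far more dimensions, and a production system would typically delegate this ranking to a vector database.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy embeddings (hypothetical values, not produced by any model).
docs = {
    "invoice_policy": [0.9, 0.1, 0.0],
    "travel_policy": [0.1, 0.9, 0.1],
    "fraud_report": [0.0, 0.2, 0.9],
}
query = [0.1, 0.1, 0.95]

print(top_k(query, docs, k=2))
```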

Explore our documentation to learn how Embed Jobs simplifies embedding generation for your dataset.

Best-in-class for RAG

Embed works in concert with our Command R and Rerank models to provide best-in-class integration for RAG applications and excel at enterprise use cases. By improving the contextual understanding and relevance of retrieved passages, the Embed model enables RAG to generate more accurate and coherent responses, especially when dealing with extensive document repositories. We’re excited to be continually improving search and retrieval systems to make the best RAG applications.

Average accuracy of an end-to-end evaluation of the Natural Questions, TriviaQA, and HotpotQA (single-retrieval) benchmarks using a KILT Wikipedia index for all models. We evaluated both a leading open source embedding model (gte-large) and Cohere's search stack with Command-R. Accuracy is calculated using the presence of keyphrases in the model's answer.
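A keyphrase-presence metric of this kind can be sketched in a few lines. The details below (case-insensitive substring matching, and counting an answer as correct if any one of its accepted keyphrases appears) are assumptions for illustration, since the exact matching rules used in the evaluation are not specified here. The sample answers are made up.

```python
def contains_keyphrase(answer, keyphrases):
    """True if any accepted keyphrase appears in the answer (case-insensitive)."""
    text = answer.lower()
    return any(phrase.lower() in text for phrase in keyphrases)


def keyphrase_accuracy(answers, gold_keyphrases):
    """Fraction of answers that contain at least one of their gold keyphrases."""
    if len(answers) != len(gold_keyphrases):
        raise ValueError("answers and gold_keyphrases must have the same length")
    hits = sum(contains_keyphrase(a, g) for a, g in zip(answers, gold_keyphrases))
    return hits / len(answers)


# Hypothetical model answers and accepted keyphrases.
answers = [
    "The capital of France is Paris.",
    "It was written by Jane Austen.",
    "I am not sure.",
]
gold = [["Paris"], ["Charlotte Bronte"], ["1969"]]

accuracy = keyphrase_accuracy(answers, gold)
```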

Cohere Embed unlocks powerful capabilities for generative models, significantly enhancing the performance and accuracy of retrieval-augmented generation (RAG) pipelines and opening new possibilities for building responsive AI applications.

Developers can get started with Embed on our playground. Learn more about Embed v3, compressed embeddings and Embed Jobs in our developer documentation.

As a bonus, we have embedded all of Wikipedia for you to use. You can find it here.
