Cohere For AI Turns One: A Year of Pioneering Machine Learning Research

It was only one year ago that we launched Cohere For AI (C4AI) – Cohere’s research lab dedicated to contributing fundamental research in machine learning. The vision for the lab was simple but ambitious: change how, where, and by whom research is done.

Led by Sara Hooker, a leading ML researcher with a long track record of impactful research, the C4AI team has had the pleasure of collaborating with over 100 extraordinary academics, scholars and researchers across more than 30 institutions and organizations. Over the last year, C4AI has launched 15 open science initiatives, supported the publication of more than 25 papers, welcomed an inaugural Scholars cohort, and established a global community of over 1,500 independent researchers across 107 countries.

Transforming Research: A Year of Collaboration and Exploration

Research Contributions: A Collaborative Approach with 100+ Researchers and 30+ Institutions

Over the last year, the C4AI team and Cohere technical staff have published more than 25 research papers and presented them at top ML conferences such as EMNLP, NeurIPS and ICLR. These papers focus on a range of ML problems such as metadata archeology, NLP efficiencies, compression of multilingual models, ensemble fairness, toxicity evaluation, AI safety and quantization at scale. This research is the result of collaborations with over 100 researchers across 30+ institutions and organizations, including the University of Washington, University of Toronto, University of Waterloo, University College London, Cambridge University, Meta AI Research and Google Research. Most of our collaborations are cross-institutional, and have involved key members of our research staff and technical colleagues across Cohere. You can find a full list of our publications here.

This year, we welcomed Beyza Ermis, Ahmet Üstün, and Marzieh Fadaee to the research staff. The C4AI team and Cohere’s technical staff presented research and delivered keynotes at top ML conferences and events, including NAACL, ICML, Deep Learning Indaba, RIIAA, NeurIPS, EMNLP, Khipu, ECCV, and ICLR. We also co-organized workshops on "Broadening Research Collaborations" at NeurIPS and "The Pitfalls of Limited Data and Computation for Trustworthy ML" at ICLR.

Our Fireside Chats also created important forums for discourse, convening experts in machine learning for in-depth conversations. We hosted memorable conversations with Samy Bengio, Pablo Samuel Castro, Colin Raffel and Sasha Rush. We also hosted over 30 technical talks, bringing together machine learning experts and independent researchers from all over the world.

Pioneering Research: Our Inaugural Scholars Cohort

A key part of Cohere For AI’s commitment to solving complex machine learning problems is providing more entry points for rising stars to participate in ML research. Last year, we launched the Cohere For AI Scholars Program to help close the gap between research experience and opportunity. In our inaugural year, we welcomed 5 incredibly talented researchers, reflecting an acceptance rate of 2% of all applicants. We are a little over halfway through the program and are already proud to share some of their scientific contributions: Luiza Pozzobon has published her first work, on the challenges of using black-box APIs for toxicity evaluation, and Arash Ahmadian has released a new pre-print that explores intriguing properties of quantization at scale.

Launching a Global Community of ML Researchers: An Open Science Initiative

We are committed to changing how research is done. Last year, we launched an open science community where individuals from various backgrounds, from lifelong learners to seasoned engineers, could connect, explore and collaborate. Since then, C4AI has supported AI research and has grown into a global community of over 1,500 independent researchers from 107 countries.

Our goal is to make science open and accessible to everyone. One year on, we have 11 community-led programs championed by 28 dedicated volunteer leads. Our open science initiative hosts over 20 events each month, from research collaborations to mentorships, reading groups, and inspiring lightning talks. We have supported new research collaborations through mentorship and compute grants, resulting in a rich set of research contributions from our community members, including 9 members who have published with the affiliation Cohere For AI Community.

An Open Science Initiative: Accelerating Multilingual Progress through Aya

The current state of NLP technology lacks representation for many languages, hindering global access. That’s why, this month, we introduced Aya, a yearlong open science initiative aimed at building a state-of-the-art multilingual generative language model that harnesses the collective wisdom and contributions of people from all over the world. This is an ambitious project aimed at releasing state-of-the-art open-source datasets and models. The project is led by Cohere For AI, which supports it with compute and research resources. However, it is a truly multi-institutional initiative, built with the help of a community of researchers, engineers, linguists, social scientists, and lifelong learners from over 100 countries around the world. Contributing to Aya is open to anyone who is passionate about advancing the field and interested in supporting under-resourced languages.

Looking Forward: Join C4AI

While we are extremely proud of the work we’ve done this year, there is much we still hope to achieve. In the next year, we aim to continue to show that top-tier research can be done while changing where, how and by whom research is done. We are grateful for your support, attendance at our events, and shared enthusiasm in exploring the unknown. Thank you for all the ways you have supported us. We look forward to the year to come.

