Choosing the right large language model (LLM) for your business requires balancing scale and specificity. Large models deliver strong, out-of-the-box performance for a wide range of applications. They tend to reason better but cost more to run, while smaller, specialized models can be cheaper but become hard to train and maintain across a large number of use cases.
Navigating this choice involves assessing top-tier LLMs hosted by cloud services, such as Amazon Web Services (AWS) and Oracle Cloud Infrastructure (OCI), each with unique capabilities and supporting infrastructure. Enterprise solutions like retrieval-augmented generation (RAG) and model customization enable AI integration with domain-specific knowledge, producing relevant and accurate responses. However, selecting the ideal combination of LLM and infrastructure is challenging, with factors like cost, security, and performance playing key roles.
Many companies experimenting with AI have created proof-of-concepts (POCs) using large models, only to find that these models often can't be scaled economically for their intended business goals. As a result, project teams are now seeking more efficient and affordable solutions for full-scale production.
This article provides a concise framework to guide business leaders through these options. We’ll walk you through how to evaluate your company's requirements, explore the available models and their features, choose the right sourcing option, and consider how to scale the solution. These steps will help you identify the most suitable LLM solution for your enterprise.
Define Your Needs
To effectively leverage AI in your business, you’ll first need to identify the specific problem you want to solve. Are you aiming to enhance business intelligence with an AI chatbot, speed up digital transformation using AI-generated code, or improve your enterprise search capabilities with a multilingual embedding model? Once there's a clear understanding of your needs, you can then map out a strategy that includes AI solutions tailored to those needs.
Begin by pinpointing one or more clear, task-oriented use cases that address your problem. Each use case may need a different type of LLM to achieve the desired results. For instance, generative models aren't necessary for all scenarios. Take customer feedback analysis, which is a simple text classification task. In such cases, a fine-tuned embedding model is often a more efficient and cost-effective choice than a generative model. See below for an explanation of the various types of LLMs.
Types of LLMs
Initially, focus on establishing clear goals and metrics, as these will guide the selection of the most suitable model type, its performance level, size, and necessary security measures. This may not be a simple exercise, and there are many tradeoffs to consider. For example, the level of performance you require from the end solution can have big implications for ROI and model choice.
For use cases that do not require complex reasoning, smaller models, possibly adapted through fine-tuning, are often the preferred solution because of their lower latency and cost. On the other hand, for use cases that require complex reasoning over a broad range of topics, a larger model augmented with information retrieved dynamically can be more accurate and worth the higher cost.
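To make the customer feedback example concrete, here is a minimal sketch of classification in an embedding space: each label is represented by the average vector of its examples, and new text is assigned to the label whose average it matches best. The `toy_embed` function is a hypothetical stand-in (a normalized bag-of-words vector) so the sketch runs without any external service; in practice you would replace it with calls to a real embedding model. The example sentences and labels are also invented for illustration.

```python
import math
from collections import Counter, defaultdict


def toy_embed(text, vocab):
    """Hypothetical stand-in for an embedding model: a unit-length bag-of-words vector."""
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def train_centroids(examples, vocab):
    """Average the vectors of each label's examples into one centroid per label."""
    sums = defaultdict(lambda: [0.0] * len(vocab))
    counts = Counter()
    for text, label in examples:
        vec = toy_embed(text, vocab)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(text, centroids, vocab):
    """Return the label whose centroid has the highest dot product with the text vector."""
    vec = toy_embed(text, vocab)

    def score(label):
        return sum(a * b for a, b in zip(vec, centroids[label]))

    return max(centroids, key=score)


# Invented training examples for illustration only.
EXAMPLES = [
    ("the delivery was late and the box was damaged", "complaint"),
    ("support never answered my refund request", "complaint"),
    ("great product and fast delivery", "praise"),
    ("love the new dashboard it works great", "praise"),
]
VOCAB = sorted({word for text, _ in EXAMPLES for word in text.lower().split()})
CENTROIDS = train_centroids(EXAMPLES, VOCAB)

print(classify("my refund was late", CENTROIDS, VOCAB))
print(classify("great dashboard", CENTROIDS, VOCAB))
```

The point of the sketch is the shape of the solution: no text is generated, so the per-request work is one embedding call plus a handful of vector comparisons.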
Choose the Right-Sized Model
LLMs vary not only in type, but also in size. A common and easy way to compare model sizes is by the number of parameters they contain. Parameters are the internal weights a model learns during training, and they determine how the model behaves. For example, larger models with more parameters (>50bn) are generally considered more powerful and capable of more complex tasks, but they also require more computational resources to run.
While the number of parameters is a useful indicator of model size, it doesn't completely represent the overall size of the model. To do that, you’d have to take into account the supporting architecture, training data (e.g., volume, variety, and quality), optimization techniques (e.g., quantization), transformer efficiencies, choice of learning frameworks, and model compression techniques. The nuances of different models and how they are created make it difficult to do straightforward, like-for-like model comparisons.
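As a rough illustration of why parameter count and quantization both matter for running costs, the sketch below estimates the memory needed just to hold a model's weights: number of parameters multiplied by bytes per parameter. The precision table and the two model sizes are illustrative assumptions, and the estimate deliberately ignores activations, caches, and serving overhead, so real requirements will be higher.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params, precision):
    """Approximate memory (in GB) for the weights alone, excluding runtime overhead."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Illustrative model sizes, not specific products.
for name, params in [("7bn-parameter model", 7e9), ("50bn-parameter model", 50e9)]:
    for precision in ("fp16", "int4"):
        gb = weight_memory_gb(params, precision)
        print(f"{name} at {precision}: about {gb:.1f} GB of weights")
```

Under these assumptions, quantizing from 16-bit to 4-bit weights cuts the weight footprint to a quarter, which is one reason a parameter count alone does not tell you what hardware a model needs.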
For simplicity, below is a rough comparison between smaller sized models and larger models.
LLM Comparison by Size
Review LLM Sourcing Options
Type and size are not the only factors to consider when comparing models. Once several projects have been identified, businesses will have to decide how they’ll build or source the LLMs that underpin the application. Language models use statistical algorithms trained on massive volumes of data to understand, summarize, generate and predict text-based language. Building and training a high-performing model can cost upwards of $100 million.
Businesses have three main options to source an LLM:
1) Develop an LLM from scratch in-house, utilizing either on-premises or cloud computing resources.
2) Opt for pre-trained, open-source LLMs available in the market.
3) Employ pre-trained, proprietary LLMs through various cloud services or platforms.
Most organizations lack the in-house expertise or funding to build an LLM from scratch, or have no specific need that would justify the prohibitive expense. Therefore, for most, options 2 and 3 have emerged as the more practical and efficient ways to source and train LLMs.
However, comparing open-source versus proprietary models can feel daunting, particularly when you consider the supporting infrastructure you’ll need to build an AI application at scale. Open-source tools may appear at first to be a good low-cost option, but when considering all the criteria to implement, launch, and support them, the picture is less convincing.
To better understand the differences between open-source and proprietary models, we recommend evaluating multiple criteria in addition to upfront costs. These include time-to-solution, data provenance and indemnity options, the level of support, and the frequency of updates made to the models. Take a look at the table below that outlines the key differences.
Comparison of Open-Source Versus Proprietary LLMs
For enterprise applications that require more stringent security and transparency, pre-trained proprietary LLMs that can be accessed through a range of APIs, via partner cloud networks, or deployed directly on-premises, will likely be the best option. These models enable faster implementation and provide advanced capabilities, such as state-of-the-art fine-tuning and RAG, to meet a wide range of enterprise needs.
Consider How the Solution Scales
To shift from POCs to full production, businesses need to understand how scaling LLM applications impacts costs, performance, and ROI. It's crucial to consider not just the model type, size, and sourcing options, but also the supporting infrastructure and model serving capabilities. This comprehensive approach is vital for meeting your initial goals and may require adjusting strategies to align with your objectives.
- Requirements: Start by examining the data residency requirements for your application. These requirements mandate that certain types of data collected from individuals or organizations within a specific country or region must be stored and processed within that same country or region. Understanding these requirements can help you determine whether a multi-tenant, hosted, API-based solution works for you, or whether a more secure solution with stronger isolation is an important requirement.
- Scope: Then, estimate the volume and traffic you intend to serve, and the impact of that traffic on different solutions. For example, most hosted APIs enforce rate limits that cap the number of concurrent requests they will handle for each user. This can significantly impact end-user experience and usability.
- Skills: Next, identify the skills needed to implement a preferred solution and ensure that skillset is available in-house. Skills can range from prompt engineering to training a model from scratch. A combination of upskilling and recruitment is likely needed depending on the chosen solution.
- Costs: With an understanding of the resourcing needed to develop and maintain a solution, begin to estimate the end-to-end cost of different solutions, as well as the relative pros and cons of these approaches. For example, when dealing with data that changes often, using a model that enhances its responses by looking up relevant information (retrieval-augmented generation) is usually the top choice. However, this approach comes with the added task of setting up and updating a search and retrieval system to find the right information, on top of managing the main generative model. On the other hand, adapting a pre-trained model for a specific area or purpose involves initially gathering the right data for adjustments and then training with this data to fine-tune the model. This upfront investment might pay off in the long run if a smaller, tailored model can match the performance of a larger one.
- Operations: Finally, determine your operational capability both in terms of cost and engineering know-how. This will help you evaluate whether to run AI deployments with your own resources ("in-house") or use externally managed services. Managed platforms like Amazon Bedrock or OCI Generative AI, or multi-tenant API solutions, provide environments that offer a range of additional security and performance benefits. The decision hinges on balancing the internal capabilities and costs against the benefits of specialized external services.
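The scope and cost considerations above lend themselves to a back-of-the-envelope calculation. The sketch below estimates peak requests per minute (to compare against a provider's rate limit) and contrasts a token-priced hosted API with a fixed-capacity deployment. Every number in it, including the prices, the peak factor, and the traffic volume, is a hypothetical placeholder; substitute your own measurements and your provider's published pricing.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    requests_per_day: int
    input_tokens: int          # average tokens sent per request
    output_tokens: int         # average tokens generated per request
    peak_factor: float = 3.0   # assumed ratio of peak traffic to average traffic


def peak_requests_per_minute(w):
    """Average requests per minute scaled by the assumed peak factor."""
    return w.requests_per_day / (24 * 60) * w.peak_factor


def monthly_api_cost(w, usd_per_m_input, usd_per_m_output, days=30):
    """Monthly cost of a usage-priced API, with prices quoted per million tokens."""
    per_request = (w.input_tokens * usd_per_m_input
                   + w.output_tokens * usd_per_m_output) / 1e6
    return per_request * w.requests_per_day * days


def monthly_hosted_cost(instances, usd_per_instance_hour, days=30):
    """Monthly cost of always-on dedicated capacity, regardless of traffic."""
    return instances * usd_per_instance_hour * 24 * days


# Hypothetical workload and prices for illustration only.
workload = Workload(requests_per_day=144_000, input_tokens=1_000, output_tokens=200)
print(f"Peak load: {peak_requests_per_minute(workload):.0f} requests/minute")
print(f"Usage-priced API: ${monthly_api_cost(workload, 1.0, 2.0):,.0f} per month")
print(f"Dedicated capacity: ${monthly_hosted_cost(2, 4.0):,.0f} per month")
```

A fuller estimate would also include the engineering time and retrieval or fine-tuning infrastructure discussed above, which this sketch leaves out.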
Ultimately, don't let choosing the right AI model slow you down, but do take the time to peek under the hood and understand the cost, performance, and risks of this emerging technology, so you can confidently deliver outsized value from your LLM application.
About the Authors
Sudip Roy is Cohere’s Director of Inference and Fine-tuning
Neil Shepherd is Cohere’s Head of Growth