Why enterprises are betting on multiple AI models, not just one.

AI systems are moving into the operational phase and starting to deliver business value through enhanced processes and decision-making. Keeping that momentum going depends on companies applying the technology to targeted workflows, using models trained for specific use cases. Deploying, running and monitoring multiple models may sound like a complicated, arduous and costly exercise, but the secret to success lies in the fundamentals. Companies, whether they’re banks, hospitals, telecoms or retailers, want to use AI to solve specific business challenges. AI systems need to reflect that specificity, and the best place to start is with the models that power them.

The right model for the right job

Over the last twelve months, companies have begun to rethink their approach to AI as the focus shifts from model training to model inference, that is, applying trained models to real-world data and use cases. Where organisations once assumed that deploying the single biggest, most powerful model for everything was the way to go, they now understand that tying multiple models together, and routing requests to models trained for particular functions, is the optimal path forward.

For example, rather than training one large language model (LLM) to identify and translate 20 different languages and expecting a high degree of accuracy in its output, organisations train a model for each language. A planning model identifies the language and then routes the user’s request to the relevant small language model (SLM).
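As a rough illustration of this routing pattern, the sketch below pairs a naive planning step with a registry of language-specific specialists. Everything here, the model names, the detection heuristic and the inference stub, is a hypothetical placeholder rather than any specific Red Hat AI API.

```python
# Minimal sketch of the planner-plus-specialists pattern described above.
# All model names and helpers are hypothetical placeholders.

SPECIALIST_MODELS = {
    "fr": "fr-en-translator-slm",
    "de": "de-en-translator-slm",
    "sw": "sw-en-translator-slm",
}

def detect_language(text: str) -> str:
    """Planning step: identify the input language.

    A real system would call a small classification model here; this
    naive stub keys off a few common words purely for demo purposes.
    """
    lowered = text.lower()
    if any(word in lowered for word in ("bonjour", "merci")):
        return "fr"
    if any(word in lowered for word in ("hallo", "danke")):
        return "de"
    return "sw"

def call_model(model_name: str, prompt: str) -> str:
    """Stand-in for an inference call to a deployed model endpoint."""
    return f"[{model_name}] would handle: {prompt!r}"

def route_translation(text: str) -> str:
    """Route the request to the SLM trained for the detected language."""
    lang = detect_language(text)
    model = SPECIALIST_MODELS.get(lang)
    if model is None:
        raise ValueError(f"no specialist model registered for '{lang}'")
    return call_model(model, f"Translate to English: {text}")

print(route_translation("Bonjour, où est la gare ?"))
```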

The growth of agentic AI is also propelling companies away from focusing on single, monolithic models. Without the scale and scope of highly tuned models for specific functions, agents will not be able to complete multi-step requests. Platforms like Red Hat AI accelerate the delivery of agentic AI by providing a unified API layer and dedicated user experiences that simplify the exploration, deployment, and management of agentic systems. Through support for emerging standards like the Model Context Protocol, organisations can connect their AI agents to a wide range of external tools and data sources, enabling autonomous multi-step workflows.
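To make the multi-step idea concrete, here is a bare-bones sketch of an agent loop that plans a request into steps and dispatches each step to a registered tool. The planner and tool registry are illustrative stand-ins; real agentic platforms, including those built on standards like the Model Context Protocol, handle tool discovery, context passing and error recovery on your behalf.

```python
# Bare-bones agent loop: plan a request into steps, then dispatch each
# step to a registered tool. All names here are illustrative placeholders.

from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_order": lambda arg: f"order {arg}: shipped",
    "draft_email": lambda arg: f"draft email about: {arg}",
}

def plan(request: str) -> list[tuple[str, str]]:
    """Planning step: decompose the request into (tool, argument) pairs.

    A real agent would use a tuned planning model; this stub returns a
    fixed plan for demonstration.
    """
    return [("lookup_order", "12345"), ("draft_email", request)]

def run_agent(request: str) -> list[str]:
    """Execute each planned step, collecting results as the agent goes."""
    results = []
    for tool_name, argument in plan(request):
        tool = TOOLS.get(tool_name)
        if tool is None:
            raise ValueError(f"agent planned an unknown tool: {tool_name}")
        results.append(tool(argument))
    return results

print(run_agent("tell the customer their order status"))
```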

Through continuous experimentation, companies now recognise the value of deploying and fine-tuning multiple AI models. Most production AI systems will leverage several models, spanning both traditional predictive AI and generative AI, with smaller models capable of working on their own or playing complementary roles alongside others.

The case for multiple models also extends to factors such as digital sovereignty and regulatory compliance. Companies’ systems may be subject to unique rules regarding how and where models were trained and, in the case of agentic systems, how and where they’ll access data to complete requests. These conditions also make the case for a hybrid cloud architecture, with companies adhering to all relevant regulations while running models across on-premises and public cloud infrastructures.

Grounding AI in enterprise knowledge

Beyond infrastructure considerations, enterprises must ensure their AI models understand the business. As powerful as today’s generative AI models are, they’re primarily trained on public data and often lack the context of an organisation’s private knowledge. This is where connecting models with enterprise data becomes essential.

Red Hat AI provides integrated capabilities for connecting models to proprietary data, from automated document processing to retrieval-augmented generation (RAG), which injects real-time enterprise knowledge into model outputs. Through techniques such as fine-tuning with the InstructLab toolkit and RAG, organisations can align models with their domain expertise, ensuring AI responses are grounded in factual, up-to-date information specific to the organisation’s context rather than relying solely on public training data. This approach significantly reduces hallucinations and improves the accuracy and relevance of model outputs for domain-specific use cases.
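In outline, RAG retrieves the internal documents most relevant to a query and injects them into the model’s prompt. The sketch below assumes a generic embedding model, vector store and model endpoint; the function names and sample policies are placeholders, not a specific Red Hat AI or InstructLab interface.

```python
# Outline of retrieval-augmented generation (RAG): retrieve relevant
# enterprise documents, then ground the model's answer in them. The
# embed, retrieve and generate functions are toy placeholders for an
# embedding model, a vector store and a deployed LLM endpoint.

DOCUMENTS = [
    "Policy 12: refunds are processed within 14 working days.",
    "Policy 7: enterprise support tickets carry a 4-hour response SLA.",
]

def embed(text: str) -> list[float]:
    """Placeholder: a real system calls an embedding model here."""
    return [float(len(text))]  # toy vector, demo only

def retrieve(query_vector: list[float], top_k: int = 2) -> list[str]:
    """Placeholder: a real system queries a vector store here."""
    return DOCUMENTS[:top_k]

def generate(prompt: str) -> str:
    """Placeholder: a real system calls a generative model here."""
    return f"(model answers using a grounded prompt of {len(prompt)} chars)"

def answer_with_rag(question: str) -> str:
    """Retrieve context, then ask the model to answer only from it."""
    context = "\n\n".join(retrieve(embed(question)))
    prompt = (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)

print(answer_with_rag("How quickly do we process refunds?"))
```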

Achieving agility, flexibility and cost optimisation

A multi-model approach to AI is enough to give any CTO shivers as they think about the eventual levels of complexity, infrastructure sprawl and rising operating costs. Those fears are most often realised when organisations fail to define their success criteria or ROI expectations, and projects become too expensive to run in production.

The goal for any AI project should be to integrate the technology into a company’s data and applications, not the other way round. Companies accomplish this, along with training, deploying and managing models, using an end-to-end enterprise platform like Red Hat AI that spans all environments and is optimised for AI workloads. Built on Red Hat OpenShift, the platform provides portability, interoperability and consistent security across hybrid cloud environments, allowing organisations to maintain control over data, models and lifecycle management while scaling production AI deployments.

Meanwhile, in addition to using purpose-built SLMs, companies can decrease the computational demands of LLMs via model compression without compromising their performance. Red Hat AI’s LLM Compressor enables organisations to apply various compression algorithms to reduce model size, achieving up to 3.3X smaller models with up to 2.8X better performance whilst maintaining 99% accuracy recovery, as demonstrated with compressed Granite 3.1 models. This optimisation frees developers from wrestling with infrastructure demands and lets them focus on achieving their success criteria.
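For a sense of what compression looks like in practice, the sketch below follows the one-shot quantisation pattern from the open source llm-compressor project, applying dynamic FP8 quantisation to a Granite model. It is illustrative only: the model ID is an assumption, and exact imports and supported schemes vary between releases, so check the project’s documentation.

```python
# One-shot FP8 dynamic quantisation with the open source llm-compressor
# toolkit (vllm-project/llm-compressor). Illustrative sketch; imports
# and supported schemes may differ between releases.

from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "ibm-granite/granite-3.1-8b-instruct"  # assumed example model

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Quantise all Linear layers to FP8, leaving the output head untouched.
recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
)

oneshot(model=model, recipe=recipe)

# The compressed model is saved like any other checkpoint and can then
# be served by a vLLM-based runtime.
SAVE_DIR = "granite-3.1-8b-instruct-FP8-dynamic"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```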

Efficient inference is equally critical for managing operational costs at scale. Red Hat AI Inference Server optimises model inference across the hybrid cloud with vLLM at its core, maximising throughput whilst minimising latency. The platform’s llm-d distributed inference framework provides predictable, scalable performance through intelligent workload distribution, helping organisations consistently meet service level objectives across any generative AI workload. This combination of a sophisticated inference runtime and distributed orchestration helps reduce inference costs whilst effectively managing compute power and delivering accurate responses.
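At the runtime level, vLLM’s offline batch API shows the basic shape of serving a model. A minimal sketch, reusing the compressed checkpoint directory saved in the previous example (any Hugging Face model ID would work in its place):

```python
# Minimal vLLM inference sketch. vLLM batches requests and manages GPU
# memory to maximise throughput across concurrent prompts.

from vllm import LLM, SamplingParams

# Point at the compressed checkpoint saved above, or any HF model ID.
llm = LLM(model="granite-3.1-8b-instruct-FP8-dynamic")

sampling = SamplingParams(temperature=0.2, max_tokens=128)

prompts = [
    "Summarise our refund policy in one sentence.",
    "List three risks of deploying a single monolithic model.",
]

for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text.strip())
```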

Companies can also leverage the power of open source to optimise the cost and performance of their LLMs. Red Hat’s approach combines community-powered innovation with enterprise-grade support and indemnification. Organisations gain access to cutting-edge open source technologies like vLLM alongside validated and optimised models from the Red Hat AI repository on Hugging Face—including full model details, SafeTensor weights, and performance benchmarks—all backed by Red Hat’s support and certified partner ecosystem.

It’s all about making the right choices

As in traditional software engineering, the best approach to AI is to break the problem down into smaller pieces and evaluate how best to solve each one. As the technology becomes more prevalent in how we complete work assignments, interact with healthcare and financial service providers, or even order our groceries and plan holiday trips, that evaluation process becomes all the more critical, as does the need for models with specialised functions and use cases.

Companies’ AI ambitions will not be fulfilled by relying on a single LLM; doing so would limit their ability to adapt and innovate. With a multi-model strategy powered by platforms like Red Hat AI, companies can select and scale the right tools for the right challenges. Red Hat AI provides the control and consistency required to build, deploy and manage both predictive and generative AI models wherever they make the most sense, with flexibility across models, hardware and environments whilst maintaining operational efficiency and governance. And with the help of platforms that guarantee flexibility and provide valuable system insights, they can set themselves up for meaningful, measurable success.

Robbie Jerrom, Senior Principal Technologist AI at Red Hat.