
Promises, Pitfalls, and Limits of AI in Manufacturing

The Rapid Rise of AI in Manufacturing

AI is moving so quickly that this article might be outdated in six months. But that’s exactly why manufacturers are paying attention. With rising costs and mounting pressure to stay competitive, AI promises a way to do more with less.

But there’s a catch. Most companies experimenting with AI are running into the same issue. The most common AI models, large language models, are designed to always give you an answer, even if that answer is wrong. They’re great at sounding confident, even when they’re guessing. And in manufacturing, that kind of behavior can wreak havoc.

It’s easy to think generative AI has some level of intelligence or reasoning. But in reality, it’s just high-powered word pattern prediction, not true understanding. That gap matters when you’re applying it to real-world processes.

In this article, we’ll look at where AI can fall short in manufacturing and how to minimize the risks while still getting value from it.

 

In This Article:

Part I – Limitations of AI in Manufacturing

Temporal Misalignment  |  Sliding Context Windows  |  Hallucinations  |  Caching  |
Source Data  |  Agentic AI  |  Security  |  High Expectations

Part II – Responsible Applications of AI

Natural Language Processing (NLP)  |  Supply Chain Optimization  |  Scheduling  |
Quality Control  |  Predicting Inefficiencies  |  Advanced Process Control  |
Predictive Maintenance  |  Robotic Automation  |  Scenario Testing  |
Administrative Tasks  |  Document Retrieval

Part I – Understanding the Limitations of AI in Manufacturing

Before diving into use cases, it’s important to understand where current AI technologies fall short. The following sections highlight key limitations that can lead to costly errors or unsafe outcomes if not properly addressed.

 

Temporal Misalignment

Temporal misalignment happens when an LLM’s training data is outdated or no longer reflects current conditions. Since these models learn from historical data, they can produce misleading or incorrect results if manufacturing requirements, market conditions, or operational priorities have changed since the model was last trained.

Updating the training data regularly can reduce the risk, but there will always be situations where change happens faster than the model can keep up. A good example is the COVID-19 pandemic: if an LLM was trained before March 2020, it would have no knowledge of the global shutdowns, supply chain disruptions, or economic volatility that followed. Asking it to predict economic growth post-2020 using that outdated perspective would produce an answer based on a reality that no longer exists.

 

Sliding Context Windows

Large language models (LLMs) process information using tokens, which are chunks of text such as words, punctuation, and symbols, and they have a hard limit on how many tokens they can handle at once. The bigger the context window (i.e., how much information the model can “see” at one time), the more computing power it takes. To keep things efficient, those windows are capped, but this means the model cannot “remember” everything at once.

When working with long documents, complex production reports, or extended conversations, the model may lose track of important context as older tokens fall out of view. This can lead to inconsistencies, duplicated information, or responses that don’t connect the dots, especially in situations that require tying together dates, instructions, or historical performance.

In manufacturing environments, where precision and consistency matter, the LLM is best treated as an assistant rather than a source of critical guidance. Even then, it should never be relied on for decisions involving safety, compliance, or quality.


Ways to mitigate sliding context window issues include:

  • Retrieval-Augmented Generation (RAG): Pull in relevant chunks dynamically from a vector database (see the sketch after this list).
  • Memory-augmented agents: Store and recall summarized or structured knowledge across turns.
  • Extended context models: Use LLMs with larger token windows (like GPT-4 or Claude with 100k+ token support) to reduce fragmentation.
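To make the RAG bullet above concrete, here is a minimal sketch of retrieval over document chunks. The bag-of-words embedding is only a stand-in so the example runs on its own; in practice you would call an embedding model and store the vectors in a vector database, and the document text here is invented for illustration.

```python
import numpy as np

# Stand-in embedding (bag of words) so the sketch runs on its own. In practice,
# call an embedding model/service and store the vectors in a vector database.
def embed(text: str, vocab: list[str]) -> np.ndarray:
    words = text.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=float)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the chunks most similar to the question."""
    vocab = sorted({w for text in chunks + [question] for w in text.lower().split()})
    q_vec = embed(question, vocab)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c, vocab)), reverse=True)
    return ranked[:top_k]

chunks = [
    "Line 3 changeover: purge the hopper and verify clamp torque before restart.",
    "Quality hold policy: any out-of-spec lot requires supervisor sign-off.",
    "Cafeteria hours are 6 a.m. to 2 p.m. on weekdays.",
]
print(retrieve("What is required during the Line 3 changeover?", chunks, top_k=1))
```

The retrieved chunks are then placed in the prompt, so the model answers from current documents rather than from whatever happens to remain in its context window.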

 

Hallucinations

Hallucinations occur when an LLM gives you an answer that sounds confident, but is completely wrong. This isn’t a bug; it’s a consequence of how these models are designed. LLMs don’t pull from a database of facts or perform logical reasoning. Instead, they generate responses by predicting the most likely next word or token based on patterns in their training data.

A common misconception is that LLMs can perform calculations, but they can’t do this on their own. Most providers get around this by adding tools like calculators or data connectors behind the scenes. For example, if a user asks for the square root of 8.54, the LLM doesn’t do the math itself; it passes the request to a calculator. The same goes for manufacturing metrics: if you ask for your OEE, the model might know how OEE is usually calculated, but it doesn’t know your production history, your actual data, or the specific version of the calculation your operation uses.

While hallucinations can’t be eliminated entirely, there are strategies to reduce their impact, especially when applying AI in high-stakes manufacturing environments:

Retrieval-Augmented Generation (RAG): Fetch facts from a trusted database or calculator (tool in LLM terms). A tool is an external capability that the LLM can call to do something it can’t do on its own (like calculate OEE, quality, predictions, or other manufacturing information). Tools are passive, meaning the LLM decides when and how to use them based on user prompts or internal logic. 

Even with tools in place, the transparency of the source data is still an issue. For a simple example, if you ask AI what this month’s sales are, is its answer derived from customer purchase orders received, invoiced sales, or payments received? The LLM might not know which data the tool uses for its calculations, and that choice is not transparent to the user.

To simplify tool connectivity, the Model Context Protocol (MCP), an open standard spearheaded by Anthropic, provides a universal way to connect AI systems with data sources.

Examples of LLM tools:

  • A calculator tool for math.
  • A web search tool for live information.
  • A code interpreter or database query tool.
  • An OEE calculation tool for a given piece of equipment and date range (sketched below).
  • A production-loss prediction tool for the current run.
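To show what one of these tools might look like in code, here is a minimal sketch of an OEE calculation tool and the JSON-style schema most providers use to describe it to the model. The function name, parameters, and placeholder values are assumptions for illustration; in a real deployment the availability, performance, and quality figures would come from your MES or historian.

```python
from datetime import date

def calculate_oee(equipment_id: str, start: date, end: date) -> float:
    """Hypothetical tool: compute OEE from your own production records."""
    # Placeholder values; a real implementation queries the MES/historian
    # for the availability, performance, and quality of this equipment.
    availability, performance, quality = 0.92, 0.88, 0.99
    return availability * performance * quality

# A schema like this tells the LLM the tool exists and what arguments it takes.
oee_tool_schema = {
    "name": "calculate_oee",
    "description": "Calculate OEE for a piece of equipment over a date range.",
    "parameters": {
        "type": "object",
        "properties": {
            "equipment_id": {"type": "string"},
            "start": {"type": "string", "format": "date"},
            "end": {"type": "string", "format": "date"},
        },
        "required": ["equipment_id", "start", "end"],
    },
}

print(round(calculate_oee("Line-3", date(2025, 1, 1), date(2025, 1, 31)), 3))
```

When the model decides to call the tool, your application runs the function with the arguments the model supplies and returns the result for the model to phrase; the model itself never performs the math.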

Post-generation Fact Checking: Use tools directly (not through the LLM) to verify answers after generation.

User warnings or confidence scoring: It would be helpful if LLMs reported how confident they are in each answer, but they don’t provide a reliable, calibrated confidence score, so treat unverified answers with caution.

Caching

Caching is often overlooked but just as important to address as hallucinations. It occurs when an LLM reuses a previous response instead of fetching fresh data from a connected tool. This can result in outdated or stale information, leading to bad decisions, especially in environments where real-time accuracy matters.

In manufacturing, where conditions change constantly, relying on cached data can introduce serious quality or production risks.

Fortunately, most LLMs support system-level instructions, called system messages, that influence how they behave. You can explicitly tell the model to always call the tool for live data, rather than defaulting to a cached response. This extra step helps ensure answers reflect the most current state of operations.
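As a minimal sketch of that kind of instruction, here is the message payload pattern most chat-style LLM APIs accept; the exact client call and tool wiring vary by provider, so the commented-out line is only a placeholder.

```python
# The system message accompanies every request and shapes how the model behaves.
messages = [
    {
        "role": "system",
        "content": (
            "You are a manufacturing assistant. For any question about current "
            "production, inventory, downtime, or quality, ALWAYS call the "
            "relevant tool to fetch live data. Never answer from memory or "
            "reuse a previous response."
        ),
    },
    {"role": "user", "content": "What is the current downtime on Line 3?"},
]
# response = llm_client.chat(messages=messages, tools=tools)  # provider-specific call
```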

 

Source Data

LLMs operate in two phases: a training phase (where they learn from data) and an inference phase (where they generate responses). The quality of those responses, whether from an LLM or a predictive machine learning model, depends entirely on the quality of the data it was trained on.

There are two main types of machine learning, illustrated in the brief sketch after this list:

  • Supervised learning, where the model learns from labeled data. For example, to train a spam detection model, each email in the dataset would be labeled as spam or not. The model learns to associate patterns in the email content (called features) with the correct label.
  • Unsupervised learning, which uses data without labels. The model tries to find hidden patterns, such as grouping similar production events or identifying anomalies in process behavior. It’s often used to simplify data or surface unexpected trends.
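The sketch referenced above shows both approaches with scikit-learn on invented readings: a supervised classifier learns from labeled lots, while an unsupervised anomaly detector flags readings that look unlike the rest without any labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import IsolationForest

# Supervised: temperature/vibration readings labeled 0 = good lot, 1 = defective lot.
X_labeled = np.array([[70, 1.2], [72, 1.1], [71, 1.0], [95, 2.9], [97, 3.1], [96, 2.8]])
y = np.array([0, 0, 0, 1, 1, 1])
classifier = LogisticRegression().fit(X_labeled, y)
print("Predicted label for [94, 2.7]:", classifier.predict([[94, 2.7]])[0])

# Unsupervised: the same kind of readings, but with no labels at all.
X_unlabeled = np.array([[71, 1.0], [70, 1.2], [72, 1.1], [71, 1.1], [70, 1.0], [96, 3.0]])
detector = IsolationForest(random_state=0).fit(X_unlabeled)
print("Anomaly flags (-1 = anomaly):", detector.predict(X_unlabeled).tolist())
```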

In both cases, the training data is everything. If the data doesn’t accurately represent real conditions, or lacks context, the patterns the model learns will be flawed. That leads to bad predictions, incorrect answers, and poor decisions.

In manufacturing, that’s not just inconvenient. It’s unacceptable.

It’s tempting to think AI can take in all available data, good or bad, and magically optimize your processes. But that’s not how it works. The most important step in applying AI to production is making sure the data is clean, contextualized, and truly representative of how your operation runs.

 

Agentic AI

Agentic AI takes machine learning and large language models a step further by allowing systems to act autonomously to achieve specific goals. Unlike AI that only analyzes data, agentic AI can monitor processes and initiate actions on its own. These actions range from simple tasks like monitoring line production and sending alerts, to more advanced ones like placing orders automatically.

But autonomy comes with risk. AI agents can act on outdated patterns or incomplete data. Without human judgment, there is no cognitive context or gut instinct. They may pursue a goal in technically correct but operationally harmful ways. An action might satisfy a metric while creating real-world disruption.


As with RAG and other tools, transparency is a major challenge. Many agentic systems, especially those powered by deep learning, are black boxes. When something goes wrong, it’s hard to trace back what happened, or why.

The safest approach is to treat Agentic AI as a recommendation engine, not an execution engine. Reserve autonomous actions for low-risk tasks. For anything related to safety, quality, or compliance, Agentic AI should be limited to making recommendations that require human approval before implementation. For example, the AI agent can recommend placing an order when stock runs low, but the order should still be reviewed and approved by a human before it’s submitted.
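A minimal sketch of that “recommend, don’t execute” pattern, assuming a hypothetical reorder workflow: the agent may only propose an action, and nothing is submitted until a person approves it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Recommendation:
    action: str
    reason: str
    approved: bool = False

def agent_check_stock(part: str, on_hand: int, reorder_point: int) -> Optional[Recommendation]:
    """Hypothetical agent step: recommend a reorder, never place one directly."""
    if on_hand < reorder_point:
        return Recommendation(
            action=f"Create purchase order for {part}",
            reason=f"On-hand ({on_hand}) is below the reorder point ({reorder_point}).",
        )
    return None

def submit_order(rec: Recommendation) -> None:
    if not rec.approved:
        raise PermissionError("Human approval is required before executing this action.")
    print("Order submitted:", rec.action)

rec = agent_check_stock("bearing-6204", on_hand=12, reorder_point=50)
if rec:
    print(f"Agent recommends: {rec.action} ({rec.reason})")
    rec.approved = True  # set only after a person has reviewed the recommendation
    submit_order(rec)
```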

 

Security

Manufacturers are rightfully concerned about data privacy, and that’s one reason you don’t see broad, industry-wide LLMs being used in the manufacturing space. Most implementations will involve private, company-specific LLMs and machine learning models tailored to internal systems and protected data.

Security isn’t a new concept for manufacturers. Most have already made significant investments in securing their digital infrastructure, and those same principles apply when introducing AI. While this article doesn’t dive into cybersecurity practices in detail, AI brings an additional risk worth highlighting: model poisoning.

Model poisoning occurs when attackers inject malicious or misleading data into a model’s training set. If successful, the model may learn incorrect patterns or behaviors, leading to subtle (or not-so-subtle) disruptions in quality, scheduling, or workflow decisions.

As AI adoption increases, maintaining data integrity becomes just as important as data security.
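As one simple, hedged example of a data-integrity safeguard, the sketch below hashes each training file and compares it against a previously recorded manifest before retraining, so silently altered data is caught early. The paths and manifest format are illustrative.

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def verify_training_data(data_dir: str, manifest_file: str) -> bool:
    """Compare current file hashes against a known-good manifest of {"filename": "hash"}."""
    manifest = json.loads(Path(manifest_file).read_text())
    ok = True
    for name, expected in manifest.items():
        actual = sha256_of(Path(data_dir) / name)
        if actual != expected:
            print(f"WARNING: {name} has changed since the manifest was recorded.")
            ok = False
    return ok

# Illustrative usage: refuse to retrain if any file fails verification.
# if not verify_training_data("training_data/", "manifest.json"):
#     raise SystemExit("Training data failed integrity check; investigate before retraining.")
```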

 

High Expectations

Comfort levels with AI vary widely in manufacturing. Some view it as a potential disruptor; others see it as the best thing since sliced bread. Like most emerging technologies, AI is still climbing the hype curve and hasn’t yet reached the so-called plateau of productivity. That stage will come with time, along with more reliable best practices for implementation. What complicates matters is that new AI advancements keep arriving, each at a different point along its own hype cycle.

When expectations are too high, it’s easy to overinvest and underdeliver. Manufacturers who go all in on AI without clear focus often struggle to realize a meaningful return. A better strategy is to start small: identify a few targeted areas for improvement, measure results, and be willing to pivot when things don’t go as planned. Once progress is made, the same approach can be applied to broader initiatives, taking it one step at a time.

Part II – Responsible Applications of AI

Use Cases

AI is not the answer to solve all problems, at least not today. But it can be a useful assistant for improving efficiency. In software development, for example, AI can generate code from natural language descriptions or requirements, saving developers hours or even days. It is just as effective for testing software. The caveat is that the code isn’t always perfect. It needs to be reviewed and refactored by a human. That’s fine in software, where code can be iterated without real-world consequences. Mistakes don’t affect safety or quality, and the time savings often outweigh the risk.

Manufacturing is different. 

Rework comes at a cost, whether it’s scrapped materials, lost production time, or added strain on machines and personnel. It can lead to disruptive chaos in an otherwise smooth production flow. That said, AI can still play a supporting role. It’s well-suited to identifying inefficiencies or spotting patterns humans might miss. In dynamic environments, AI can quickly surface relevant information when unexpected disruptions occur, helping teams respond faster and smarter.


Natural Language Processing

Learning how to use a software program doesn’t come naturally to everyone, and becoming proficient often requires a significant time investment. Natural language processing (NLP), a key capability within AI, helps reduce that barrier. Instead of memorizing commands or navigating complex menus, users can interact with systems in their own words to retrieve production information or perform non-physical tasks.

NLP can also analyze sentiment, the tone or emotional content behind textual data. For instance, the sentiment of operator notes can reveal whether frustration is building or if production is running smoothly. When combined with downtime, production counts, and quality metrics, this additional context can help identify issues that aren’t detectable through sensors alone. It can uncover problems like understaffing, inadequate training, or poor team dynamics. Likewise, it can highlight practices worth reinforcing, indicated by a consistently positive sentiment.

Shift reports, operator notes, test results, maintenance logs, and error reports written in natural language can all be analyzed to detect patterns and predict production losses.
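As a small sketch of sentiment analysis on operator notes, the example below uses the Hugging Face transformers library, one common option. The package, its default English sentiment model, and the note text are assumptions for illustration; a model tuned to shop-floor language would likely perform better.

```python
from transformers import pipeline  # assumes the `transformers` package is installed

# Loads a general-purpose English sentiment model on first use.
sentiment = pipeline("sentiment-analysis")

operator_notes = [
    "Third jam on the capper this shift; wasted an hour clearing it again.",
    "Smooth run today, changeover went faster than planned.",
]
for note, result in zip(operator_notes, sentiment(operator_notes)):
    print(f"{result['label']:>8}  {result['score']:.2f}  {note}")
```

Tracking these scores over time, alongside downtime and quality metrics, is what surfaces the slower-moving issues, such as mounting frustration on a particular line.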

Here are a few ways NLP can support manufacturing operations:

  • Analyze quality reports for compliance
  • Predict maintenance needs using operator and maintenance logs
  • Retrieve information from equipment manuals using natural language queries
  • Generate impromptu reports on demand
  • Enable hands-free MES interaction through voice commands

Supply Chain Optimization

Determining inventory levels, aligning shipping with production, managing alternative suppliers, and optimizing shipping vendors and routes are all incredibly complex tasks. When just one of these elements falls out of sync, it often triggers a cascade of issues: rescheduled production runs, missed deliveries, idle workers, or frustrated customers. Even the best-laid schedules can be disrupted, and each change introduces new chances for delays, mistakes, and inefficiencies.

Anticipating demand ahead of time, based on sales trends, market signals, weather, seasonal patterns, or major events, is more of an art than a science. Done poorly, it leads to shortages or overstock, both of which reduce efficiency and increase cost.

Machine learning can help by identifying complex patterns and relationships that may not be obvious to humans. But it only works if the model is trained on the right data. For example, if an ML model sees a sudden spike in sales but doesn’t have access to the external factors that caused it, like a holiday, a marketing promotion, or a weather-related surge, it treats the spike as random noise. Without understanding the reason behind the change, the model can’t learn from it. Future predictions suffer because the system sees variability without context, failing to connect cause and effect.
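A hedged sketch of that point: the same invented sales history is modeled with and without a promotion flag, and the external factor is what lets the model explain the spike instead of treating it as noise.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Weekly demand; weeks 4 and 8 spiked because of promotions (invented numbers).
weeks = np.arange(1, 9).reshape(-1, 1)
promo = np.array([0, 0, 0, 1, 0, 0, 0, 1]).reshape(-1, 1)
demand = np.array([100, 102, 99, 180, 101, 103, 100, 185])

without_context = LinearRegression().fit(weeks, demand)
with_context = LinearRegression().fit(np.hstack([weeks, promo]), demand)

print("R^2 without the promo flag:", round(without_context.score(weeks, demand), 2))
print("R^2 with the promo flag:   ", round(with_context.score(np.hstack([weeks, promo]), demand), 2))
print("Week 9 forecast, no promo:", round(with_context.predict([[9, 0]])[0]))
print("Week 9 forecast, promo:   ", round(with_context.predict([[9, 1]])[0]))
```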

When the training data is clean, complete, and conditions remain relatively stable, predictions tend to be more reliable. But when conditions shift, such as during a global pandemic, those patterns break. Demand for exercise equipment and bicycles skyrocketed, while auto sales plummeted. That, in turn, triggered another unpredictable chain reaction: a shortage of automotive semiconductors, which drove up prices for used vehicles. These types of disruptions are difficult for ML to account for unless the underlying causes are captured in training data.

Scheduling

Scheduling production is a complex, constraint-heavy, and often highly dynamic problem. In most manufacturing environments, scheduling has two layers. The ERP system typically consolidates customer orders and accounts for available inventory and resources, generating a schedule, often just once per day. In some cases, the ERP specifies which equipment or lines to use; in others, it leaves those decisions to the production team.

But once the production orders hit the shop floor, real-time factors take over and daily schedule updates quickly fall apart.

Some scheduling scenarios are straightforward, but others are shaped by the Theory of Constraints, and those constraints aren’t always obvious. The concept centers on the idea that one resource, machine, or process often limits the entire system’s throughput, and until that bottleneck is addressed, nothing else truly improves. The number of potential combinations between resources (equipment, personnel, tooling, etc.) and products makes it difficult to find the optimal plan without help.

Machine learning is well-suited to tackle this kind of complexity. It can evaluate a wide range of possibilities and optimize schedules to meet due dates, prioritize critical orders, reduce energy usage, and improve overall efficiency, faster and more consistently than manual methods.
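As a deliberately simplified sketch of the problem space (invented orders, a single line, and a classic earliest-due-date rule), the snippet below sequences jobs and reports which would finish late. ML and optimization approaches search far larger combinations of lines, crews, tooling, and changeovers.

```python
from dataclasses import dataclass

@dataclass
class Order:
    name: str
    run_hours: float
    due_hour: float  # hours from now

orders = [
    Order("A-100", run_hours=6, due_hour=10),
    Order("B-200", run_hours=4, due_hour=8),
    Order("C-300", run_hours=5, due_hour=12),
]

# Earliest-due-date (EDD) sequencing on a single line.
clock = 0.0
for order in sorted(orders, key=lambda o: o.due_hour):
    clock += order.run_hours
    status = "on time" if clock <= order.due_hour else f"late by {clock - order.due_hour:.0f} h"
    print(f"{order.name}: finishes at hour {clock:.0f} ({status})")
```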

Quality Control

Statistical Process Control (SPC) has been used for decades to monitor manufacturing processes and detect when something goes wrong. It relies on established rules, such as the Western Electric and Nelson Rules, that flag when a process is behaving abnormally based on simple statistical analysis of numeric sample data.

Developed more than half a century ago, those rules are dated and limited in what they can see. Their simplicity hinders their ability to catch subtle trends or emerging quality issues early.

Machine learning offers a chance to modernize and extend SPC. Unlike traditional SPC, ML can be trained on a wider range of data. This includes not only numbers but also text, images, and other contextual information. It can detect more complex patterns and predict potential quality problems before they cause a process to drift out of control.

That doesn’t mean SPC should be replaced. These traditional rules are well understood, easy to implement, and still effective in many cases. Machine learning can complement them by identifying signals that standard control charts might overlook, especially when working with large volumes of varied data. By layering ML on top of SPC, manufacturers can improve early detection, reduce false positives, and shift from reactive quality control to a more predictive, proactive approach.
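A brief sketch of the layered idea using invented sample data: a classic Western Electric check (a point beyond three sigma) runs alongside a simple run rule that often catches drift earlier; an ML model would sit on top of checks like these with richer inputs such as text and images.

```python
import numpy as np

samples = np.array([10.1, 9.9, 10.0, 10.2, 10.1, 10.3, 10.4, 10.5, 10.6, 10.7, 10.9, 11.8])
center, sigma = 10.0, 0.3  # from the process's historical baseline (illustrative)

# Western Electric rule 1: any point more than 3 sigma from the center line.
beyond_3_sigma = np.where(np.abs(samples - center) > 3 * sigma)[0]

# A simple run rule: 8 consecutive points on the same side of the center line,
# which often flags a drift well before any single 3-sigma violation.
above = samples > center
run_detected = any(above[i:i + 8].all() for i in range(len(samples) - 7))

print("Points beyond 3 sigma at indices:", beyond_3_sigma.tolist())
print("Sustained run above the center line:", run_detected)
```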

Predicting Inefficiencies

For any manufacturer, efficiency is the bottom line. Over time, if the cost of producing finished goods consistently outweighs the sales price, the business becomes unsustainable. Recognizing the factors that lead to inefficiencies, such as downtime, waste, or delays, can support better decision-making. Taking that a step further, forecasting when and where those inefficiencies are likely to occur allows teams to take proactive steps to avoid them.

On the human side, machine learning can highlight situations that increase the risk of operator error, such as assigning newly hired staff to complex lines without sufficient training. When these risks are surfaced early, manufacturers can intervene with targeted support or adjust assignments to maintain quality and productivity.

The challenge, as with any ML application, lies in the quality of the training data. The model can only detect patterns if the relevant variables are included. If something like humidity affects production, that factor must be present in the data set. Incomplete or inaccurate data will lead to poor predictions, which in turn lead to poor decisions.
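A hedged sketch of that point with invented data: fit the same simple model with and without a humidity column and compare how much of the scrap-rate variation it explains. If the score jumps, the variable belongs in the data set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
speed = rng.uniform(50, 100, n)
humidity = rng.uniform(30, 80, n)
# Invented relationship: scrap rises with line speed and, more strongly, with humidity.
scrap_rate = 0.05 * speed + 0.08 * humidity + rng.normal(0, 0.3, n)

X_without = speed.reshape(-1, 1)
X_with = np.column_stack([speed, humidity])

r2_without = LinearRegression().fit(X_without, scrap_rate).score(X_without, scrap_rate)
r2_with = LinearRegression().fit(X_with, scrap_rate).score(X_with, scrap_rate)
print("R^2 without humidity:", round(r2_without, 2))
print("R^2 with humidity:   ", round(r2_with, 2))
```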

Advanced Process Control

Automated systems that adjust process setpoints, such as temperature, pressure, level, or speed, on the fly have been used in manufacturing for decades. These systems, often referred to as Advanced Process Control (APC), are typically designed for specific use cases. For example, an APC system might tweak process settings to maintain consistent material density based on changes in humidity.

This should not be confused with traditional feedback control (like PID), which keeps a process variable such as temperature or pressure as close as possible to a fixed setpoint. PID control works by continuously measuring the difference between the actual value and the target, then adjusting the output to reduce that gap over time. For example, a PID loop might keep an oven at a steady 160 degrees. In contrast, APC adjusts the setpoint itself to achieve a desired outcome, such as improving product quality, optimizing throughput, or reducing energy use.
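For readers who want to see the distinction concretely, here is a toy PID loop holding an oven near a fixed 160-degree setpoint; the gains and the one-line oven model are invented purely for illustration. APC, by contrast, would be deciding what that setpoint should be.

```python
setpoint = 160.0              # fixed target; APC would adjust this value itself
kp, ki, kd = 2.0, 0.1, 1.0    # illustrative proportional/integral/derivative gains
temp, integral, prev_error = 20.0, 0.0, 0.0

for step in range(60):
    error = setpoint - temp
    integral += error
    derivative = error - prev_error
    output = kp * error + ki * integral + kd * derivative  # heater power command
    prev_error = error
    # Toy oven model: heat added by the controller minus loss to 20-degree ambient.
    temp += 0.02 * output - 0.05 * (temp - 20.0)
    if step % 10 == 0:
        print(f"step {step:2d}: temperature = {temp:6.1f}")
```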

Because APC is highly tailored and used in critical applications, it is not an ideal candidate for AI-driven control. While machine learning can be used to analyze trends or predict future behavior, it is trained on historical data. When conditions shift or new scenarios emerge, the model’s predictions may become unreliable. This is particularly true when those scenarios are not represented in the training data. In high-risk, tightly controlled environments, that uncertainty is difficult to justify.

Predictive Maintenance

Determining when to perform maintenance on a machine, or even on specific components, can be complex. A fixed schedule, like servicing equipment every three months, might be too frequent for lightly used machines or too infrequent for heavily used ones. Over-maintenance leads to unnecessary downtime, labor costs, and part replacements. Under-maintenance increases the risk of unexpected breakdowns, lost production time, and expensive repairs.

AI can help find the right balance by basing maintenance schedules on actual usage and performance. Machine learning models can identify patterns between past maintenance and production outcomes to suggest more accurate timing. Large language models (LLMs) can also analyze maintenance manuals and technical documents to extract suggested procedures and intervals.
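One hedged way to frame this as an ML problem, with invented feature names and data: learn hours-to-failure from usage signals, then schedule service ahead of the predicted failure window.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 300
run_hours = rng.uniform(0, 2000, n)
vibration = rng.uniform(0.5, 3.0, n)
# Invented relationship: machines fail sooner when run longer with higher vibration.
hours_to_failure = 2500 - run_hours - 400 * vibration + rng.normal(0, 50, n)

X = np.column_stack([run_hours, vibration])
model = RandomForestRegressor(random_state=0).fit(X, hours_to_failure)

machine = [[1200, 1.8]]  # current run hours and vibration for one machine
predicted = model.predict(machine)[0]
print(f"Predicted hours to failure: {predicted:.0f}")
print(f"Schedule maintenance within: {max(predicted - 200, 0):.0f} hours (200 h safety margin)")
```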

Robotic Automation

Robotic automation has transformed certain aspects of manufacturing, particularly in environments with repetitive, high-volume tasks. Robots perform well at well-defined jobs such as assembling identical components on a production line, but their usefulness drops sharply when tasks become more variable or physically intricate.

Fully autonomous “lights-out” manufacturing, where no human intervention is needed, has been an industry aspiration for decades. Despite some progress, it remains impractical for most manufacturers. Many tasks, such as feeding wires through tight channels or securing delicate components, still require flexibility, dexterity, and judgment that robots struggle to replicate, especially at a justifiable cost.

AI and machine learning are beginning to change that. New AI-powered tools can accelerate robot programming, allowing robots to learn tasks through demonstration or natural language input. Over time, these tools may reduce the time and expertise needed to deploy robotic automation, making it more accessible for a wider range of use cases. But today, most implementations still require significant customization and engineering effort.

Rather than replacing human workers entirely, the near-term role of AI in robotic automation is to augment them. It can speed up programming, improve task recognition, and support smarter collaboration between people and machines.

Scenario Testing

Testing is a major challenge for business and production systems. It’s difficult to simulate full production load and cover every possible scenario. For example, a Manufacturing Execution System (MES) may need to support hundreds of simultaneous users during live production. Simulating that load, along with the countless combinations of production activities, is no simple task. Yet these tests are critical to ensure the system works reliably and doesn’t disrupt production.

Because of the complexity involved, testing often happens on the same systems that control real production. When pre-production environments aren’t fully equipped or representative, untested changes may be pushed into production, increasing the risk of unexpected issues.

AI can help by simulating large numbers of users performing a wide range of tasks. It’s also highly effective at generating varied test data and edge-case scenarios. Manually creating this kind of test coverage is tedious, time-consuming, and often incomplete. With AI, testing can be broader, faster, and more representative of real-world conditions.
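A small sketch of generated test data (in practice an LLM or a property-based testing tool would produce far richer cases): randomly built production orders that deliberately include edge cases such as empty IDs, zero or negative quantities, and hostile strings.

```python
import random
import string

random.seed(42)

def random_order(edge_case: bool = False) -> dict:
    """Generate a test production order; edge cases stress validation logic."""
    if edge_case:
        return random.choice([
            {"order_id": "", "product": "WIDGET-1", "quantity": 0},
            {"order_id": "ORD-999999999", "product": "名前テスト", "quantity": -5},
            {"order_id": "ORD-1; DROP TABLE orders", "product": "WIDGET-1", "quantity": 10**9},
        ])
    return {
        "order_id": "ORD-" + "".join(random.choices(string.digits, k=6)),
        "product": random.choice(["WIDGET-1", "WIDGET-2", "GADGET-7"]),
        "quantity": random.randint(1, 5000),
    }

test_cases = [random_order() for _ in range(8)] + [random_order(edge_case=True) for _ in range(2)]
for case in test_cases:
    print(case)
```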

Administrative Tasks

Many administrative tasks follow repeatable procedures, such as sorting records, classifying inputs, performing routine calculations, or generating reports on a fixed schedule. These processes are often time-consuming and prone to human error, yet they rarely involve physical activity or require high-stakes decision-making. That makes them strong candidates for automation through AI.

Because these tasks typically carry low risk, they’re well suited for early AI adoption. Even basic AI-driven automation can save time, reduce repetitive work, and allow staff to focus on more complex or judgment-based responsibilities. Over time, AI agents may also learn to flag inconsistencies, identify data entry anomalies, or surface insights that might otherwise go unnoticed.

As long as a task isn’t operationally sensitive and doesn’t require physical interaction, it’s worth evaluating whether an AI agent can take it on.

Document Retrieval

Manufacturing environments generate and rely on an enormous number of documents: equipment manuals, standard operating procedures, compliance guidelines, technical specifications, maintenance records, and more. When someone needs to find the right information quickly, it’s not always easy. Sifting through PDFs, spreadsheets, and printed binders can slow things down, and missing a critical detail can lead to costly mistakes, delays, or compliance issues.

Large language models (LLMs) are well suited to this challenge. They can index large collections of documents and return relevant information in response to natural language questions. This makes it much easier for personnel to find what they need without manually digging through folders or files.

But there’s a warning: LLMs can generate plausible-sounding but incorrect answers (“hallucinations”). To reduce this risk, it’s essential that LLMs are configured to always cite their sources. When responses include original document references, users can verify the information and build trust in the results. Setting this behavior at the system level is a key safeguard for using LLMs responsibly in high-stakes environments.
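A minimal sketch of that safeguard (document IDs, wording, and the commented-out client call are illustrative): retrieved passages carry a source tag, and the system message requires every answer to cite those tags or state that the documents don’t contain the answer.

```python
# Retrieved passages are tagged with a source ID so the model can cite them.
retrieved = [
    {"source": "SOP-114 rev C, p. 7", "text": "Torque the clamp bolts to specification before restart."},
    {"source": "Maintenance manual M-20, p. 31", "text": "Replace the seal kit every 4,000 run hours."},
]
context = "\n".join(f"[{d['source']}] {d['text']}" for d in retrieved)

messages = [
    {
        "role": "system",
        "content": (
            "Answer ONLY from the provided documents. Cite the bracketed source ID "
            "for every statement. If the documents do not contain the answer, say so."
        ),
    },
    {"role": "user", "content": f"Documents:\n{context}\n\nQuestion: How often should the seal kit be replaced?"},
]
# response = llm_client.chat(messages=messages)  # provider-specific call
```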


Conclusion: Start Small, Stay Grounded, and Build Toward Value

AI in manufacturing holds enormous promise, but that promise is only realized when implementations are thoughtful, grounded, and aligned with real business needs. As we’ve seen, machine learning and AI agents can support everything from predictive maintenance and scheduling optimization to document retrieval and administrative tasks. But success depends on more than just technology. It requires clean, relevant data, clear goals, and a healthy understanding of the risks involved.

The best approach is to start small. Focus on one or two use cases where the payoff is clear and the risk is low. Use those early wins to build internal confidence, refine your processes, and develop best practices. As the technology matures, and as your team’s understanding grows, you’ll be in a better position to expand.

AI won’t replace people, but it can reduce busywork, highlight unseen problems, and help teams make better decisions faster. Used responsibly, it’s not just a tool for automation. It’s a tool for amplifying human capability.

We Want to Hear From You

AI is an emerging technology that is evolving rapidly and surrounded by considerable hype and uncertainty. We recognize that experiences with AI implementations vary widely across manufacturing environments. Whether you have successfully integrated AI solutions or encountered challenges along the way, your insights are invaluable.

We invite you to share your real-world use cases, lessons learned, and strategies for mitigating risks. Likewise, if concerns or uncertainties are holding you back from adopting AI, we want to understand those as well. Your feedback will help shape better practices and drive meaningful progress in the application of AI in manufacturing.

Please reach out to us to join the conversation.

About Sepasoft®
Sepasoft has been a leader in modular MES solutions for years, helping manufacturers optimize production with tools to improve efficiency, traceability, quality, and production control. Building on this foundation, we introduced SepaIQ, a central MES data hub and advanced analytics platform that transforms raw production data into contextualized, structured information for enterprise systems, BI, and AI. SepaIQ serves as a launchpad into AI, providing manufacturers with predictive insights, natural language data interaction, and the ability to uncover patterns that drive smarter decisions.

Learn more about SepaIQ here: https://www.sepasoft.com/products/sepaiq/.

Have Questions? Reach out to us: sales@sepasoft.com
