Evolution Arc · Nov 2025 · 10 min read

Why We Built ArgusAI: The Decision to Bring AI On-Premises for Industrial IoT

ArgusAI
evolution-arc · argusai · on-premises-ai · private-inference · air-gapped · regulated-industries · product-development-story · era-3

Where It Started

ArgusIQ shipped with Ask Argus as a core module. The design was straightforward: a natural language interface that queries ArgusIQ’s data model through MCP servers, sends the query and context to a cloud LLM for inference, and returns the answer with source citations.

The MCP server architecture was already right. Each ArgusIQ module had a purpose-built MCP server — asset_hub_mcp for asset data, cmms_mcp for maintenance records, alarm_mcp for alert history. The LLM never touched a database directly. It received structured context and generated answers from that context.

What changed was where the LLM ran.


The Questions We Didn’t Anticipate

Ask Argus launched to strong reception. Operations teams who had spent years staring at dashboards full of data they couldn’t easily query suddenly had a natural language interface to the full operational picture. The use cases were immediate: status queries, maintenance planning, failure pattern analysis, compliance documentation.

Then the enterprise customers started asking different questions.

“Does Ask Argus send our operational data to an external server?”

Yes — the query, the retrieved context from MCP servers, and the generated response all transited an external LLM API.

“We can’t do that.”

The customers raising these concerns weren’t a small edge case. Defense manufacturing. Regulated utilities. Pharmaceutical production. Classified government operations. These are large, high-value customers with exactly the kind of complex operations where Ask Argus’s capabilities would provide the most value — and exactly the customers for whom cloud AI wasn’t viable.


What “On-Premises AI” Actually Means

The phrase “on-premises AI” covers a spectrum. At one end: a company decides to run an LLM on its own servers rather than pay for API access. That’s a cost decision. At the other end: a defense contractor operating a classified program must ensure that no production data transits any network outside the facility perimeter. That’s a security requirement.

ArgusAI was designed for the latter end of the spectrum — and if it works for the strictest requirements, it works for everything in between.

On-premises AI, done correctly, means:

The LLM runs on hardware inside the facility network. Not in a private cloud. Not in a VPC with VPN connectivity. The model executes on hardware that never leaves the facility perimeter.

The MCP servers run inside the facility network. The bridge between the LLM and the operational data — the components that query ArgusIQ and format context for the model — also run inside the perimeter. No query data leaves the network at any point in the pipeline.

Model weights are loaded once. After initial deployment, the system requires no external connectivity. Model weights don’t “phone home.” Inference happens entirely from local resources.

The inference endpoint is internal only. Ask Argus in ArgusIQ queries an internal API endpoint — the ArgusAI inference server running on local hardware — not an external cloud API. From ArgusIQ’s perspective, it’s calling a local service. The operational architecture is identical to cloud deployment; only the endpoint address changes.
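To illustrate "only the endpoint address changes," here is a minimal sketch of what that configuration swap might look like. The URLs, model names, and `InferenceConfig` structure are all hypothetical, not ArgusAI's actual configuration schema:

```python
# Hypothetical sketch: the only deployment difference is which
# OpenAI-compatible endpoint the Ask Argus backend points at.
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceConfig:
    base_url: str  # chat-completions endpoint the UI backend calls
    model: str     # model identifier served at that endpoint


def inference_config(deployment: str) -> InferenceConfig:
    """Return the inference endpoint for a deployment mode.

    Everything upstream (MCP servers, context formatting, citations)
    is identical in both modes; only this address differs.
    """
    if deployment == "cloud":
        # External LLM API (illustrative URL).
        return InferenceConfig("https://api.example-llm.com/v1", "frontier-model")
    if deployment == "on_premises":
        # ArgusAI inference server on the facility network (illustrative address).
        return InferenceConfig("http://argusai.internal:8000/v1", "local-70b-int8")
    raise ValueError(f"unknown deployment mode: {deployment}")
```

From the application's point of view, both configurations are just a local service call; the security property lives entirely in where the endpoint resolves.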


The Technical Architecture Decision

Building on-premises AI required resolving several architectural questions that cloud AI deployment sidesteps:

What model do you run? Cloud AI deployments can use frontier models — GPT-4, Claude, Gemini — that require infrastructure no enterprise deploys on-premises. On-premises deployment requires open-weight models that can be hosted on practical hardware. We evaluated the available open-weight instruction-tuned models across the 7B to 70B parameter range and identified viable tiers: 7B models for hardware-constrained environments, 13B–34B models for standard production deployments, 70B models for enterprise deployments with demanding analytical workloads.

How do you make the LLM scale? Cloud AI APIs handle concurrency internally. On-premises deployment requires an inference serving layer. We chose vLLM — its paged attention architecture provides efficient concurrent request handling on GPU hardware, and it’s the production-grade open-source inference engine with the strongest performance characteristics for the model families we support.

How do you handle hardware variability? Defense contractors have different compute budgets from mid-market manufacturers. Pharmaceutical facilities have different GPU hardware from utilities. ArgusAI needed to work across a hardware range — from single A10 GPU configurations for smaller deployments to multi-GPU H100 clusters for enterprise scale. The tiered model selection (7B through 70B, with INT8 and INT4 quantization options) gives deployment architects the right configuration for their hardware.
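The tier selection described above can be sketched as simple memory arithmetic. The thresholds and reserve fraction below are rough illustrations (weights-only sizing at ~2 bytes per parameter for FP16, ~1 for INT8, ~0.5 for INT4), not ArgusAI's actual sizing guidance:

```python
def select_model_tier(total_gpu_memory_gb: float, quantization: str = "fp16") -> str:
    """Map available GPU memory to a model parameter tier.

    Illustrative back-of-envelope sizing: weight memory per parameter
    depends on quantization, and a slice of memory is reserved for the
    KV cache and activations.
    """
    bytes_per_param = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}[quantization]
    budget_gb = total_gpu_memory_gb * 0.7  # reserve ~30% for KV cache/activations
    max_params_b = budget_gb / bytes_per_param  # billions of parameters that fit
    if max_params_b >= 70:
        return "70B"
    if max_params_b >= 13:
        return "13B-34B"
    if max_params_b >= 7:
        return "7B"
    raise ValueError("insufficient GPU memory for the smallest supported tier")
```

Under these assumptions a single 24 GB A10 lands in the 7B tier at FP16, while INT8 quantization lifts the same card into the 13B-34B tier; a multi-GPU H100 node comfortably hosts the 70B tier.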

```mermaid
graph LR
    A[User Query] --> B[Ask Argus UI]
    B --> C[ArgusAI Inference Server]
    C --> D[MCP Servers]
    D --> E[ArgusIQ Data]
    C --> F[Local LLM Runtime]
```

The MCP Server Architecture Already Fit

One fortunate aspect of the ArgusAI development: the MCP server architecture we had built for cloud Ask Argus deployment was already correct for on-premises.

The MCP servers were already the boundary layer. They queried ArgusIQ’s data model and returned structured context. They didn’t hand off raw database access to the LLM. The LLM received formatted context — the results of intentional queries — not unmediated access to operational data.

This architecture was originally a design choice about keeping the LLM grounded in actual data rather than hallucinating from training knowledge. It turned out to also be the architecture that makes on-premises deployment secure: the MCP servers enforce the same RBAC that governs ArgusIQ user access, so the AI never accesses data outside the scope of the authenticated user’s permissions.
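A minimal sketch of that boundary-layer property, with illustrative data and function names (not ArgusAI's actual MCP implementation): the retriever filters by the authenticated user's permissions before anything reaches the model, so the LLM can only ever see context the user was entitled to query.

```python
# Illustrative asset records as an MCP server might return them from ArgusIQ.
ASSETS = [
    {"id": "pump-01", "site": "plant-a", "status": "running"},
    {"id": "chiller-07", "site": "plant-b", "status": "fault"},
]


def retrieve_asset_context(authorized_sites: set[str]) -> list[dict]:
    """Return structured context scoped to the user's RBAC grants.

    The LLM never sees this function or the underlying store; it only
    receives the filtered result as formatted context.
    """
    return [asset for asset in ASSETS if asset["site"] in authorized_sites]
```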

Moving from cloud to on-premises deployment changed one thing: the inference endpoint address. MCP servers, context formatting, response structure, citations — all unchanged. ArgusAI plugs in where the cloud LLM API previously terminated.


What We Discovered About the Market

Building ArgusAI forced us to have detailed conversations with customers about their AI deployment constraints. Those conversations produced some findings we hadn’t expected.

The air-gap requirement is more common than the market discussion suggests. The public conversation about enterprise AI adoption focuses on cloud-first architectures. The operational reality for a significant segment of industrial customers is that their operational technology networks are intentionally isolated from external connectivity — not due to budget limitations, but as a deliberate security architecture. These customers can’t use cloud AI regardless of budget. They need on-premises solutions or they get no AI.

Data residency requirements are growing. Pharmaceutical manufacturers operating under FDA oversight face increasing attention on data residency for production system data. Utilities operating under NERC CIP face cybersecurity requirements that may complicate OT-to-cloud data flows. Defense contractors face ITAR and program-specific security requirements. The regulatory trajectory suggests more organizations will face on-premises requirements in the future, not fewer.

The capability gap between cloud and on-premises is closing. In 2023, the performance gap between frontier cloud models and available open-weight models was significant. By 2025, the 70B-class open-weight models deliver performance competitive with frontier models on structured operational query tasks — which is the specific task domain Ask Argus operates in. The days of “cloud AI is 5x better than on-premises AI” are largely past for operational intelligence use cases.


The Decision Framework

When evaluating whether cloud or on-premises AI is the right architecture for an ArgusIQ deployment, the decision tree is short:

Does the operational data need to stay inside the facility network?

  • Yes → ArgusAI on-premises
  • No → Cloud Ask Argus (external LLM API)

Is the OT network air-gapped or isolated from external connectivity?

  • Yes → ArgusAI on-premises
  • No → Either architecture works; cost and preference determine the choice

Is the deployment in a classified, ITAR-controlled, or regulated environment?

  • Yes → ArgusAI on-premises (compliance requirement often dictates architecture)
  • No → Either architecture works
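The decision tree above collapses to a few lines of logic, since any "yes" forces on-premises deployment. The function and return labels below are illustrative, not a Viaanix API:

```python
def choose_architecture(
    data_must_stay_on_site: bool,
    ot_network_air_gapped: bool,
    regulated_or_classified: bool,
) -> str:
    """Apply the three-question decision tree for an ArgusIQ deployment.

    Any single 'yes' mandates ArgusAI on-premises; otherwise either
    architecture works and cost/preference decides.
    """
    if data_must_stay_on_site or ot_network_air_gapped or regulated_or_classified:
        return "argusai-on-premises"
    return "either"  # cloud Ask Argus (external LLM API) is the usual default
```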

Most ArgusIQ cloud deployments — commercial manufacturers, retailers, municipalities, agriculture operations — don’t have data residency requirements that preclude cloud AI. They use cloud Ask Argus inference. It works well and requires no additional hardware investment.

ArgusAI exists for the deployments where cloud inference isn’t an option — and for the deployments where customers prefer to keep all AI processing inside their own infrastructure, regardless of whether they’re required to.


What Building ArgusAI Changed About the Platform

ArgusAI wasn’t just an add-on deployment option. Building it changed how we thought about the full ArgusIQ architecture.

The MCP server pattern — domain-specific context retrievers, structured data access, RBAC enforcement at the data layer — became cleaner and more explicitly documented as part of the ArgusAI development. The separation between the inference layer and the data access layer, already present in the cloud architecture, became a formal architectural boundary with documented interfaces.

This matters for extensibility. Organizations with data systems outside ArgusIQ — ERP systems, external document repositories, proprietary production management systems — can extend ArgusAI’s capabilities by developing custom MCP servers. The same interface contract that governs the ArgusIQ-native MCP servers governs custom ones. The LLM doesn’t know or care whether a context response came from a Viaanix-built MCP server or a custom one.
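To make the extensibility point concrete, here is a hypothetical sketch of what such an interface contract could look like. The actual contract follows the Model Context Protocol; the class names, method signatures, and the `erp_mcp` example below are illustrative assumptions:

```python
from abc import ABC, abstractmethod


class ContextServer(ABC):
    """Illustrative contract: a domain-specific retriever, query in,
    structured citation-ready context out, scoped to user permissions."""

    @abstractmethod
    def name(self) -> str:
        """Identifier used in citations (e.g. 'asset_hub_mcp')."""

    @abstractmethod
    def retrieve(self, query: str, user_permissions: set[str]) -> dict:
        """Return structured context; must enforce RBAC before returning."""


class ErpContextServer(ContextServer):
    """Hypothetical custom server bridging an external ERP system."""

    def name(self) -> str:
        return "erp_mcp"

    def retrieve(self, query: str, user_permissions: set[str]) -> dict:
        if "erp:read" not in user_permissions:
            return {"source": self.name(), "records": []}
        # Stand-in for a real ERP query keyed off the user's question.
        return {"source": self.name(), "records": [{"po": "PO-1001", "status": "open"}]}
```

Because the LLM only consumes the structured response, a custom server like this is indistinguishable from a native one at inference time, which is exactly the point made above.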


The Honest Reason

Why did we build ArgusAI? The honest answer is that we had customers who needed it and weren’t willing to accept “that’s not how AI works” as an answer.

The common wisdom in 2024 was that on-premises AI was a niche use case — too expensive, too complex, performance too limited. That wisdom came from a world where the frontier models were cloud-only and the open-weight alternatives lagged significantly.

Our customers in defense, utilities, and pharmaceutical manufacturing couldn’t deploy cloud AI. They had real operational problems that Ask Argus could solve. We chose to solve those problems rather than tell those customers their security requirements were incompatible with modern AI.

ArgusAI is the result. It works. The deployments are operational. The customers are asking operational intelligence questions and getting answers, and their data is staying exactly where it needs to stay.


Talk to our team about ArgusAI deployment for your facility.
