Blog | APR 14, 2025

Building Trustworthy ML and AI: A Reference Architecture for IoT / OT Data Provenance

EU AI Act | NIST

Building trustworthy AI and ML systems in industrial environments starts with securing the data foundation. This blog outlines how organizations can architect AI on trustworthy, verifiable IoT/OT data, enabling regulatory compliance under the EU AI Act and NIST AI RMF, and supporting resilient decision-making from the ground up.

Patrick Lamplmair

CTO, Tributech

Introduction

As Artificial Intelligence (AI) systems are increasingly deployed in critical sectors ranging from manufacturing to energy and mobility, the integrity of their training and operational data becomes essential. Recent AI legislation such as the EU AI Act, and U.S. frameworks such as the NIST AI RMF 1.0 (AI 100-1) and NIST AI 600-1, highlight one recurring theme: trustworthy AI begins with trustworthy data.

At Tributech, we help organizations navigate this challenge by providing a secure, cryptographically verifiable data foundation purpose-built for AI and Machine Learning (ML) in Internet of Things (IoT) and Operational Technology (OT) environments. Our zero-trust middleware ensures integrity from the edge to the cloud, helping customers mitigate data poisoning risks, maintain traceability, and comply with evolving regulatory requirements.

What the EU AI Act and NIST Frameworks Mean for Trustworthy AI

As AI becomes embedded in critical infrastructure, industrial control systems, and other high-impact environments, regulations are emerging to ensure these systems are safe, transparent, and accountable. A core concern across all frameworks is the need for trustworthy data and traceable ML processes, especially when working with sensitive, real-time, or distributed IoT/OT data. Regulatory frameworks like the EU AI Act, NIST AI RMF 1.0, and NIST AI 600-1 define requirements that organizations must consider across the full AI lifecycle, from data acquisition to model updates.

EU AI Act (Regulation (EU) 2024/1689)

The EU AI Act is the world’s first comprehensive, binding regulation for artificial intelligence. Its strictest obligations apply to systems classified as “high-risk,” which can include AI used in industrial automation, mobility, energy, and manufacturing. For such systems it sets mandatory requirements for data governance, robustness, human oversight, and lifecycle documentation. For systems built on sensor or machine data, the Act’s focus on data quality, integrity, and traceability directly impacts architecture and operational workflows.

Article 10 – Data governance: Training, validation, and test data must be relevant, sufficiently representative and, to the best extent possible, free of errors and complete, with documented data sourcing, annotation, and preprocessing methods.
Article 10 – Data integrity verification: Processes must ensure input data quality and support traceability throughout development and deployment.
Article 15 – Robustness against manipulation: Systems must be resilient to data poisoning and adversarial attacks, with mechanisms to detect and respond to integrity issues.
Article 14 – Human oversight: Critical AI decisions must be explainable and overrideable, ensuring humans remain in control in high-risk scenarios.

US NIST AI RMF 1.0 (AI 100-1)

The NIST AI Risk Management Framework provides a structured approach to managing AI risks across development and deployment. While voluntary, it is increasingly adopted as a benchmark for responsible AI in both public and private sectors. It promotes risk-based design decisions, accountability structures, and documentation aligned with trust and safety goals.

End-to-end risk mapping: AI workflows must be mapped across data, model, and operational contexts to identify potential failure points or security threats.
Validated inputs and outputs: Data integrity, consistency, and model behavior must be regularly assessed, especially in dynamic or real-time environments.
Operational transparency: Traceability and documentation must support auditing and explainability, enabling organizations to demonstrate responsible deployment.
Governance and accountability: Roles and responsibilities must be defined to manage AI risks across internal teams and external partners.

US NIST AI 600-1 (Generative AI Profile)

NIST AI 600-1, published as the Generative AI Profile, is a companion resource to AI 100-1 that offers more detailed, action-level guidance across the phases of an AI system's life. Its lifecycle-oriented guidance is particularly relevant for systems that evolve over time, are retrained frequently, or are deployed across decentralized environments like edge computing. It reinforces the importance of continuous trust assurance and structured risk mitigation.

Lifecycle-wide controls: Risk mitigation must extend across design, training, deployment, and retirement, with defined checkpoints for verification and validation.
Retraining governance: New training data must be verified and models must be version-controlled, with signed artifacts to ensure authenticity.
Update integrity: Changes to models must be documented, traceable, and auditable, preventing drift or regressions in safety and performance.

Reference Architecture for Trustworthy ML/AI Based on IoT and OT Data

To meet the regulatory and operational demands outlined in the EU AI Act and NIST frameworks, organizations need a clear architectural foundation that embeds trust, security, and traceability into every phase of the AI lifecycle. This is especially critical for systems that rely on industrial data, such as sensor readings, machine logs, or real-time control signals, where data integrity issues can directly translate into safety, compliance, or business continuity risks.

The architecture presented here is a reference model for building trustworthy ML/AI systems grounded in verifiable IoT/OT data. It uses Tributech’s middleware as the foundation for cryptographic data notarization, context modeling via a digital twin stack, and secure data integration. It is designed to support both cloud-based and edge-deployed AI, with flexibility to integrate additional components (e.g., MLOps pipelines, inference engines, model registries) depending on the use case.

This architecture organizes trust across three core lifecycle phases:

a) Initial Model Development

Trust in AI starts with trustworthy training data. During model development, raw IoT/OT data is first captured via Tributech's middleware, where it is notarized at source. Once captured, the data used for training cannot be manipulated without detection, which is a fundamental measure against data poisoning (EU AI Act Article 15).
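To make the idea of tamper-evident capture concrete, here is a minimal sketch of a hash chain over sensor readings. It is an illustration under stated assumptions, not Tributech's implementation: the function names (`notarize`, `first_tampered_index`), the record layout, and the sample readings are all hypothetical, and a real deployment would additionally anchor the latest hash somewhere an attacker cannot rewrite.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def record_digest(reading: dict, prev_hash: str) -> str:
    """Hash a reading together with the previous record's hash."""
    payload = json.dumps(reading, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def notarize(readings: list) -> list:
    """Return hash-chained records; each one commits to all earlier readings."""
    chain, prev = [], GENESIS
    for reading in readings:
        digest = record_digest(reading, prev)
        chain.append({"reading": reading, "prev": prev, "hash": digest})
        prev = digest
    return chain


def first_tampered_index(chain: list) -> Optional[int]:
    """Return the index of the first record that fails verification, or None."""
    prev = GENESIS
    for i, rec in enumerate(chain):
        if rec["prev"] != prev or record_digest(rec["reading"], prev) != rec["hash"]:
            return i
        prev = rec["hash"]
    return None


# Hypothetical sample data
readings = [
    {"device": "pump-01", "ts": "2025-04-14T08:00:00Z", "temp_c": 61.2},
    {"device": "pump-01", "ts": "2025-04-14T08:00:10Z", "temp_c": 61.4},
    {"device": "pump-01", "ts": "2025-04-14T08:00:20Z", "temp_c": 61.9},
]
chain = notarize(readings)
print(first_tampered_index(chain))    # None: chain verifies

chain[1]["reading"]["temp_c"] = 40.0  # simulate manipulation after capture
print(first_tampered_index(chain))    # 1: the altered record is detected
```

Because each hash covers the previous one, changing any reading invalidates that record and cannot be hidden without recomputing every later hash as well.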

Each dataset is linked to contextual metadata such as source device, location, and timestamp, enabling validation for relevance and completeness (EU AI Act Article 10). The training workflow is fully documented: model configurations and data versions are recorded to ensure traceability and reproducibility. Once training is complete, the models are cryptographically signed, enabling integrity checks for future deployments.
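One way to picture the documentation step is a training manifest that refuses datasets with missing context and records which data and configuration produced which model artifact. The sketch below is illustrative only: the field names in `REQUIRED_CONTEXT`, the dataset layout, and the helper names are assumptions, not part of any specific product or of the Act itself.

```python
import hashlib
import json

REQUIRED_CONTEXT = ("device_id", "location", "timestamp")  # assumed field names


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def missing_context(dataset: dict) -> list:
    """List required context fields that are absent or empty."""
    meta = dataset.get("context", {})
    return [name for name in REQUIRED_CONTEXT if not meta.get(name)]


def build_manifest(datasets: list, config: dict, model_bytes: bytes) -> dict:
    """Record which data and configuration produced which model artifact."""
    for ds in datasets:
        gaps = missing_context(ds)
        if gaps:
            raise ValueError(f"dataset {ds.get('name')!r} lacks context: {gaps}")
    return {
        "datasets": [
            {
                "name": ds["name"],
                "sha256": sha256_hex(ds["payload"]),
                "context": ds["context"],
            }
            for ds in datasets
        ],
        "config": config,
        "model_sha256": sha256_hex(model_bytes),
    }


def manifest_id(manifest: dict) -> str:
    """Deterministic digest of the manifest, usable as a training-run ID."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return sha256_hex(canonical.encode("utf-8"))


# Hypothetical dataset and training configuration
dataset = {
    "name": "line-3-vibration",
    "payload": b"ts,rms\n0,0.12\n10,0.13\n",
    "context": {
        "device_id": "acc-17",
        "location": "plant-a/line-3",
        "timestamp": "2025-04-14T08:00:00Z",
    },
}
manifest = build_manifest([dataset], {"epochs": 20, "lr": 0.001}, b"fake-weights")
print(manifest_id(manifest))
```

Serializing with sorted keys makes the manifest ID reproducible, so the same data, configuration, and model always map to the same identifier.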

b) Model Deployment and Operation (Edge, Cloud, or Hybrid)

When deploying models into production, whether in the cloud, at the edge, or in a hybrid setup, the architecture ensures that only authorized, signed models are executed. Before inference, the model signature is verified to prevent unauthorized alterations (EU AI Act Article 15 on robustness).
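The verify-before-inference step can be sketched as a loader that refuses any artifact whose signature does not match. This example uses HMAC-SHA256 from the Python standard library purely as a stand-in so that it runs without dependencies; a production setup would more likely use asymmetric signatures so that deployment targets hold only a public key. The names `sign_model`, `load_verified_model`, and `ModelIntegrityError` are hypothetical.

```python
import hashlib
import hmac


class ModelIntegrityError(Exception):
    """Raised when a model artifact does not match its signature."""


def sign_model(model_bytes: bytes, key: bytes) -> str:
    """Produce a hex signature over the serialized model (HMAC stand-in)."""
    return hmac.new(key, model_bytes, hashlib.sha256).hexdigest()


def load_verified_model(model_bytes: bytes, signature: str, key: bytes) -> bytes:
    """Return the artifact only if its signature verifies; otherwise refuse."""
    expected = sign_model(model_bytes, key)
    # compare_digest avoids leaking information through comparison timing
    if not hmac.compare_digest(expected, signature):
        raise ModelIntegrityError("model signature mismatch; refusing to load")
    return model_bytes  # a real loader would deserialize the model here


key = b"demo-key-do-not-use"           # illustrative key only
artifact = b"\x00fake-model-weights"    # stand-in for a serialized model
sig = sign_model(artifact, key)

load_verified_model(artifact, sig, key)  # passes silently

try:
    load_verified_model(artifact + b"!", sig, key)  # altered artifact
except ModelIntegrityError as err:
    print("blocked:", err)
```

Placing the check inside the loader, rather than in a separate deployment script, means there is no code path that reaches inference with an unverified artifact.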

At runtime, all operational data (inputs, outputs, and metadata) is notarized, ensuring that model decisions are based on verifiable, untampered inputs. This protects both the integrity of the inference process and downstream decisions, particularly in critical environments like energy systems or autonomous control loops.
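As a rough illustration of runtime notarization, each inference can be captured as a record that binds the model digest, a digest of the inputs, and the output under one hash. The record layout and function names below are assumptions made for this sketch; how such records are anchored or stored is left out.

```python
import hashlib
import json
from datetime import datetime, timezone


def canonical_digest(obj) -> str:
    """SHA-256 over a deterministic JSON encoding of obj."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def inference_record(model_sha256: str, inputs: dict, outputs: dict) -> dict:
    """Bind model, inputs, and outputs into one tamper-evident record."""
    body = {
        "model_sha256": model_sha256,
        "input_sha256": canonical_digest(inputs),
        "output": outputs,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    return {**body, "record_sha256": canonical_digest(body)}


def record_is_intact(record: dict) -> bool:
    """Recompute the record hash and compare it with the stored one."""
    body = {k: v for k, v in record.items() if k != "record_sha256"}
    return canonical_digest(body) == record["record_sha256"]


# Hypothetical model digest and readings
rec = inference_record(
    model_sha256="ab" * 32,
    inputs={"temp_c": 61.9, "rms": 0.13},
    outputs={"anomaly": False, "score": 0.07},
)
print(record_is_intact(rec))  # True
```

Storing a digest of the inputs rather than the inputs themselves keeps the record small while still letting an auditor confirm which notarized data a decision was based on.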

For safety-critical and other high-risk applications, rule-based or human-driven “decision gates” are integrated to maintain control and accountability (EU AI Act Article 14, human oversight). These gates act as explicit checkpoints between the output of the AI model and any downstream system actions. For example, if a model recommends a process parameter change or a maintenance shutdown, an automated control system can first evaluate predefined rules (e.g., thresholds, risk classifications, operational plans) before allowing execution. In higher-risk scenarios, the system may instead require a human operator to review the decision context and confirm or override the action, so that safety, regulatory, or ethical considerations are not circumvented. Every decision interaction, whether automated or reviewed by a human, is logged to ensure traceability and auditability.
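A decision gate of this kind can be sketched as a small rule evaluator that either executes, rejects, or escalates to a human, and logs every outcome. The thresholds, risk classes, and class names here are invented for illustration; real limits would come from the plant's safety and operations rules.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    EXECUTE = "execute"
    NEEDS_HUMAN = "needs_human"
    REJECT = "reject"


@dataclass
class Recommendation:
    action: str
    parameter_delta: float  # proposed change as a fraction of the setpoint
    risk_class: str         # "low" or "high" (assumed classification)


@dataclass
class DecisionGate:
    max_auto_delta: float = 0.05  # largest change allowed without review
    hard_limit: float = 0.25      # changes beyond this are always rejected
    audit_log: list = field(default_factory=list)

    def evaluate(self, rec: Recommendation) -> Verdict:
        """Apply the predefined rules to a model recommendation."""
        if abs(rec.parameter_delta) > self.hard_limit:
            verdict = Verdict.REJECT
        elif rec.risk_class == "high" or abs(rec.parameter_delta) > self.max_auto_delta:
            verdict = Verdict.NEEDS_HUMAN
        else:
            verdict = Verdict.EXECUTE
        self._log(rec, verdict, actor="rule-engine")
        return verdict

    def human_review(self, rec: Recommendation, approved: bool, operator: str) -> Verdict:
        """Record an operator's confirmation or override."""
        verdict = Verdict.EXECUTE if approved else Verdict.REJECT
        self._log(rec, verdict, actor=operator)
        return verdict

    def _log(self, rec: Recommendation, verdict: Verdict, actor: str) -> None:
        self.audit_log.append(
            {"action": rec.action, "verdict": verdict.value, "actor": actor}
        )


gate = DecisionGate()
shutdown = Recommendation("maintenance_shutdown", 0.0, "high")
if gate.evaluate(shutdown) is Verdict.NEEDS_HUMAN:
    gate.human_review(shutdown, approved=True, operator="operator-42")
print(gate.audit_log)
```

Both the automated verdict and the operator's decision end up in the same log, so an auditor can see who or what authorized each action.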

c) Model Maintenance and Continuous Improvement

As new data becomes available, the architecture enables secure and trustworthy retraining. Updated training datasets undergo the same notarization and contextual validation as in the initial development phase. The MLOps pipeline records all changes, and updated models are re-signed to maintain chain-of-trust continuity.
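Chain-of-trust continuity across retraining can be pictured as a lineage in which every model version commits to its parent. The sketch below assumes a hypothetical version record containing the model digest, an identifier for the training manifest, and the parent's ID; in practice each record would also be signed.

```python
import hashlib
import json
from typing import Optional


def digest(obj: dict) -> str:
    """SHA-256 over a deterministic JSON encoding of obj."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def new_version(model_sha256: str, manifest_id: str, parent: Optional[dict]) -> dict:
    """Create a version record that commits to its parent version."""
    body = {
        "model_sha256": model_sha256,
        "manifest_id": manifest_id,
        "parent_id": parent["version_id"] if parent else None,
    }
    return {**body, "version_id": digest(body)}


def lineage_is_valid(versions: list) -> bool:
    """Check an oldest-to-newest list of versions for an unbroken chain."""
    parent_id = None
    for version in versions:
        body = {k: v for k, v in version.items() if k != "version_id"}
        if version["parent_id"] != parent_id or digest(body) != version["version_id"]:
            return False
        parent_id = version["version_id"]
    return True


# Hypothetical digests standing in for real model and manifest hashes
v1 = new_version("aa" * 32, "manifest-2025-01", parent=None)
v2 = new_version("bb" * 32, "manifest-2025-04", parent=v1)
print(lineage_is_valid([v1, v2]))  # True

v1["manifest_id"] = "manifest-swapped"  # rewrite history of the first version
print(lineage_is_valid([v1, v2]))       # False
```

Since each version ID covers the parent's ID, any change to an earlier version's training data reference breaks verification for the whole lineage.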

This approach supports lifecycle-wide governance, as called for by both NIST AI 600-1 and the EU AI Act, ensuring that model evolution does not compromise compliance or reliability. Each update can be traced back to its origin, enabling full auditability and operational transparency over time.

Provenance First - Building Trustworthy AI from the Ground Up

Before discussing model risk, transparency, or algorithmic fairness, there’s one prerequisite that underpins it all: verifiable data provenance. Without knowing where data comes from, how it was generated, and whether it has been altered, no AI system, no matter how advanced, can be fully trusted. This challenge becomes even more complex in IoT and OT environments, where data originates from a diverse, often distributed set of physical assets: machines, sensors, energy systems, and embedded controllers. That’s why data provenance must be the first architectural decision when designing trustworthy ML/AI systems.

At Tributech, we help organizations solve this challenge by providing:

A data middleware platform purpose-built for IoT/OT environments, enabling secure, contextualized, and notarized data streams from edge to cloud.
A flexible foundation that supports cryptographic integrity via data notarization, context modeling via a DTDL-based digital twin stack, and traceability. These capabilities are critical for building AI that is not only functional, but also explainable, resilient, and trustworthy by design.
Proven expertise in regulatory frameworks and their technical implications for secure data platforms.

With the right data foundation in place, everything else (model validation, inference control, lifecycle governance) becomes achievable. Without it, even the most sophisticated AI architecture rests on uncertain ground.

Interested in building a trustworthy ML / AI stack on verifiable IoT and OT data?

Let’s talk about how Tributech can help you establish a secure and future-proof data platform for AI. Leave us your details and we'll help you figure out the right approach.