The days of large, monolithic apps are fading. Today’s applications rely on microservices and code reuse, which simplifies development but creates complexity when it comes to monitoring and managing the components used.
This is why the software bill of materials (SBOM) has emerged as an indispensable tool for identifying what’s in a software application, including the components, versions, and dependencies that reside within its systems. SBOMs also provide insight into the vulnerabilities and risks that affect cybersecurity.
An SBOM allows CISOs and other business leaders to focus on what really matters by providing an up-to-date inventory of software components. This makes it easier to establish and enforce strong governance and spot potential problems before they spiral out of control.
However, in the era of artificial intelligence (AI), the classical SBOM has some limitations. Emerging machine learning (ML) frameworks introduce significant opportunities, but they also push the boundaries of risk and introduce a new asset to organizations: the machine learning model. Without strong oversight and control over these models, a number of practical, technical and legal problems can arise.
This is where the machine learning bill of materials (MLBOM) comes into play. An MLBOM tracks names, locations, versions, and licenses for the assets that make up an ML model. It also includes general information about the nature of the model, training configurations embedded in the metadata, who owns it, various feature sets, hardware requirements, and more.
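As a rough illustration of the fields listed above, an MLBOM record could be modeled as a small data structure. This is a minimal sketch, not a standard schema: the class names `Asset` and `MLBOM`, the field names, and every sample value (model name, bucket path, hyperparameters) are assumptions made up for the example.

```python
from dataclasses import asdict, dataclass, field
import json


@dataclass
class Asset:
    """One tracked asset: a dataset, library, base model, or weights file."""
    name: str
    location: str
    version: str
    license: str


@dataclass
class MLBOM:
    """Hypothetical MLBOM record; field names are illustrative only."""
    model_name: str
    description: str
    owner: str
    assets: list[Asset] = field(default_factory=list)
    training_config: dict = field(default_factory=dict)
    feature_sets: list[str] = field(default_factory=list)
    hardware_requirements: dict = field(default_factory=dict)


# Sample values are invented for the example.
bom = MLBOM(
    model_name="churn-classifier",
    description="Gradient-boosted model that predicts customer churn",
    owner="data-science-team",
    assets=[
        Asset("customer-transactions", "s3://example-bucket/transactions/",
              "2024-03", "proprietary"),
        Asset("scikit-learn", "https://pypi.org/project/scikit-learn/",
              "1.4.2", "BSD-3-Clause"),
    ],
    training_config={"learning_rate": 0.1, "n_estimators": 200},
    feature_sets=["account_age", "monthly_spend"],
    hardware_requirements={"cpu_cores": 4, "memory_gb": 16},
)

# Serialize the record so it can be stored alongside the model artifact.
bom_json = json.dumps(asdict(bom), indent=2)
print(bom_json)
```

Keeping the record serializable as JSON means it can be versioned with the model and compared across releases.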
Why MLBOMs matter
CISOs are realizing that AI and machine learning require a different security model, and that the underlying training data and models are often not tracked or regulated. An MLBOM can help an organization avoid security risks and failures. It addresses critical factors such as model and data provenance, security assessments, and dynamic changes that are beyond the scope of the SBOM.
Because ML environments are in a constant state of flux and changes can occur with little or no human interaction, issues related to the consistency of the data, including where it came from, how it was cleaned, and how it was labeled, are a constant concern.
For example, if a business analyst or data scientist determines that a data set is poisoned, an MLBOM simplifies the task of finding all the points of contact and the models trained with that data.
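The poisoned-data lookup described above might look like the following. This is a sketch under stated assumptions: each MLBOM is reduced to a plain dictionary with hypothetical `model` and `datasets` keys, and the model and dataset names are invented for the example.

```python
# Hypothetical, simplified MLBOM records: one per model, listing its training data.
mlboms = [
    {"model": "churn-classifier",
     "datasets": [{"name": "customer-transactions", "version": "2024-03"}]},
    {"model": "fraud-detector",
     "datasets": [{"name": "customer-transactions", "version": "2024-03"},
                  {"name": "chargebacks", "version": "2023-12"}]},
    {"model": "support-router",
     "datasets": [{"name": "support-tickets", "version": "2024-01"}]},
]


def models_trained_on(mlboms, dataset_name, version=None):
    """Return the models whose MLBOM lists the given dataset.

    If version is None, any version of the dataset counts as a match.
    """
    affected = []
    for bom in mlboms:
        for ds in bom["datasets"]:
            if ds["name"] == dataset_name and version in (None, ds["version"]):
                affected.append(bom["model"])
                break  # one match is enough to flag this model
    return affected


affected = models_trained_on(mlboms, "customer-transactions", "2024-03")
print(affected)  # ['churn-classifier', 'fraud-detector']
```

With the training data recorded per model, the affected models can be listed in a single pass instead of being reconstructed from notebooks and pipeline logs.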
MLBOMs can increase protection
Transparency, auditability, and forensic analysis are all hallmarks of an MLBOM. With a complete view of the “ingredients” that make up an ML model, an organization is equipped to manage its ML models securely.
Here are some ways to create a best practice framework around an MLBOM:
- Recognize the need for an MLBOM: It’s no secret that machine learning drives innovation and even business disruption. However, it also introduces significant risks that can extend to reputation, regulatory compliance, and legal issues. Having visibility into ML models is critically important.
- Conduct essential due diligence: An MLBOM should integrate with the CI/CD pipeline and provide a high level of clarity. Support for standard formats such as JSON and specifications such as OWASP CycloneDX can unify the SBOM and MLBOM processes.
- Analyze policies, processes and governance: It is essential to synchronize an MLBOM with an organization’s workflows and business processes. This increases the likelihood that ML pipelines will work as intended, while minimizing risks related to cybersecurity, data privacy, compliance, and other areas.
- Use an MLBOM with machine learning ports: Rigorous controls and gateways lead to essential guardrails for AI and machine learning. This way, the business and CISO can build on successes and leverage machine learning to drive greater cost savings, performance improvements, and business value.
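To make the CI/CD and gateway points above more concrete, here is a minimal sketch of a pipeline check run against a CycloneDX-style JSON document. The document is heavily simplified and has not been validated against the CycloneDX schema; the `machine-learning-model` and `data` component types follow my understanding of CycloneDX 1.5. The required-field policy, the function name `gate_violations`, and all sample values are assumptions for illustration.

```python
# Hypothetical policy: every component must declare these fields.
REQUIRED_FIELDS = ("name", "version", "licenses")


def gate_violations(bom: dict) -> list[str]:
    """Return human-readable policy violations for a CycloneDX-style BOM."""
    problems = []
    components = bom.get("components", [])
    # The BOM should describe at least one ML model.
    if not any(c.get("type") == "machine-learning-model" for c in components):
        problems.append("no machine-learning-model component listed")
    # Every component must carry the fields the policy requires.
    for c in components:
        for required in REQUIRED_FIELDS:
            if not c.get(required):
                problems.append(f"{c.get('name', '<unnamed>')}: missing {required}")
    return problems


# Simplified sample document; values are invented for the example.
bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {"type": "machine-learning-model", "name": "churn-classifier",
         "version": "1.4.0", "licenses": [{"license": {"id": "Apache-2.0"}}]},
        {"type": "data", "name": "customer-transactions", "version": "2024-03"},
    ],
}

violations = gate_violations(bom)
for line in violations:
    print(line)  # customer-transactions: missing licenses
```

A pipeline step could fail the build whenever `violations` is non-empty, so that a model without a complete MLBOM does not reach production.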
Machine learning is dramatically changing the business and IT landscape. By extending proven SBOM methodologies to ML through MLBOM, you can take a giant step towards boosting machine learning performance and protecting your data and assets.