Machine Learning: to the cloud and back, securely

Confidential Computing presented at the Linux Foundation Open Source Summit, Dublin, 2022.
14 September 2022

Attendee at the Linux Foundation’s Open Source Summit, 2022. Source: Linux Foundation

Any mention of machine learning in the context of sensitive data sets starts alarm bells ringing in multiple departments of the average organization. Compliance and data governance overseers have their two cents to add, as do PR teams worried about the fallout from data leaks. IT and data professionals are left to figure out a) how to use machine learning to bring value to the business and b) how to give an ML model data to learn from, and eventually process, while balancing privacy (anonymization of medical data, for instance) against the quality of inferred results.

The answer has usually been to throw money at the problem and invest heavily in on-premises hardware capable of crunching numbers quickly enough. The alternative is to burst to an AIaaS cloud with non-sensitive data only, a solution which in itself brings resource challenges.

However, there may be a solution to this thorny problem. At a talk given this week at the Linux Foundation’s Open Source Summit Europe, Daniel Huynh, CEO of Mithril Security, showcased BlindAI, a method by which ML can work on data sets in a highly secure sandbox. One big advantage for organizations whose models use sensitive data is that they can leverage external cloud compute without compromising data security. In short, information can leave the building, be ingested elsewhere, and travel back, all without prying eyes being able to see a single unencrypted byte. And those prying eyes include any miscreant operating inside the cloud provider hosting the necessary processing grunt.

Source: Mithril Security

In the BlindAI cloud, machine learning deployments run in secure enclaves using isolated memory. A hash of the code can be verified remotely to ensure that it has not been compromised and can be trusted. There is also hardware-based attestation of the VM via Intel SGX, AMD SEV, and AWS Nitro, with NVIDIA attestation in the pipeline, too.
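To make the hash check concrete, here is a minimal Python sketch of the comparison a client might perform before sending any data. It is an illustration only, not BlindAI's API: the function names and the byte strings standing in for enclave builds are hypothetical. In a real deployment the measurement would arrive inside a hardware-signed attestation report rather than being computed by the client itself.

```python
import hashlib
import hmac


def measure(code: bytes) -> str:
    """Return the SHA-256 hex digest of the code that will run in the enclave."""
    return hashlib.sha256(code).hexdigest()


def is_trusted(reported_measurement: str, expected_measurement: str) -> bool:
    """Compare the reported measurement with the expected one in constant time."""
    return hmac.compare_digest(reported_measurement, expected_measurement)


# Hypothetical stand-ins for an audited build and a tampered one.
audited_build = b"inference server build audited from source"
tampered_build = b"inference server build with a backdoor"

# The client pins the hash of the build it has audited (assumption:
# the build is reproducible, so anyone can recompute this value).
expected = measure(audited_build)

print(is_trusted(measure(audited_build), expected))   # matching build
print(is_trusted(measure(tampered_build), expected))  # modified build
```

The client would refuse to send data whenever the check fails, which is why any change to the enclave code, however small, is detectable from the outside.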

As you might imagine from the context in which the presentation was made (the Linux Foundation’s Open Source Summit), all code is available on GitHub, so it can be vetted by any parties interested in deploying the solution.

While many companies use remote AI compute and storage for heavier workloads and encrypt data on its way to and from third-party clouds, that data normally has to be decrypted and processed in the clear once it arrives. Isolating the machine learning algorithms in an enclave protects the intellectual property that is the AI code, as well as the data that produces inferences. That removes the risk of a malicious party accessing either element via the cloud provider.

BlindAI offers end-to-end encryption that goes beyond what governance and data protection rules demand for applications such as secure speech processing or medical analysis. There are also possibilities for AI-as-a-service vendors to add highly secure tiers to their offerings, with total isolation of models effectively guaranteed.

Huynh also spoke about BastionAI, which is currently in development. BastionAI is a zero-trust training platform that will allow, for instance, multiple datasets to be ingested and processed concurrently while remaining completely separate. For organizations that need different stakeholders’ data to be ring-fenced (AI service resellers, for example), this solution may prove to be a route to decent economies of scale.