curl at your own risk: PyPI security incident

13 October 2023


  • An SBoM may make development safer.
  • PyPI repo infected for first time.
  • Telegram, Alibaba and AWS code affected.

Developers pulling down components for Python projects that work with well-known applications and services have been hit by malicious code embedded in their downloads, Checkmarx has revealed.

Malicious actors used several well-known spoofing techniques to fool software developers into downloading software for use in their code. While some organizations and individuals use automatic scanning to prevent just such an occurrence, the bad actors had embedded their code deeply into subroutines of libraries that would only be activated when called by other elements rather than on the first run of the main code base. This was enough, in many cases, to bypass basic malware scanning algorithms.

The hackers’ choice of projects to infect included routines and libraries around products such as Telegram, Alibaba cloud services (Aliyun), and AWS. Under the pseudonym “kohlersbtuh15”, an individual or group uploaded several malicious packages to the PyPI repository and used typosquatting and starjacking techniques to trick developers into downloading them.

Vetting and controlling downloads of code at the point when they’re needed is a tough call, but after-the-fact security measures such as a catalog of approved sources and the formulation of an SBoM for every release may help catch unwanted hangers-on to otherwise clean code.

How the repo was compromised

Typosquatting, in this case, involves “near-as-dammit” spellings of genuine package names, which an unwary or distracted developer may install in place of the original, benign package. Starjacking involves linking a rogue upload to an unrelated yet highly popular GitHub repository so that the malicious package appears popular and trusted. Neither PyPI nor the package managers that use it check that the GitHub repository a package links to is actually the source of that package. For many developers, GitHub stars are enough to recommend a download, as the ranking system is essentially an accepted form of peer review. Without a direct, letter-for-letter check of the package name from PyPI against the corresponding GitHub repo, most users would be none the wiser.
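Both checks described above can be roughed out in a few lines of Python. What follows is a minimal sketch rather than the method Checkmarx or any scanner uses: the list of known packages, the similarity threshold, the example URL and the function names are all assumptions made for illustration. It flags a package name that is suspiciously close to a known one, and compares a package name letter-for-letter with the repository name in a GitHub URL.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlparse

# Hypothetical list of packages a team actually intends to depend on.
KNOWN_PACKAGES = ["requests", "telethon", "boto3"]


def normalize(name):
    """Lower-case a name and treat runs of '-', '_' and '.' as a single '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def likely_typosquat(name, known=KNOWN_PACKAGES, threshold=0.8):
    """Return the known package that `name` closely resembles, or None.

    An exact match is the genuine package, so it also returns None.
    The threshold is an arbitrary illustrative value.
    """
    candidate = normalize(name)
    known_names = [normalize(k) for k in known]
    if candidate in known_names:
        return None
    best_name, best_score = None, 0.0
    for good in known_names:
        score = SequenceMatcher(None, candidate, good).ratio()
        if score > best_score:
            best_name, best_score = good, score
    return best_name if best_score >= threshold else None


def repo_name_matches(package_name, repo_url):
    """Letter-for-letter check of a package name against a GitHub repo name."""
    parts = urlparse(repo_url).path.strip("/").split("/")
    if len(parts) < 2:
        return False  # not an owner/repo style URL
    repo = parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]
    return normalize(repo) == normalize(package_name)


if __name__ == "__main__":
    # Hypothetical names and URL, used only to exercise the checks.
    print(likely_typosquat("telethon2"))
    print(repo_name_matches("telethon2", "https://github.com/example-org/telethon"))
```

A mismatch between package and repository names is not proof of starjacking, since plenty of legitimate projects publish under a different name, so a check like this is best treated as a prompt for a human to look closer.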

The attackers, in this case, took the trouble to first replicate the code they were impersonating and then inserted additions into functions that may not have been called immediately. That left users unaware that they were working with corrupted elements in their applications.
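Code that only runs when a particular function is called will not reveal itself on import, but it is still visible in the source. The sketch below uses Python's standard `ast` module to walk every function body and list calls that often deserve a second look. It is an illustration only: the list of "suspicious" calls is an assumption, the sample source is invented, and a determined attacker can evade this kind of pattern matching.

```python
import ast

# Illustrative, far from exhaustive: calls that merit review when they
# appear inside a library function.
SUSPICIOUS = {
    "exec", "eval", "compile", "__import__",
    "os.system", "subprocess.run", "subprocess.Popen",
    "base64.b64decode",
}


def call_name(node):
    """Return 'name' or 'module.attr' for a call node, else None."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return None


def flag_calls_in_functions(source):
    """List (function name, call name, line number) for suspicious calls
    found inside function bodies, without executing any of the code."""
    findings = set()
    for fn in ast.walk(ast.parse(source)):
        if isinstance(fn, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for node in ast.walk(fn):
                if isinstance(node, ast.Call):
                    name = call_name(node)
                    if name in SUSPICIOUS:
                        findings.add((fn.name, name, node.lineno))
    return sorted(findings)


# Invented sample: one ordinary function and one that hides a payload
# that only runs if the function is ever called.
SAMPLE = '''
import base64

def send_message(text):
    return text.upper()

def _refresh_session(blob):
    exec(base64.b64decode(blob))
'''

if __name__ == "__main__":
    for finding in flag_calls_in_functions(SAMPLE):
        print(finding)
```

Because the source is parsed rather than imported, nothing in the inspected package is ever run.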

Checkmarx discovered the corrupted PyPI repositories

Similar events have taken place in the recent past at other common repos, including NPM, where downloads of JavaScript elements number in the millions per day. Disgruntled, cash-strapped, or mischievous developers have ‘spiked’ their own code (probably to the detriment of their careers and online profiles), and there have been a number of purely malicious incidents in which repositories were deliberately infected.

The nature of modern software development is such that there is a huge reliance on the use of ready-made code, frameworks, and libraries, usually pulled from well-known public repositories. That leaves developers open to the possibility of unknowingly embedding malware into finished products. Many larger development organizations use their own vetted repos from which their staff can pull pre-approved software and either block downloads from public repos or ensure that downloads are scanned before being made available to the end developer. For the majority, however, care and caution are the only things standing between developers and applications that are hiding unpleasant surprises.
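For teams without a private mirror, a lightweight version of the pre-approved list can be a simple gate on the requirements file. The sketch below assumes a hypothetical catalog that maps package names to approved versions and a requirements list that uses exact `name==version` pins. The catalog entries and function name are illustrative, and the parser deliberately ignores the rest of the requirements syntax.

```python
# Hypothetical catalog of vetted packages and the versions approved for use.
APPROVED = {
    "requests": {"2.31.0"},
    "boto3": {"1.28.62"},
}


def check_requirements(lines, approved):
    """Return (line, reason) for every requirement that fails the gate."""
    problems = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if "==" not in line:
            problems.append((line, "not pinned to an exact version"))
            continue
        name, version = (part.strip() for part in line.split("==", 1))
        key = name.lower()
        if key not in approved:
            problems.append((line, "package not in approved catalog"))
        elif version not in approved[key]:
            problems.append((line, "version not approved"))
    return problems


if __name__ == "__main__":
    requirements = [
        "requests==2.31.0",
        "reqeusts==2.31.0",  # misspelled on purpose
        "boto3>=1.0",
    ]
    for line, reason in check_requirements(requirements, APPROVED):
        print(f"{line}: {reason}")
```

Run in a pre-commit hook or CI job, a gate like this turns "care and caution" into a check that does not rely on anyone noticing a misspelling.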

SBoMs as a partial answer

In development circles, calls for SBoMs (software bills of materials) are getting louder. Assembling such documents, and the facilities to keep them up to date, in environments where speed and agility are valued is always going to be seen as a time drain. As well as the significant body of work that assembling an SBoM entails, keeping it up to date as the end application is updated is another overhead that is rarely catered for. Large applications may be based on many thousands of individual components drawn from hundreds of repositories. A single point upgrade to a little-used library means amending the SBoM, plus the necessary checks on the provenance of new versions and their dependencies.
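Some of that upkeep can be automated. The sketch below is a bare-bones inventory, not a standards-compliant SBoM (formats such as SPDX and CycloneDX carry far more detail). It assumes the application's dependencies are installed in the current Python environment, uses the standard library's `importlib.metadata` to list them, and diffs two inventories to show what a point upgrade actually changed. The function names and sample versions are illustrative.

```python
import json
from importlib.metadata import distributions


def build_inventory():
    """Map each installed distribution's name to its version."""
    inventory = {}
    for dist in distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:
            inventory[name.lower()] = dist.version
    return dict(sorted(inventory.items()))


def diff_inventories(old, new):
    """Report what was added, removed, or changed between two inventories."""
    return {
        "added": {n: new[n] for n in sorted(new) if n not in old},
        "removed": {n: old[n] for n in sorted(old) if n not in new},
        "changed": {
            n: (old[n], new[n])
            for n in sorted(old.keys() & new.keys())
            if old[n] != new[n]
        },
    }


if __name__ == "__main__":
    # Inventory of the current environment, ready to store with a release.
    print(json.dumps(build_inventory(), indent=2))

    # Invented before/after inventories showing the diff for a point upgrade.
    before = {"requests": "2.30.0", "idna": "3.4"}
    after = {"requests": "2.31.0", "certifi": "2023.7.22"}
    print(diff_inventories(before, after))
```

Storing the inventory alongside each release leaves only the entries in the diff to be checked for provenance, rather than the whole dependency tree.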

Creating an SBoM may seem to be a job of work, but done well, it’s an investment that will likely pay off.

Incidents like the ones at the PyPI and NPM repositories bring legal implications as well as potential losses from cyber breaches, the cost of which climbs inexorably. When lawyers looking for individuals or organizations to blame for their clients’ financial or reputational losses begin to circle, vulture-like, over the software chain, where they might best land is a matter of debate. Possible targets for litigious wrath might include the repository that unknowingly advertised infected code elements, the developer who personally pulled the infected code, the vendor or distributor of a finished application, or the cybersecurity personnel who let an application go into use in the end-user’s organization complete with hidden malware.

There may be reasons behind the malware authors’ choice of code to infect in this case, but there seems to be little pattern to the choice of Telegram, Alibaba, and AWS other than the fact that all are widely used services. For bad actors, spreading their unpleasantness as widely as possible is usually the aim.

Businesses that produce software should invest in prophylactic security measures and grant their teams the time and resources to generate and maintain comprehensive SBoMs. Development environments that stress speed to production over security concerns will likely have been hardest hit. If we put aside any schadenfreude over organizations with that approach, we realize that we are all affected, either as direct victims of this incident or as likely victims of future, similar attacks.