Python’s PyPI reveals its secrets

April 11, 2024 | News about hackers | Software security/programming


GitGuardian is famous for its annual State of Secrets Sprawl report. In their 2023 report, they found over 10 million passwords, API keys, and other credentials exposed in public GitHub commits. The findings of their 2024 report highlighted not just 12.8 million new secrets exposed in GitHub, but also a number in the popular Python package repository PyPI.

PyPI, short for Python Package Index, hosts over 20 terabytes of files freely available for use in Python projects. If you have ever typed pip install [name of package], it probably pulled that package from PyPI. A lot of people use it, too. Whether it’s GitHub, PyPI, or others, the report states, “open source packages make up about 90% of the code running in production today.” It’s easy to see why, when these packages help developers avoid reinventing millions of wheels every day.

In its 2024 report, GitGuardian reported finding over 11,000 exposed unique secrets, of which 1,000 were added to PyPI in 2023. That’s not much compared to the 12.8 million new secrets added to GitHub in 2023, but GitHub is orders of magnitude larger.

Even more worrying, of the secrets introduced in 2017, nearly 100 were still valid 6-7 years later. GitGuardian did not have the ability to check every secret for validity. Even so, over 300 unique and valid secrets were discovered. While this is mildly alarming to the casual observer and not necessarily a threat to casual Python developers (unlike the 116 malicious packages reported by ESET as of late 2023), it is a threat of unknown magnitude to the owners of those packages.

While GitGuardian has hundreds of secret detectors, which it has developed and refined over the years, some of the most common secrets detected in its overall 2023 study were OpenAI API keys, Google API keys, and Google Cloud keys. It is not difficult for a competent programmer to write a regular expression to find a single common secret format. And even if a lot of false positives turn up, automating the checks to determine whether they were valid could help the developer find a little treasure trove of exploitable secrets.
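To make the point about regular expressions concrete, here is a minimal sketch of a scanner you could run over your own source text before publishing it. The two patterns are commonly cited public key formats and are assumptions for illustration; they are not GitGuardian’s detectors, and the function name `find_candidate_secrets` is hypothetical. A match is only a candidate: as noted above, false positives are expected, and checking validity is a separate step that this sketch does not attempt.

```python
import re

# Assumed, commonly cited formats (illustrative only, not GitGuardian's detectors).
PATTERNS = {
    # Google API keys are usually described as "AIza" plus 35 URL-safe characters.
    "google_api_key": re.compile(
        r"(?<![0-9A-Za-z_\-])AIza[0-9A-Za-z_\-]{35}(?![0-9A-Za-z_\-])"
    ),
    # AWS access key IDs are usually described as "AKIA" plus 16 uppercase
    # letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def find_candidate_secrets(text):
    """Return (line_number, detector_name, matched_text) for every candidate."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((line_number, name, match.group(0)))
    return hits
```

Calling `find_candidate_secrets(open(path).read())` on each file in a package before upload would flag lines worth a second look. A dedicated scanner covers far more formats than these two.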

It stands to reason that if a key has been published to a public repository like GitHub or PyPI, it should be considered compromised. In tests, honeytokens (a sort of “defanged” API key without access to any resources) were checked for validity by bots within a minute of being posted on GitHub. Honeytokens, in fact, act as a “canary” for a growing number of developers. Depending on where you placed a specific honeytoken, you can see that someone has been snooping there and get some information about them based on telemetry data collected when the honeytoken is used.

The biggest concern when accidentally publishing a secret isn’t just that an attacker could run up your cloud bill. It’s where they can go from there. If an AWS IAM token with excessive permissions were leaked, what might the malicious actor find in the S3 buckets or databases to which it grants access? Could that malicious actor access more source code and corrupt something that will be delivered to many others?

Whether you’re committing secrets to GitHub, PyPI, NPM, or any public collection of source code, the best first step when you discover a secret has been leaked is to revoke it. Remember that small window between publishing and exploitation for a honeytoken. Once a secret has been published, it has likely been copied. Even if you have not detected unauthorized use, you must assume that someone unauthorized and malicious now has it.

Even if the source code is located in a private repository, stories abound of malicious actors gaining access to private repositories through social engineering, phishing, and, of course, leaked secrets. If there’s a lesson from all this, it’s that plain text secrets in source code are found sooner or later. Whether they are accidentally posted in public or found by someone with access they shouldn’t have, they get found.

In summary, wherever you store or publish your source code, be it a private repository or a public registry, you should follow a few simple rules:

  1. Don’t store secrets in plain text in your source code.
  2. Keep anyone who gets hold of a secret from going on an expedition by keeping the privileges granted by that secret strictly limited.
  3. If you discover that you have disclosed a secret, revoke it. You may need to take some time to ensure your production systems have the new, undisclosed secret for business continuity, but revoke the old one as soon as possible.
  4. Implement automations like those offered by GitGuardian to ensure you aren’t relying on flawed humans to perfectly observe secrets management best practices.
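
As an illustration of the first rule, here is a minimal sketch that reads a secret from the process environment at runtime instead of hardcoding it in the source. The helper name `load_secret` and the variable name `EXAMPLE_API_KEY` are hypothetical. Environment variables are only one option, and a dedicated secrets manager may suit production systems better.

```python
import os


def load_secret(name, environ=None):
    """Fetch a secret from the environment, failing loudly if it is missing.

    `environ` defaults to os.environ; passing a plain dict makes testing easy.
    """
    env = os.environ if environ is None else environ
    value = env.get(name)
    if not value:
        # Fail early rather than falling back to a hardcoded default.
        raise RuntimeError(
            f"Missing required secret: set the {name} environment variable"
        )
    return value
```

With this in place, application code calls `load_secret("EXAMPLE_API_KEY")` at startup, and the value itself never appears in the repository or in a published package.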

If you follow these rules, you may not have to learn the lessons that the owners of 11,000 secrets have probably learned the hard way by publishing them on PyPI.

This article is contributed by one of our valued partners.