Researchers have found that one of the most popular source code repositories in the world is still housing thousands of publicly accessible encryption keys.
Over 100,000 code repositories on source code management site GitHub contain secret access keys that can give attackers privileged access to those repositories (repos) or to online service providers’ services.
Researchers at North Carolina State University (NCSU) scanned almost 13% of GitHub’s public repositories over nearly six months. In a paper revealing the findings, they said:
We find that not only is secret leakage pervasive – affecting over 100,000 repositories – but that thousands of new, unique secrets are leaked every day.
The credentials that developers routinely publish on their GitHub repos fall into several categories. These include SSH keys, which are cryptographic key pairs used to log in to remote machines and services without a password. Another category is application programming interface (API) keys (also known as tokens). These are digital keys that enable developers to access online services ranging from Twitter to Google Search directly from their programs. The researchers found a mixture of these keys for services including Google, Twitter, Amazon Web Services, Facebook, MailChimp, online telephony service Twilio, and credit card processing companies Stripe, Square, and Braintree.
These leaks sometimes compromised high-value targets. The researchers found Amazon Web Services (AWS) credentials for a large website serving millions of US college applicants. They also found AWS credentials for the website of a major government agency in a Western European country.
How does it happen?
Developers sometimes get careless when updating the code on their machines and then sending it to GitHub, which they typically do using command line instructions known as commits and pushes.
Coders will sometimes store SSH keys and API keys in the same directories as their source code, so that they get caught up in the commit and push process. It’s an easy mistake to make with SSH keys, which developers often generate from the command line. Some other mishaps are even more facepalm-worthy, such as embedding API keys directly in source code.
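One common alternative to embedding a key in source code is to read it from the environment at run time. The following is a minimal sketch of that idea, not something taken from the paper; the function name load_api_key and the variable name WEATHER_API_KEY are hypothetical.

```python
import os


def load_api_key(env_var: str = "WEATHER_API_KEY") -> str:
    """Return the API key held in the given environment variable.

    The variable name is a hypothetical example. Raising an error when it
    is unset makes a missing key fail loudly, rather than tempting anyone
    to paste a fallback value into the source file.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

With this approach the key lives in the shell session or deployment configuration, so committing and pushing the source file does not publish it.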
One way of preventing private keys from being committed is to tell a .gitignore file where they are. This is a file that blocks certain information from being uploaded to a GitHub repo. Instead, some developers stored their secrets directly in the .gitignore file, meaning that it got included in their repos.
Some online services, such as those that use OAuth, require multiple secrets for access, such as a digital key and an ID. That didn’t provide much extra security in this case though, because four in five of the repos holding these secrets contained the other information required to access the third-party service as well.
Many developers did nothing when notified of the problem, according to the paper. Those that tried to fix the problem tended to create new commits for their repos that removed the secrets. This doesn’t work, because GitHub is a front end for Git, a version control system that purposely stores information held in past commits so you can keep track of what changed, and when.
What devs really need to do is either rewrite their history to remove the offending commit, or delete the entire repo and start again without storing the password, said the researchers. Most people did neither.
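To see why a follow-up commit does not remove a secret, here is a toy model of an append-only commit history. It is a deliberately simplified sketch and not how Git stores data internally; the ToyRepo class, the file name, and the secret value are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Append-only list of snapshots, each mapping file name to contents."""
    commits: list = field(default_factory=list)

    def commit(self, files: dict) -> None:
        # Store a copy so later edits cannot alter an earlier snapshot.
        self.commits.append(dict(files))

    def head(self) -> dict:
        # The latest snapshot, which is what a casual visitor sees.
        return self.commits[-1]

    def commits_containing(self, needle: str) -> list:
        # Indexes of every snapshot in which some file mentions the needle.
        return [i for i, snap in enumerate(self.commits)
                if any(needle in text for text in snap.values())]


repo = ToyRepo()
repo.commit({"app.py": "API_KEY = 'hypothetical-secret'"})
repo.commit({"app.py": "API_KEY = load_from_env()"})  # the "fix" commit

in_head = any("hypothetical-secret" in t for t in repo.head().values())
print(in_head)                                         # False
print(repo.commits_containing("hypothetical-secret"))  # [0]
```

The latest snapshot is clean, but the first one still holds the key, which is why the researchers say the history itself has to be rewritten or the repo deleted. A key that has already been public should also be treated as compromised and revoked.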
How did the researchers find these keys? Was it via some nefarious hack or loophole in the website? Nope – they just searched for them. GitHub has a search API that can be used to search across all its repos, and it happily delivers the secret key data.
Paper co-author Brad Reaves told us:
While we used the Search API, which requires an API key that can be obtained for free by any GitHub user, keys can also be found with the online search function.
This has been a problem since at least 2013, when GitHub shut down its search service for a while after finding secret keys turning up in searches. He added:
After this was publicized, GitHub took down the Code Search tool, claiming unrelated reasons, but shortly relaunched the tool with the same functionality.
So is all of this GitHub’s fault? Hardly. As Reaves pointed out:
Code search is a great tool, but it would be very difficult for GitHub to build a tool that censored all possible secrets; the burden is on developers not to post secrets to public repositories.
To its credit, GitHub, which Microsoft acquired for $7.5bn in October 2018, is trying to make things better. It introduced rate limits for its search tool, although the paper points out that an attacker could overcome this by searching through multiple accounts. It has also been scanning repositories for several years to find GitHub OAuth tokens and personal access tokens, which can be used to access people’s GitHub repositories.
In October 2018, GitHub also announced partnerships with third-party online services as part of a new feature called Token Scanning. This scans new commits or private-turned-public repos for service providers’ API keys and notifies the appropriate service provider when it finds them. That service provider may then choose to revoke the credentials, which is the step GitHub recommends, according to a spokesperson there. She also told us that it has shared information on more than 100 million compromised tokens so far.
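Scanning of this kind is possible because many credentials have a recognizable shape. The sketch below shows the general idea with two widely documented formats: AWS access key IDs, which begin with AKIA followed by 16 uppercase letters or digits, and PEM private key headers. It is an illustration under those assumptions, not GitHub's implementation; the names PATTERNS and find_secrets are hypothetical, and the sample key is the placeholder AWS uses in its own documentation.

```python
import re

# Illustrative patterns only; a real scanner would cover many more formats.
PATTERNS = {
    "aws_access_key_id": re.compile(r"(?<![A-Z0-9])AKIA[0-9A-Z]{16}(?![A-Z0-9])"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def find_secrets(text: str) -> list:
    """Return (line_number, pattern_name) pairs for every suspicious line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


sample = (
    'aws_key = "AKIAIOSFODNN7EXAMPLE"\n'
    'print("hello")\n'
    "-----BEGIN RSA PRIVATE KEY-----\n"
)
print(find_secrets(sample))
# [(1, 'aws_access_key_id'), (3, 'private_key_header')]
```

Developers could run a check like this locally before committing, which catches a leak earlier than a server-side scan can.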
It’s a start, said Reaves, but GitHub’s work can only solve the problem up to a point:
I think efforts like GitHub’s Token Scanning project should be applauded, but they are only effective once a leak has already occurred. This problem also is likely not isolated to GitHub – it will affect any publicly available code. We need more research to develop systems that help developers avoid this mistake in the first place.
Kudos to GitHub for trying its best to solve the problem, but it’s up to developers to use services like this – and the associated tools like Git – properly.
11 comments on “Thousands of API and cryptographic keys leaking on GitHub every day”
“GitHub is a version control system” should probably be “Git is a version control system”.
My first use of pushing to GitHub included a weather API key. Just learning GitHub, it took me forever to remove the reference to the key as my account was a free public one. Ultimately, I launched GitLab on a local server and use that instead of GitHub.
I’ve also published low value keys without worrying much about it. Even now I have a password pushed to GitHub in plaintext, but because it’s a throwaway password to a temporary account for a local-host only database containing junk data, I’m really just not worried about it.
But why, when you simply don’t need to? The problem with training yourself to cut corners when you can get away with it is that it means you never fully eliminate bad habits. It’s like HTTPS – if you get in the habit of never running a web server without it, you never end up with a web server that isn’t doing TLS. For the greater good of all, and because it sets a good example to the next generation of coders.
It may be like HTTPS in that always working with it significantly helps to prevent accidentally working without it, however in my case, I’m attempting to learn how to use the framework as well as programmatically connect to the database while using a language I’m already not strong in. This means that to get it working, my best bet is to stick close to the documentation, which seems to have two places the password goes into. (I acknowledge that this is strange, and will issue a fix for this later). Because the framework changes where my import links point to, I’ve also had to learn how to tell it which file I’m attempting to reference.
Attempting to debug database authentication, database connection, module loading, page routing, and webpage loading all at once is a lot to bite off.
OK, but why share your keys with the whole world?
Sometimes the resources that the keys pointed to are long gone, or require a new ssh key, so there’s no reason to worry about removing them. I suspect this is why developers took no action when notified. The more concerning cases are the ones where they attempted to fix it and failed to do so.
Developers putting their secrets directly in the gitignore. *facepalm*
I’d love for GitHub to take this seriously by using their own functionality of pre-receive hooks on .com. They could stop secrets being even served to GitHub.
The worst part in all this?
GitHub will not remove these credentials if notified of them. They will simply, after several days of delay, ask the repository to remove them. They give the repository anywhere from two weeks to a month to comply before taking further action which is limited to disabling access to the individual repository.
Want GitHub to change its policy there? Start posting GitHub account passwords to low-activity repositories over and over again. Once a few thousand accounts go from secretly to widely publicly compromised because nobody can reach the repository owners to remove the passwords, maybe GitHub will adopt a sane approach to sensitive data removal.
Someone ought to just sue them. There has to be an equivalent of the DMCA for passwords.