The State of Secrets Sprawl – Podcast

May 9, 2022 6:43 am

In this podcast, we dive into the 2022 edition of the State of Secrets Sprawl report with Mackenzie Jackson, developer advocate at GitGuardian. We talk about the issues corporations face from public leaks by groups such as Lapsus, as well as ways developers can keep their code safe.

Can I tell you a secret? Will you keep it between us? You’ve probably said or heard this with friends and family. But did you know that secret keeping, or the lack thereof, is one of the biggest issues businesses face?

In the latest episode of our Threatpost Podcast Series, host Becky Bracken discusses what secrets are and how they lurk in source code. Mackenzie Jackson, developer advocate at GitGuardian, explains the issues caused by public leaks and how to keep your systems safe.

The recent State of Secrets Sprawl 2022 report from GitGuardian, which covers data from 2021, further defines the breadth of business secret risks.

(Brought to you by SpecOps. Underwriters of Threatpost podcasts do not assert any editorial control over content.)

A secret can be any sensitive data that we want to keep private. When discussing secrets in the context of software development, secrets generally refer to digital authentication credentials that grant access to services, systems and data. These are most commonly API keys, usernames and passwords, or security certificates.

“Secrets are what tie together different building blocks of a single application by creating a secure connection between each component. Secrets grant access to the most sensitive systems.”

The full report is available from GitGuardian (ungated).

A lightly edited transcript of the podcast follows.

For additional executive insights, check out the Threatpost podcast microsite.

Becky Bracken: Hello, my name is Becky Bracken, and we are thrilled to welcome you to the latest edition of the Threatpost podcast series. Today I’m joined by Mackenzie Jackson, a developer advocate with GitGuardian, who is here to talk to us about secrets sprawl: corporate secrets that are lurking in source code. Is that a good way to explain it, Mackenzie?

Mackenzie Jackson: Yeah, that’s perfect. I like that you said “lurking.” It makes it sound suspicious and sneaky, and that’s exactly how we need to think about our secrets.

BB: So let’s define what a secret lurking in source code would be. What are some of the sensitive details out there that companies might not be aware of?

MJ: Yeah, of course. A secret can really be defined as a digital authentication credential, which is just a fancy way of saying your API keys, security certificates, credential pairs; really anything that grants access to external services, infrastructure and data.

So we can think of our secrets as the crown jewels of our organization, because they authenticate systems and users into different areas. It’s really important that they are tightly secured and wrapped in layers of authentication, because if a malicious actor gets their hands on them, they can penetrate our internal systems, elevate their privileges and, once they’re properly authenticated, carry out all kinds of malicious activity.

BB: GitGuardian, you all have just released this incredibly detailed report on the State of Secrets Sprawl, and for anyone out there who is interested in more information, GitGuardian can provide the report for download. There’s a wealth of information in there. But let’s talk a little bit about where these repositories are. I know the obvious one everyone thinks of is GitHub, but the report really outlines that it’s not necessarily one of the most dangerous places. Correct?

MJ: Yes, that’s right. In our State of Secrets Sprawl report, we scan a lot of different areas for secrets. We talked about GitHub: we scan every single public commit made on GitHub, about a billion commits a year, and that always provides huge amounts of information.

And that’s one of the worst places a secret can end up. But if we’re talking about threats to organizations, we also need to factor in areas that aren’t so obvious. Private internal Git repositories become huge, high-value targets for attackers, because they’re known to contain a lot of secrets. Other areas include Docker images and other containers.

When we build these from source code, if the source code has secrets in it, the Docker image or container is going to have secrets in it too. So public GitHub is the one everyone thinks about, and there’s plenty we can say about it. But there are also all these other areas that organizations need to consider, because they’re increasingly being targeted by attackers.
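One common way to keep credentials out of both the source code and any image built from it is to inject them at runtime. Below is a minimal Python sketch. It assumes the secret arrives as an environment variable; the variable name PAYMENTS_API_KEY and the helper get_secret are hypothetical and only for illustration.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret was not provided at runtime."""


def get_secret(name: str) -> str:
    """Read a secret from the process environment at runtime.

    The value is supplied when the container starts (for example with
    `docker run -e NAME=...`), so it never appears in the source tree
    or in the image layers built from that source.
    """
    value = os.environ.get(name)
    if not value:
        raise MissingSecretError(f"required secret {name!r} is not set")
    return value


if __name__ == "__main__":
    # PAYMENTS_API_KEY is a hypothetical variable name used for illustration.
    api_key = get_secret("PAYMENTS_API_KEY")
    print("loaded secret of length", len(api_key))
```

Environment variables are only one injection mechanism. Values passed with `docker run -e` stay visible to anyone who can run `docker inspect`, so a dedicated secrets manager or a mounted secret file is often preferred.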

BB: Let’s talk about how attackers view these. I also wanted to call out one of the surprises I saw: cloud services like AWS can also hold a lot of these secrets. So how are attackers approaching this, and what do they see as the potential opportunities in these repositories?

MJ: Attackers are really clever at discovering these credentials. Credential harvesting is a really strong technique they use, especially to gain initial access. Among recent breaches, we can take the case of Uber. An employee accidentally published some corporate code to a public GitHub repository. The repository wasn’t owned by Uber; it was owned by the employee. The developer accidentally pushed source code to it, and it contained cloud service provider keys granting access to Uber, and a breach followed. In this case, you can see that the attacker established a perimeter around Uber, figured out who was working for Uber and who the developers were, and then started monitoring those personal repositories for leaked credentials.

We see this quite a lot. With SolarWinds, although it’s not officially recognized as the initial point of access, we know that access to a SolarWinds server was leaked by an employee in their personal public GitHub repository. So again, we can see malicious actors monitoring the perimeter around companies. And it’s particularly scary in this public domain, because the organization has no control over it: you can’t control what your employees do, and in most cases it’s an accident.

You have no governing control over your employees’ personal repositories. And if we look at some other technologies, there’s Codecov, a code coverage tool. They had a massive supply chain attack, and it all stemmed from a credential granting access to their source code control system being leaked in a public Docker image. So again, we can see adversaries pulling down public information like Docker images, pulling them apart and scanning them for secrets. If there’s an exposed secret, they use it to launch malicious activity. In the case of Codecov, they injected malicious code and caused all kinds of havoc.
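The technique of pulling an image apart and scanning it for secrets can be sketched in a few lines of Python. This sketch assumes a flat filesystem tarball, like the one `docker export <container>` writes to standard output. It uses three deliberately simplified detection rules that are assumptions for illustration. It is not GitGuardian’s scanner, and the script name scan_image.py is hypothetical.

```python
import re
import sys
import tarfile
from typing import IO, List, Tuple

# Simplified, illustrative rules; real scanners use many more detectors
# plus validity checks to reduce false positives.
RULES = {
    "aws_access_key_id": re.compile(rb"AKIA[0-9A-Z]{16}"),
    "private_key": re.compile(rb"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "generic_assignment": re.compile(
        rb"(?i)(?:api[_-]?key|secret|password)\s*[:=]\s*['\"][^'\"\s]{8,}['\"]"
    ),
}

MAX_FILE_BYTES = 1_000_000  # skip very large files to keep the sketch fast


def scan_tar(fileobj: IO[bytes]) -> List[Tuple[str, str]]:
    """Return (path, rule_name) pairs for every rule matched in the archive."""
    findings = []
    # "r|*" reads the archive as a stream, so a non-seekable pipe works too.
    with tarfile.open(fileobj=fileobj, mode="r|*") as archive:
        for member in archive:
            if not member.isfile() or member.size > MAX_FILE_BYTES:
                continue
            handle = archive.extractfile(member)
            if handle is None:
                continue
            data = handle.read()
            for rule_name, pattern in RULES.items():
                if pattern.search(data):
                    findings.append((member.name, rule_name))
    return findings


if __name__ == "__main__":
    # Usage: docker export <container> | python scan_image.py
    for path, rule in scan_tar(sys.stdin.buffer):
        print(f"{path}: possible {rule}")
```

`docker export` flattens the filesystem, so it will not show files deleted in a later layer. Those files remain recoverable from the image’s layers, which `docker save` exposes as separate tarballs. Scanning that output would need one more level of unpacking.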

And the last point I’ll make is that Lapsus is on everyone’s lips at the moment. We’ve heard a lot about them, and they’ve been leaking a lot of internal source code. We scanned the Samsung source code that Lapsus leaked and found around 6,000 credentials inside it. So here’s another example of why adversaries target internal repositories: even Samsung, which is going to be more security-aware than most companies, still has huge numbers of secrets.

What we saw in the Samsung source code repository was actually well above the industry standard: they had fewer secrets than we would expect in a comparable company. I don’t want to say it’s a whole mess, and I’m not bashing Samsung or anyone else. But this is just a huge problem, and you can really see the different techniques adversaries use to gain these credentials and why they target specific technologies.

BB: Now, a common refrain I’ve noticed in my reporting throughout the cybersecurity community is the call for more collaboration between developers and security. I know GitGuardian has a specific play in that area. Everybody would like that to happen, so let’s talk about how you all can bridge that gap a little bit.

MJ: For sure. It’s been given a lot of names, like shift left or DevSecOps, and I’m sure someone will create other buzzwords, but essentially it’s about everyone having a shared responsibility for security. I’m an advocate for developers, and I work for a security company. I’m always advocating that there is a shared responsibility: developers need to take part and be active in security, but they don’t need to become security engineers. They need to focus on what they have control over and what they can do.

So at GitGuardian, we scan internal systems, Git repositories and Docker images for secrets. We have a tool used by the security team that gives them a bird’s-eye view of what’s happening: whether secrets are being leaked and where they are. Then they can remediate them.

But we also have a bunch of free tools for developers that enable them to be part of that solution. They can use a product we have called ggshield to install a pre-commit or pre-push hook, which blocks commits to version control systems if they contain secrets. So we’re not disrupting their job; it fits into the current lifecycle, and we’re not adding anything else to do. But the end result is that the security engineers have fewer of these incidents to deal with.

So it’s about that shared responsibility: okay, I don’t need to know everything about security, but I’m dealing with source code, so I’ll take on some of that responsibility and make sure my source code doesn’t contain secrets.
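A small pre-commit hook can illustrate the hook-based workflow described above. This is not ggshield and does not reproduce its detectors. It is a minimal Python sketch that scans only the lines added in the staged diff. It uses two simplified rules, which are assumptions for illustration. When a rule matches, it exits with a non-zero status, and Git aborts the commit. The helper names added_lines and find_secrets are hypothetical.

```python
#!/usr/bin/env python3
"""Illustrative pre-commit hook (not ggshield): save as .git/hooks/pre-commit
and make it executable. Blocks the commit if staged changes add a line that
matches one of the simplified secret patterns below."""
import re
import subprocess
import sys
from typing import List, Tuple

# Simplified, illustrative rules (assumptions, not GitGuardian's detectors).
RULES = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def added_lines(diff: str) -> List[Tuple[str, str]]:
    """Return (file, line) for each line added in a unified diff."""
    results = []
    current_file = None
    for line in diff.splitlines():
        if line.startswith("+++ "):
            # "+++ b/path" names the file the following hunks belong to.
            path = line[4:]
            current_file = path[2:] if path.startswith("b/") else path
        elif line.startswith("+") and current_file is not None:
            results.append((current_file, line[1:]))
    return results


def find_secrets(diff: str) -> List[Tuple[str, str]]:
    """Return (file, rule_name) for every added line matching a rule."""
    hits = []
    for path, text in added_lines(diff):
        for rule, pattern in RULES.items():
            if pattern.search(text):
                hits.append((path, rule))
    return hits


def main() -> int:
    # Staged changes only, with no context lines.
    diff = subprocess.run(
        ["git", "diff", "--cached", "--no-color", "-U0"],
        capture_output=True, text=True, check=True,
    ).stdout
    hits = find_secrets(diff)
    for path, rule in hits:
        print(f"possible {rule} in {path}", file=sys.stderr)
    if hits:
        print("commit blocked: remove the secret(s) and try again", file=sys.stderr)
        return 1  # a non-zero exit status makes git abort the commit
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Hand-written regexes like these miss many credential formats and can flag harmless strings. That is why a maintained scanner is the practical choice; the sketch only shows where such a check sits in the commit lifecycle.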

BB: Again, I want to point our listeners to GitGuardian’s State of Secrets Sprawl report for 2021. There’s a lot of more granular information in there. But I did want to call out November 20 as a seminal day. Apparently, that was the day last year when the most secrets were leaked. So can you explain why that day was such a hot property, and what that tells us more broadly?

MJ: Yeah. We’re not entirely sure why that particular day was the peak, but one piece of information is that it was a Saturday. When we look at the data, we can see that the most secrets leak publicly on weekends. When we ask people when they think the most sensitive information will be leaked, they say Friday: maybe you’re doing a build Friday afternoon, you’ve got your mind on after-work beers, and you make a mistake and push some code up. Or Monday, if you’re a bit sluggish after the weekend. But it’s actually the weekend, because people are working on their personal machines. They may be working on personal projects, and that’s where secrets end up getting pushed into public places. So it’s a different phenomenon than I think anyone would expect. And if we ignore weekends, the next most common days for secrets to leak are public holidays. So why November 10 was the day, I’m not sure; it was a Saturday. Perhaps it’s that in-between period before the Christmas break, when people are trying to finish off their projects.

BB: Yeah, it was Saturday, November 20, so that jibes. And again, a lot of times you hear that the most wicked attacks happen on a public holiday, when people are least on guard, I guess.

MJ: Yeah, I think I was saying the 10th. You’re right, it’s the 20th. My mistake.
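A weekday breakdown like the one discussed here is easy to compute for any set of leak dates. Below is a minimal Python sketch that tallies leak events per weekday and computes the weekend share. The input (one date per detected leak) and the helper names are assumptions for illustration; the report’s underlying data is not reproduced here. The script also checks the weekday of November 20, 2021.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable

# Fixed English names so the output does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def leaks_by_weekday(leak_dates: Iterable[date]) -> Dict[str, int]:
    """Count leak events per weekday name."""
    return dict(Counter(WEEKDAYS[d.weekday()] for d in leak_dates))


def weekend_share(leak_dates: Iterable[date]) -> float:
    """Fraction of leak events falling on a Saturday or Sunday."""
    dates = list(leak_dates)
    if not dates:
        return 0.0
    weekend = sum(1 for d in dates if d.weekday() >= 5)  # 5 = Saturday, 6 = Sunday
    return weekend / len(dates)


if __name__ == "__main__":
    # The peak day discussed in the interview.
    print(WEEKDAYS[date(2021, 11, 20).weekday()])  # prints "Saturday"
```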

BB: I just had it written down here, so I was cheating a little bit. Top of mind? Well, again, I want to call out that there’s a lot more granular information in the State of Secrets Sprawl report. But to wrap up today: since you are a developer advocate, and that’s a perspective we don’t hear from enough in cybersecurity reporting, are there some parting words of wisdom you can leave with the security community to better engage developers and bring them along to the cause, rather than framing this as an adversarial relationship?

MJ: Yeah, definitely. It is interesting, because I walk between two worlds. I don’t necessarily feel conflict in myself, but I certainly see it everywhere. I think part of that is because there’s pressure on all teams coming from different areas. Developers are under huge pressure to release software quickly and get the next build done; they’re finishing up for the next sprint. And security engineering and operations teams are under huge pressure to make sure everything’s running smoothly, with no downtime, no breaches and no vulnerabilities.

So you have these conflicting interests. What I see work really well is framing the conversation around areas that each person has control over. For example, encourage developers to take control of managing their secrets. Making sure they don’t leak secrets in their source code is something they control. It’s not an abstract concept; it’s something they intimately understand, and it immediately puts them in the driver’s seat. That creates a better relationship and a shared goal, which is ultimately what it’s all about.

So yeah, more broadly than secrets, I would say: make sure developers have tools in their hands that help them help the security teams. Enable good coding practices that fit into their lifecycles and that they have control over. Don’t implement so many blocking measures where they have to come and talk to you, where they don’t really understand it, it’s not explained, and everyone just feels frustrated and a bit silly. Giving them control over what they can control immediately relieves some of that tension in the relationship. I’m a huge advocate of shifting left. I don’t expect developers to dive in and know everything about security; it’s a huge topic, and you couldn’t. But there are certain things they can do, and that will help the relationship for sure.

BB: Wonderful. I think that’s great advice. Again, we are here taking a look at the State of Secrets Sprawl report from GitGuardian. And Mackenzie, before we wrap up, I wanted to give you an opportunity to call out any other information you think would be helpful or that you want our listeners to know about.

MJ: Yeah, for sure. I always think it’s helpful to understand the scale of what we’re talking about. Everyone knows leaked secrets are bad. Everyone knows they shouldn’t be public, but how often does it happen? Here are the key facts from the State of Secrets Sprawl report. Last year, in 2021, we found 6 million credentials leaked publicly, and that’s just on public GitHub. That doesn’t even factor in Docker images: we find that 4.6% of Docker images contain secrets. And the scariest one is internal repositories. In an average company with 400 developers, we find about 1,000 unique secrets, each with about 13 occurrences. So the scale of this is massive. And just to wrap up, this isn’t a small problem. Malicious actors use credentials in a variety of ways, and to understand the problem, we have to understand its scope.
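Here is a quick back-of-the-envelope reading of the internal-repository figures quoted above. It uses only the rounded numbers from the interview: about 1,000 unique secrets, each appearing about 13 times, for a company of 400 developers.

```latex
\begin{aligned}
\text{total occurrences} &\approx 1{,}000 \times 13 = 13{,}000 \\
\text{unique secrets per developer} &\approx \frac{1{,}000}{400} = 2.5 \\
\text{occurrences per developer} &\approx \frac{13{,}000}{400} = 32.5
\end{aligned}
```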

My parting words would be, not to scare anyone, but definitely take a look at the report and dive a little deeper. Get to know how big a problem this is, so that we as a community can start to fix it. It’s going to have to involve everyone: educators, developers, security teams, operations teams, and the service providers themselves, who also have a responsibility here. So we all have a part to play. I hope that one year we can put out a State of Secrets Sprawl report with fewer than a million credentials found in a year. That would be great.

BB: I was just going to say it’s time for all of us to roll up our sleeves and clean up those repositories. And hopefully next year, the State of Secrets Sprawl will be much smaller. We want to thank GitGuardian, and we want to thank Mackenzie for taking the time to illuminate this topic for us today. Threatpost listeners who are interested in exploring GitGuardian’s report can get it from us or reach out to Mackenzie directly; I’m sure he’d be happy to direct you to the right people within his organization. For the Threatpost Podcast Series, my name is Becky Bracken, and I’m a journalist with Threatpost. If you have any questions or comments about our ongoing series, please reach out to me directly. But for today, thank you, Mackenzie, and thank you to GitGuardian for helping us keep the series going.

MJ: Thanks so much. It’s a pleasure to be here. All right, everybody, have a good day. Bye-bye.
Transcribed by https://otter.ai