CNCF Reduce Alert Fatigue - Focus on What Matters

Reduce Security Alert Fatigue. Focus On What Matters

Paladin Cloud’s Head of Developer Relations, John Richards, speaks on security alert fatigue in this webinar for the Cloud Native Computing Foundation.

Transcript:

Hello and welcome to the “Reduce Security Alert Fatigue – Focus on What Matters” webinar.

I’m John Richards, Head of Developer Relations at Paladin Cloud. I spent over a decade as a developer, and for the last five years I’ve been championing the value of open source. I also host Cyber Sentries, a podcast focused on the intersection of artificial intelligence and cloud security. Outside of technology, I love exploring new places and trying new board games. But that’s enough about me.

Let’s jump into the reason we’re all here: finding a way to reduce the fatigue from the incessant stream of alerts that we’re receiving. To do that, we’re gonna look at how you can focus on what matters instead.

To start, we’ll look at some history of how we got here and the massive proliferation of tools. We’ll see how this has led to a huge amount of overload as a ton of information flows in to us. Next, we’ll look at how DevOps and security teams are siloed and how that barrier creates additional problems. All of this leads up to the need for a centralized source of truth.

So what do we do? Well, we’ll see how open source can come to our rescue.

We’ll look at how open source has value in this space with extensibility and customization, and how embracing security can actually empower DevOps teams instead of alienating them or overwhelming them with information. And then we’ll look at a demonstration of open source in action and how it can solve these problems.

All right, let’s start here with the proliferation of tools. Now, I’m sure you don’t need convincing, because most likely you’re already living this.

I regularly talk to teams that are working with upwards of 50 different tools as they go about their weekly tasks. Even within the cloud native space, the landscape has just grown astronomically. Have you seen the new CNCF landscape page? I mean, kudos to the team who put this together.

It’s great, with a lot of great information about a lot of great projects. But look at this: it also highlights the massive depth of tooling within the space, and it comes as no surprise that security is no exception.

And you may be wondering: why are there so many different tools? Why do we need so many? Why can’t we just latch on to one single tool to solve everything? But there’s real value in having multiple tools, and these are the reasons teams want them.

Organizations are often looking for best of breed. They want the best tool in each space. Organizations also worry about vendor lock-in. If you get one single massive tool that kind of does everything, you may often end up overpaying. And then if you ever want to change something, you’re kind of locked in; it’s a lot of work to adjust it.

And then there’s a lot of value in embracing open source projects because of their collaborative nature. They tend to play well within the ecosystem. Those are the reasons teams say, hey, we’re gonna use multiple tools. But that does come at a cost. There is a core challenge here.

How do you handle all the different findings that start coming in from across so many different tools? That brings us to this idea of information overload. As so many different tools send information and it begins to pour in, teams are left with two options for dealing with it.

The first is terrible, and that’s to ignore it. But we see that happen from time to time: organizations get so much information coming in that they can’t act on it reasonably well or in time, and they begin to ignore those alerts, in which case the alerts do no good. The tool is unable to inform those who need it, and something terrible ends up happening.

The other option is to dig in and try to understand each of the notifications coming in. But that takes up a lot of time.

Now, often the reality is that teams exist somewhere on a spectrum in between those two options as they try to focus on the alerts that matter most and ignore the ones that they think don’t matter and hope that things don’t slip through the gaps. But hoping is not good enough. Things continue to slip through the cracks.

The numbers show that companies continue to be breached and often through small misconfigurations that could have been easily remedied. And 83% of security professionals say they’re experiencing alert fatigue.

And so despite trying to handle all this information, teams are overwhelmed and don’t know how to handle it as it comes in from all their different cloud providers. They’ve got their cloud security posture management reporting, application code scanners, vulnerability scanners, and so many other tools funneling all these notifications in, and not enough resources to properly handle them.

It’s not only developer teams that are struggling with this. At the end of the day, it’s often developers and DevOps folks that are the ones responsible for remediating these issues. And the process of getting the information, the findings, the notifications from security tools over to the DevOps team can be archaic.

This creates a challenge as these two teams are siloed and there’s this difficulty in getting information to flow and often that can turn into a combative relationship. The siloing of these teams creates a lot of challenges.

One of the most common ones I have encountered is separate tooling. Whether it’s for political reasons or just license seat requirements, the developers or DevOps teams that need to do the remediation often don’t have access to the tools that are doing the security reporting.

This then means that the only way to communicate those findings is through reports that often lack a lot of context.

I’ve received these reports before as a developer, and digging through them to find out what they meant was a challenge because I couldn’t find the extra information that might have been included in the tool.

On top of this, there are delayed response times. You might think you fixed something, but then you’ve got to wait until the next run or the next report to find out if your change actually fixed it.

There are false positives that come through, too. I need to figure out why a finding came through. Maybe there’s a port left open, but then you find out, oh no, this port is behind a firewall or doesn’t have any access to the public. It’s on its own VPN. And so then you have to come back and explain why that’s not actually a positive.

And then even on things that do need to be fixed, there can be a challenge where the folks who built it aren’t even sure how to remediate it. So how do they get the knowledge they need to remediate the problems they’re being informed of?

So how do you go about dealing with these challenges? Well, the best way to do that is through a strong centralized source of truth that both security teams and DevOps teams have access to, so they can see that information, act on it, and know what impacts them. Now, there are four key things that this central source of truth needs to be able to do.

The first is to identify: you need to be able to visualize and manage your applications and cloud security risks, all of that, in a single platform. Then you want to look at correlation: findings from all these different tools need to be cross-referenced and correlated across your different cloud environments.

Then teams can prioritize and focus on the most critical vulnerabilities or misconfigurations. And then remediate: this is the step everything is geared towards. You want to be able to resolve those issues. So you’re looking for ways to automate things, and you want information and guidance on the steps needed, so that the people working on the issue know what they need to do to remediate it. And then that information can be verified as the problem is resolved.

This is all well and good, you may be thinking, but how do you make this happen? It would take an entire team to build up a central data store, build connections out to every system, and code up connections between each system to correlate this data. Then you need policy on top of that to prioritize things, and then you need to connect this to workflow tools to handle remediation.

And some folks are trying to do this by rolling their own or using cobbled-together systems. I talked to a team that’s trying to manage this using 30-plus spreadsheets that they regularly send around to all of their teams to try and coordinate all this information. But there’s a better way. Let’s look at what open source has to offer in this space.

Let’s head back to the CNCF landscape and this time take a look under security and compliance. Here, we’re gonna look at the open source project Paladin Cloud. Paladin Cloud is a free, open source, extensible cloud security platform focused on solving the very challenges that we’ve been discussing. The Cloud Center of Excellence at T-Mobile set out to solve these problems years ago. The original project was known as PacBot, or Policy as Code Bot, and was built as an extensible project on top of an asset data store where policies and plugins could be programmatically added.

Now, thankfully for us, they open-sourced this project back in 2018. Unfortunately, T-Mobile eventually could no longer support the project, and it was virtually abandoned for a while until last year, when the original designer of the project relaunched it as the open source project Paladin Cloud. Paladin Cloud’s extensible nature means that in addition to the core findings that it can pull in from your different cloud providers, you can create custom plugins to pull in data from your different third-party tools. As new plugins are built and shared with the community, more and more tools become available to everyone.

Let’s look at how Paladin Cloud contributes to that process of identify, contextualize, prioritize and remediate.

The first way is increasing visibility: it monitors your assets and pulls in findings from third-party tools. Developers are able to identify top vulnerabilities and misconfigurations across your entire attack surface.

We’ve already talked a bit about the value of Paladin Cloud’s extensibility. But in addition to using existing plugins, you can build your own custom ones and your own policies. And by pulling that information together, you get additional context that you can correlate.

For instance, knowing that there’s both a vulnerability in the SSH setup and that that port is open allows you to know that this is a higher risk than it normally would be. These risks are then prioritized across the different tooling that you’re pulling in from.
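As a rough illustration of that correlation idea, here is a minimal Python sketch. It is not Paladin Cloud’s implementation; the `Finding` record, the tool names, the finding kinds, and the one-level escalation rule are all assumptions made up for this example.

```python
from dataclasses import dataclass

# Hypothetical severity scale for illustration; not Paladin Cloud's data model.
SEVERITIES = ["low", "medium", "high", "critical"]


@dataclass(frozen=True)
class Finding:
    asset_id: str   # asset the finding is attached to
    source: str     # tool that reported it (names below are made up)
    kind: str       # e.g. "ssh_vulnerability" or "open_port_22"
    severity: str   # one of SEVERITIES


def escalate(severity: str) -> str:
    """Return the next severity level up, capped at critical."""
    index = SEVERITIES.index(severity)
    return SEVERITIES[min(index + 1, len(SEVERITIES) - 1)]


def correlate(findings: list[Finding]) -> dict[str, str]:
    """Map each asset to a combined severity.

    An asset starts at the highest severity any single tool reported. If it
    has both an SSH vulnerability and an open SSH port, bump it one level.
    """
    by_asset: dict[str, list[Finding]] = {}
    for finding in findings:
        by_asset.setdefault(finding.asset_id, []).append(finding)

    combined: dict[str, str] = {}
    for asset_id, group in by_asset.items():
        worst = max((f.severity for f in group), key=SEVERITIES.index)
        kinds = {f.kind for f in group}
        if {"ssh_vulnerability", "open_port_22"} <= kinds:
            worst = escalate(worst)
        combined[asset_id] = worst
    return combined


findings = [
    Finding("vm-1", "vuln-scanner", "ssh_vulnerability", "medium"),
    Finding("vm-1", "cloud-config", "open_port_22", "medium"),
    Finding("vm-2", "vuln-scanner", "ssh_vulnerability", "medium"),
]
result = correlate(findings)
print(result)
```

In this made-up data, `vm-1` has both findings and so ranks above `vm-2`, even though each individual tool reported the same severity for both machines.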

There are four different severity levels to help you understand the risk associated with them. And then asset groupings, which we’ll talk a little bit more about in the demo, are a powerful way to scope the risks to specific teams or projects, empowering them to manage their own risks.

The fourth piece is around accelerating remediation. By connecting into automated workflows, you can speed up remediation. There’s also an extensive policy library that has links out to the documentation needed to remediate the different risks that are highlighted.

Instead of security being a hurdle that developers have to get through before they can actually deliver value, security can empower DevOps teams.

Perhaps the biggest value of Paladin Cloud is that you can give everyone access. There’s no need for a seat license. To reuse one of my favorite terms from open source, it allows you to democratize security. Security findings can be grouped by teams, projects, accounts, subscriptions, or resource type.

Now, here’s the thing: developers want to create secure projects, but the opaque systems in place make it hard for them to know if they’re succeeding.

Remember, what gets measured is what changes.

Time and time again, we see that when this level of visibility and context is put into the hands of developers, the outcome is radically more secure environments.

I don’t want you to just take my word for it. Let’s jump into Paladin Cloud and take a look at how it works and how teams can use it to solve these challenges.

Welcome to Paladin Cloud.

Now, what we’re looking at here is the plugin page under admin. I’m starting here because the underpinning of Paladin Cloud is all the information, the assets and findings, that is pulled in from these third-party tools.

We’ve got coverage for the three major cloud providers, but we also have plugins into Qualys, Aqua, Red Hat Advanced Cluster Security, and Tenable. In addition, there are other plugins currently under development.

And if you’re interested in creating your own plugin to contribute back to the project, we’d love to support you on that.

Using these plugins, all of this data is pulled into Paladin Cloud and centralized. That data is then used to map all of your assets, giving you visibility into your cyber asset attack surface.

The findings from those security tools can then be correlated to these assets to give you the needed context when you’re making important decisions on what to work on.

Now, it’s important to understand that everything we see here and in the dashboard we’re going to look at is scoped by the current asset group.

Now, we’re looking at everything here, but you can slice and dice these assets in many different ways. The system gives you asset groups for individual cloud providers. But you can also split your assets and say, hey, I just want to look at all the assets for a specific application or program that you’re using. Maybe you want to track everything that’s just storage, you could group it that way.

There are a lot of different ways you can group this: by accounts, by subscriptions. Now for the demo here, we’re going to go ahead and use that all clouds view. It gives us that kind of “largest look” across everything.

But being able to give that individualized perspective to teams allows them to focus on their specific area and what they can work on and really empowers the development teams as they’re trying to solve the problems they’re facing.
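To make the idea of scoping by asset group concrete, here is a small Python sketch. It only illustrates the concept: the asset records, field names, and group names are hypothetical, and this is not how Paladin Cloud stores or evaluates asset groups.

```python
# Hypothetical asset records; the field names are assumptions for illustration.
assets = [
    {"id": "bucket-1", "cloud": "aws", "type": "storage",
     "tags": {"application": "billing"}},
    {"id": "vm-7", "cloud": "azure", "type": "compute",
     "tags": {"application": "billing"}},
    {"id": "disk-3", "cloud": "gcp", "type": "storage", "tags": {}},
]

# Each asset group is modeled as a named predicate over a single asset.
asset_groups = {
    "all-clouds": lambda a: True,
    "aws": lambda a: a["cloud"] == "aws",
    "storage": lambda a: a["type"] == "storage",
    "billing-app": lambda a: a["tags"].get("application") == "billing",
}


def scope(group_name: str) -> list[str]:
    """Return the ids of the assets that belong to the named group."""
    belongs = asset_groups[group_name]
    return [a["id"] for a in assets if belongs(a)]


storage_assets = scope("storage")
billing_assets = scope("billing-app")
print(storage_assets)
print(billing_assets)
```

The same list of assets yields a different view for each group, which is the point: a team looking at its own group only sees the assets, and therefore the findings, that it can act on.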

All right, let’s take a look at the dashboard where teams are generally going to spend most of their time.

Here, we can see at the top we have our global search, notifications, user settings, but then we get into the dashboard itself.

Here, violations, or those findings from tools, are grouped into four categories: critical, high, medium, and low. You can also see the number of policy types that are in violation and the total number of assets that are impacted at that criticality.

As I mentioned, the information listed here is scoped by the asset group. So we could change this to get different numbers. We’ll continue to use all clouds as we look through this list.

Now these are hyperlinked, so you can click on them to dive into the violation details. In this case, the list is automatically filtered to only the critical violations. We can run through this list and begin to work on them.

By clicking on a violation, we can see the details of that specific violation. Here we see the status and the severity, and we see what category it is and what asset type it impacts. Down here are other details, including a link to that specific policy. From there you can find information and links off to documentation on how to remediate this specific violation.

There’s a link off to the asset that this violation is attached to. You can even request an exemption. I’m an admin right now, so I can add an exemption, but a normal user can request one that can be reviewed and approved or disregarded. And down here, we could see an audit log of this specific violation.

All right, let’s jump back to the dashboard. Underneath the different violation severities, we have category compliance. This tracks compliance across the four categories.

Security: Most of the policies are going to be around security.
Cost: there are also policies around cost, which flag underused or unused assets.
Operations: around preferred regions, things like that.
And then Tagging: making sure that your assets are tagged, because that’s used to drive a lot of the different asset groupings. And you just need it if you’re going to be managing a large cloud environment.

And then we have violations by severity. This just tracks and shows you how you’re doing, so you know the current status. There are trend graphs for both, and we can look at those to see how we’re doing over time. We want to see our compliance improving while our overall violations begin to dip down as our teams work to remediate them.

Underneath that is the asset graph, which tracks your total assets over time. This can be really useful for forecasting or detecting anomalies. In addition, there’s a link to view the assets.

We looked at this page earlier, but this is a heat map of all your assets sorted from least to most.

And you’re able to tell what your assets look like. We’ve had organizations come in here and find out that they had a bunch of assets in their cloud environment they didn’t know about. They had a whole bunch of EBS snapshots that they hadn’t been cleaning out, and storing those snapshots was costing them hundreds of thousands of dollars a year without their realizing it.

Back to the dashboard. That brings us to the policy compliance overview. The policy compliance overview is a great way to focus on what really matters.

Here, we can sort our violations by severity and quantity, allowing us to see the most critical violations and the most widespread. Organizations can really improve their overall compliance by focusing on those at the top of the list, because if they can find a way to remediate these at scale, they’re able to solve a whole bunch of issues at once and reduce the most risk with the least effort.
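The ordering described here, most severe first and then most widespread, can be sketched in a few lines of Python. The policy names and counts below are invented for illustration and do not come from the demo environment.

```python
# Lower rank sorts first, so critical policies come out on top.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}

# Hypothetical policy summaries: how many assets violate each policy.
policies = [
    {"policy": "unencrypted-volume", "severity": "high", "violations": 120},
    {"policy": "public-bucket", "severity": "critical", "violations": 4},
    {"policy": "missing-tag", "severity": "low", "violations": 900},
    {"policy": "open-admin-port", "severity": "critical", "violations": 37},
]

# Sort by severity first, then by violation count, largest first.
ranked = sorted(
    policies,
    key=lambda p: (SEVERITY_RANK[p["severity"]], -p["violations"]),
)
order = [p["policy"] for p in ranked]
print(order)
```

Note that the low-severity policy with 900 violations still lands at the bottom. Sorting on severity before quantity keeps a noisy but minor policy from crowding out a critical one.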

In addition to the default policies that Paladin Cloud provides for the different cloud environments, the third-party tooling policies are pulled in as well. As we look at this list here, we can see Qualys S5 vulnerabilities pulled in.

As we look down here, we can see how Paladin Cloud provides additional value to the security tools that you’re already using. This “enabled Qualys EC2 vulnerability scan” policy looks across all of our different compute instances, or all of our EC2 instances here, and lets us know how many of them have Qualys installed and how many don’t. Organizations tend to think, hey, I’m running this across everything, but in reality, there are often many missed assets.

By coming in here, an organization can find exactly which ones are missed and know which ones they need to go look at and remediate.
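Conceptually, this kind of coverage check is a set difference between the full asset inventory and the assets a scanner reports on. The Python sketch below shows that idea only; the instance ids are made up, and it does not reflect how Paladin Cloud or Qualys actually expose this data.

```python
# Hypothetical inventory of compute instances from the cloud provider.
all_instances = {"i-0a1", "i-0b2", "i-0c3", "i-0d4"}

# Hypothetical set of instances the vulnerability scanner reports on.
scanned_instances = {"i-0a1", "i-0c3"}

# Instances in the inventory that the scanner has never seen.
missing = sorted(all_instances - scanned_instances)

# Fraction of the inventory that is actually covered by the scanner.
coverage = len(all_instances & scanned_instances) / len(all_instances)

print(missing)
print(f"{coverage:.0%} of instances are covered")
```

The useful output is the list of missing instances rather than the percentage, since that list is what a team can actually go and fix.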

Let’s go ahead and dig into a specific asset to see what it looks like.

Next, let’s look at policies. Here, we can see an overall list of our different policies. We can see which ones have auto-fix capability and which ones are turned on. We can also go into individual policies to get more information, which will take us to documentation on how to remediate each of the different policies.

Then there’s the tagging section, which tracks tagging across your entire environment.

You can see your overall compliance and how many things are untagged, so that you can work on those to get your tagging compliance up to par.

Fix Central houses notifications that are pulled in from third-party systems and from the system itself. Here, we can see some Red Hat notifications coming in. Recommendations are pulled in from AWS Trusted Advisor.

And that covers most of the capability. Now, there’s a lot that can be done in admin, like managing your different asset groups. In addition to the asset groups, you can manage users, and you can provide exemptions for policies that may not be applicable to your organization.

And of course, the plugins are available where you can connect to different accounts and third-party systems. And remember, you can also create your own custom policies that you can run against your own assets and you can also add in your own custom plugins in addition to all the plugins that we had covered earlier to bring in that core information that will drive all the findings and assets inside of Paladin Cloud.

Hopefully, now you have a better idea of how Paladin Cloud helps with alert fatigue by allowing teams to focus on what really matters.

To recap, we started by talking about the massive proliferation of tools and how all the notifications from them have contributed to information overload. Combine that with the fact that security teams and DevOps teams are so often siloed, and it creates real challenges. This has created the need for a centralized source of truth. But thankfully, that’s where an open source tool like Paladin Cloud can step in to the rescue. By combining its findings with an extensible, customizable architecture, it allows you to connect in all of your different tools.

By providing this information to everyone, you can empower the teams building applications and building the infrastructure to focus on security themselves and to remediate those problems.

If this seems useful for you and your team, please check out our GitHub repository. We’d love to hear any feedback you have and of course, while you’re there, give us a star.

Thanks so much for watching.
