RapidScale provides rapid response to CrowdStrike customers

Written by Sarah Davis | Aug 22, 2024 4:00:00 AM

When our tabletop exercises became real with the CrowdStrike outage on July 19th, RapidScale answered the call. Sixty of RapidScale's cloud engineers worked through the night and into the early morning, patching up to 1,200 servers per hour. Many of our West Coast CrowdStrike customers were 100% back online by the time they got to their desks.

Learn more about that night and RapidScale’s response in this episode of Ahead in the Cloud, featuring RapidScale’s COO Claudette LeBlanc and our Director of Cloud Support David Mattatal.

[video src="https://player.vimeo.com/video/999567870" aspect_ratio="1/1"]

Claudette: Hi, and thanks for joining us. I'm Claudette LeBlanc, RapidScale's Chief Operations Officer, and I'm joined today by David Mattatal, our Director of Cloud Support. We're sitting here reminiscing about a day not too terribly long ago when all of us woke up to the blue screen of death.

On Friday, July 19th, CrowdStrike's routine Falcon sensor update didn't go as planned. RapidScale is both a customer and a provider of CrowdStrike, so we had a bit of work ahead of us.

Dave, tell us a little about yourself and what those first few moments of the CrowdStrike catastrophe were like for your team.

Dave: Absolutely. I'm Dave Mattatal, Director of Support here at RapidScale. I started here eight years ago as an engineer and had the adventure and fun of working my way into leadership. It's been a blast. And yeah, the 19th was an adventure, to say the least. We started getting alarms around midnight on the 19th when our systems picked up the problem, and we immediately went into response mode. We raised a bridge and started bringing in external and on-call resources.

In addition, even folks who weren't on call jumped in and started helping us out, which was really cool to see. We also engaged our partners at AWS, Azure, and Google for additional technical guidance on what was going on. So it was nice to have not only internal resources but also external partners and vendors helping us find a fix and solve the problem.

Claudette: That's fantastic. I know that when I got in, our architecture team, which is responsible for the CrowdStrike instance we use, had already figured out how to boot into Linux and avoid the blue screen entirely. So by the time the East Coast came online, there was no impact whatsoever in any of our data centers, which was pretty cool. You had more work to do, though.

Dave: Yeah, our public cloud and hosted cloud teams really leaned in to fix these problems. And the best part was that we were able to collaborate in a way we'd never really collaborated before.

We've done a lot of tabletop exercises. We've put a lot of process in place to try to figure out what to do in an event that impacted both hosted and public cloud solutions, and we were able to put it into practice. And the fun part about this is how smoothly it went. We were able to restore 60 servers every five minutes, which was great.

So, the math behind this: once the fix was identified, it took about five minutes to apply it to a server, and we had about 60 resources working on the problem, so we could do 60 servers every five minutes. And the collaboration behind it was really fun, too. We saw two teams that typically work in different verticals (as we know, hosted and public cloud are very different) come together to solve a problem and create a unified solution.

Claudette: Yeah, that is fantastic. My understanding is that on the public cloud side, we didn't have the ability to get in front of it the same way, so we went directly to all of our CrowdStrike customers who could have been impacted, right? We were doing really white-glove outreach there. How did that go?

Dave: It went really well. With the monitoring system we have in place, we were able to query our customers' environments to see who had CrowdStrike installed and who didn't, so we could proactively start solving the problem while also contacting people to let them know there was an issue. The good news is that some of our West Coast customers were back online before they even woke up, and every customer's issue was resolved by Friday afternoon. So we had very happy customers.

Claudette: Yeah, that was my favorite part from where I sit in the business, because our CEO was stranded on the West Coast when all this was going down. By the time he got online to ask what was happening, I was able to give him a great report on your team's progress. So that was a huge win. Our all-clear went out while we were still hearing about flight cancellations and watching the market. So how did the team feel at the end of something like this, where it was truly all hands on deck, people who weren't working that day joined in, and people were coming off their vacations to help? What was the mood on your team by the time the resolution was complete?

Dave: Happy and tired. I think that's probably the best way to put it. Waking up at midnight isn't always easy, even though we're staffed for it and brought even more people in. But everybody was super excited to be part of that collaborative effort and see everything work really well. And our customers were happy too, so it wasn't just us. A lot of happy people walked out of a very stressful event knowing that everything we had practiced played out extremely well.

Claudette: For sure. We're certainly very proud of them, and I love seeing all those high CSAT scores coming in, showing that our CrowdStrike customers really did appreciate the work folks were doing in the moment. One thing I found really interesting in the aftermath of all this: I was reading CrowdStrike's releases, and their CEO noted that we should be especially vigilant after events like this, because we can usually expect (not just might see, but expect) an uptick in bad actors and criminal activity.

So what's your advice for customers, even those of us who may not be as technical as our engineers? How do we guard against things like this?

Dave: Absolutely, and I think that was a really important point to call out. We have to remain diligent about our security practices, because bad actors are going to capitalize on this, and some people may now be wary of their current security stance because it took down their business.

But the reality is that what they experienced here would be significantly worse if a bad actor got in and impacted their business, whether through a ransomware event or even a potential data compromise. So we always encourage you to stay strong in your security stance. And if CrowdStrike isn't the right answer for you anymore, that's okay.

We offer our own security services too, through Proficio, and you're always welcome to talk to us about them. Proficio ties into a lot of other EDRs. But yeah, always stay vigilant about your security. Never step away from it, because that posture will keep your business safe, successful, and online, most of the time.

Claudette: Most of the time. There we go. All right, y'all, we've spent maybe five minutes or so on how RapidScale responded, but just remember that in that time, we could have brought 60 servers back up. Thanks for joining us today to learn more about our response and the care we put into every customer incident, and know that you have a partner in us.

RapidScale’s security services through Proficio provide peace of mind in the midst of potential chaos. Ready to get moving in the right direction? Send our team a message today.
