Chandler Santos

March 4, 2021

What the Texas power grid failure can teach us about incident response

Incidents happen. Some are small, some are major, and others knock out nearly all of Texas’s power, water, and internet. As I sat in my office marveling at the snowfall outside, the breaking news in the background took great pleasure in interrupting my peace. Over the last few days, I’ve listened to many different news channels add their unique personality to the elephant in the room: how poorly the winter weather was handled. This made me reflect on the NIST Incident Response Process. With my internet speed reduced to 0.25 Mbps, Microsoft Word is the only functional application for me at the moment, so I thought I’d use the opportunity to extend my reflection.

1. Preparation

Incidents, like rainy days, will happen whether you are prepared for them or not, so make sure you are prepared before they do. Every time I get a check, and after I’ve given to my church, I slide 10% into my savings account. “Save money every time” is a lesson my father taught me, because “you’ll need that money when a rainy day comes.” My father is not a philosopher, but I am confident that his advice fits here.

“A national electric-industry group developed winterization guidelines for operators to follow, but they are strictly voluntary and also require expensive investments in equipment and other necessary measures,” says an article from the Daily Journal. Clearly, a preparation plan was available, so a few explanations are possible: the plan was left on the shelf collecting dust; someone did not escalate a potential incident, thinking it might be a false positive; or there simply wasn’t enough training for anyone to understand the risk.

2. Detection and Analysis

To prevent an incident from causing damage, it is important to detect unusual activity and investigate it further. Energy companies very likely detected a significant surge in energy consumption at this point, because people were trying to stay warm. During this phase, data from different sources, such as SIEM, IDPS, network device logs, and people in the organization, is brought together to identify incidents based on indicators. After an incident is detected, you need to assess its severity, identify the attack vector, understand the impact of the attack, and document the findings.
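To make the detection step a little more concrete, here is a minimal Python sketch of flagging a surge against a baseline and attaching a rough severity to it. Everything in it is an assumption for illustration: the `Finding` record, the `detect_surge` and `classify_severity` helpers, the z-score thresholds, and the sample readings are all hypothetical, and this is not how any utility or SIEM product actually does it.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Optional, Sequence


@dataclass
class Finding:
    """A documented detection result (hypothetical structure)."""
    source: str      # where the reading came from, e.g. a log or sensor feed
    indicator: str   # human-readable description of what looked unusual
    severity: str    # "medium" or "high" in this sketch


def classify_severity(z_score: float) -> str:
    """Map a z-score to a rough severity label (thresholds are assumptions)."""
    if z_score >= 4.0:
        return "high"
    if z_score >= 2.5:
        return "medium"
    return "low"


def detect_surge(source: str, baseline: Sequence[float], current: float,
                 threshold: float = 2.5) -> Optional[Finding]:
    """Return a Finding if `current` sits far above the baseline, else None."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs 2+ readings
    if sigma == 0:
        return None  # a flat baseline gives no scale to measure a surge against
    z = (current - mu) / sigma
    if z < threshold:
        return None
    indicator = f"reading {current} is {z:.1f} standard deviations above baseline"
    return Finding(source=source, indicator=indicator,
                   severity=classify_severity(z))


if __name__ == "__main__":
    # Made-up readings purely for illustration.
    normal_load = [100, 102, 98, 101, 99]
    print(detect_surge("grid_load", normal_load, 110))
```

In practice you would run a check like this across several sources and investigate further only when more than one of them agrees, which is the "taking different sources into account" part of this phase.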

3. Containment, Eradication, & Recovery

Essentially, containment aims to stop the bleeding. We saw great containment of the situation from energy and water system companies alike. As the power grid was failing, many had to begin digging in their cabinets for lighters and candles. The rolling blackouts were a major inconvenience to many, but they were necessary. “When a power supply shortage is detected in a market [t]hey help to prevent widespread blackouts across a region,” says an article from Oklahoma News 4. In incident response, containment is where you isolate the affected systems and close off the threat’s entry point so the damage cannot spread.

Eradication is about eliminating the threat. In our case, the threat is the winter weather. It is almost impossible to stop the weather (maybe Elon Musk will say otherwise in a few years), so I cannot explain much more. In incident response, if the threat infected one system, it can spread to other systems. Eradication is more than just closing the door the threat came in through; it is completely removing the threat. Just ask anyone who has had to deal with polymorphic malware, which keeps changing its signature and spreading. Almost anyone who has ever faced Emotet or TrickBot can attest to this.

I think this is a great point to mention communication. In my local area, CPS Energy and San Antonio Water System (SAWS) did an excellent job keeping the media and public informed. With the help of media outlets, they shared important announcements such as: avoid using large electric appliances; keep the doors of restrooms and closets open; the start and end times of the ‘boil water’ notice; and which zip codes were going to be affected by blackouts. In incident response, we should follow in these footsteps and be transparent and considerate of our customers. It is never okay to lie or cover up an incident.

A good example of bad communication is the StockX data breach. I am a big sneakerhead. During my college days, I remember dragging a friend out of his dorm and sleepwalking together to the library at 6 AM to open the Yeezy release page on more than 30 computers in hopes of snagging a pair of the limited-release sneakers. I was unsuccessful that day, but on the days I was successful, StockX was my go-to marketplace to sell them. I experienced the data breach firsthand: I found my credentials online and had multiple accounts brute-forced for weeks, so it is my go-to example when explaining how not to communicate. To sum up StockX’s communication, they emailed customers to reset their passwords because of a “system update.” Several days later, a forum began offering their full database for sale. TechCrunch wrote about the event, and StockX eventually responded to the media with a statement. Below is a snippet from TechCrunch’s original article.

[Image: snippet from TechCrunch’s original article]

If you are looking for a great example, I would suggest FireEye. They were responsive and explained why the attack took place, who they believe the attackers were, and what they believe the attackers stole. https://www.fireeye.com/blog/threat-research/2020/12/unauthorized-access-of-fireeye-red-team-tools.html

The recovery phase aims to return the system to full functionality and resume normal operations. A few scrolls through Twitter and you will notice that more and more people are getting their power back.

4. Post-Incident Activity (Conclusion)

I once knew a guy who frequently repeated to himself, “Experience is simply the name we give our mistakes.” That was about five years ago. I don’t believe he originated the quote, but he is who I remember it from. It’s kind of humorous when you think about it, but nevertheless very true. Ironically, I knew the man who said it as Mr. Wright.

There should be a “lessons learned” meeting during this phase to discuss the full experience of the incident, including what went well, what needs to be done in future incidents, and what needs to be avoided. The energy companies are going to have to reevaluate their actions and definitely improve their plan. To conclude, NIST only provides the process, and it is up to security teams and organizations to use it appropriately.
