Recently, many companies experienced a significant IT disruption caused by a CrowdStrike update. At KalioTek, we’ve been on the front lines, helping our clients rapidly recover and restore normal operations. This event served as a stark reminder of the importance of having a robust IT infrastructure and recovery plan in place. Here are some key takeaways and best practices we’ve gathered from this experience.
The Challenge of Rapid Recovery
During the CrowdStrike outage, businesses faced numerous challenges, from disrupted operations to concerns from customers, partners, and suppliers. Key business leaders found themselves stretched thin, fielding these concerns while also trying to manage the IT response. This scenario highlighted the critical need for a dedicated IT partner who can quickly mobilize a response team and handle the technical aspects of recovery.
The Value of a Dedicated IT Partner
Having a dedicated IT partner proved invaluable for our customers. We were able to deploy a team swiftly and manage the IT response effectively, which allowed business leaders to focus on core operations and stakeholder communication. For emerging companies, which are often understaffed in IT, this kind of support can make a significant difference.
Stress Testing IT Teams
The outage served as a real-world stress test for IT teams. Many emerging companies struggle with limited IT resources, often relying on the knowledge of one or two key individuals. This becomes a problem when a widespread issue occurs, because those individuals cannot respond to every affected user and system across the organization at once.
Importance of Burstable Capacity
To mitigate this risk, it’s crucial to have a team with burstable capacity – one that can scale up quickly in response to emergencies. This team should be highly organized, with consistent documentation and clear procedures. Such preparedness ensures that the organization can respond swiftly and efficiently to any IT crisis.
Certification and Best Practices
For companies using Managed Service Providers (MSPs), it’s essential to look for a current SOC 2 report. Although often called a certification, SOC 2 is an attestation issued by an independent auditor. It serves as validation that the MSP has implemented critical controls and best practices and can be relied upon during times of crisis.
Essential Components of a Recovery Plan
Having a well-defined recovery plan is non-negotiable. Here are some key components that should be included:
Secure Backup Procedures
Ensure that secure backup procedures are in place. Any data can be corrupted, so reliable backups that you can actually restore from are essential.
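One way to gain confidence that a backup is reliable is to compare checksums of the original files against their backup copies. The following is a minimal Python sketch of that idea, not a description of any particular backup product. The function names (`sha256_of`, `find_backup_mismatches`) and the assumption that the backup mirrors the source directory layout are illustrative only.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_backup_mismatches(source_dir: Path, backup_dir: Path) -> list[str]:
    """List source files that are missing from the backup or differ from it.

    Assumption: the backup is a plain directory mirroring the source layout.
    """
    problems = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        backup_file = backup_dir / relative
        if not backup_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source_file) != sha256_of(backup_file):
            problems.append(f"mismatch: {relative.as_posix()}")
    return problems
```

An empty result means every source file has a byte-identical copy in the backup. A check like this only confirms that copies match. It does not replace a periodic full restore test.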
Automation and Documentation
Automate and document the process for rapidly rebuilding computers. This includes documenting all systems, admin passwords, encryption keys, and IT inventory. The team must have ready access to this information in an organized and secure manner.
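Documentation gaps tend to surface only during an emergency, so it can help to audit the inventory automatically. The sketch below, in Python, flags devices whose records lack an owner, a rebuild runbook, or a recorded location for the encryption recovery key. The `DeviceRecord` fields and the `find_documentation_gaps` function are hypothetical names chosen for illustration. A real inventory system will have its own schema, and the record should point to where a secret is stored rather than contain the secret itself.

```python
from dataclasses import dataclass


@dataclass
class DeviceRecord:
    """Hypothetical inventory entry; field names are illustrative."""
    hostname: str
    owner: str
    rebuild_runbook: str        # link or path to the documented rebuild procedure
    recovery_key_location: str  # where the key is stored, never the key itself


# Fields that must be filled in for a device to count as fully documented.
REQUIRED_FIELDS = ("owner", "rebuild_runbook", "recovery_key_location")


def find_documentation_gaps(records: list[DeviceRecord]) -> dict[str, list[str]]:
    """Map each hostname to the required fields that are empty or blank."""
    gaps = {}
    for record in records:
        missing = [name for name in REQUIRED_FIELDS
                   if not getattr(record, name).strip()]
        if missing:
            gaps[record.hostname] = missing
    return gaps
```

Running a check like this on a schedule gives the team a short list of records to fix before a crisis rather than during one.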
Spare Inventory
Maintain an inventory of spare laptops. During recovery, some machines will inevitably encounter problems. Having spare computers ready to replace those with issues allows for quick recovery and minimal downtime. Additionally, ensure that you can rapidly recreate computers with the correct configuration and data, and have the capability to overnight them to remote users if necessary.
Designing a Supportable IT Infrastructure
In moments like this, it’s clear that having an IT infrastructure designed for supportability is crucial. The system should be well-documented, not overly complex, and easily recoverable. Many emerging companies operate with a patchwork of systems that have developed organically over time. This approach can lead to inefficiencies and increased recovery times during a crisis.
Given these challenges, a professional review of your IT architecture is warranted. Such a review can identify potential weaknesses and provide recommendations for creating a more resilient and supportable infrastructure.
The recent CrowdStrike outage was a wake-up call for many businesses. It underscored the need for a well-prepared IT response team, a robust recovery plan, and a supportable IT infrastructure. At KalioTek, we specialize in supporting venture-funded life sciences and technology companies through every stage of growth, from startup through IPO or acquisition. Our expertise and experience can help ensure that your company is prepared for any IT crisis, minimizing downtime and maintaining business continuity.
By implementing these best practices, you can safeguard your business against future disruptions and ensure a rapid recovery when the unexpected occurs.