9 APRIL 2024 | CIOReview

cost. It's our responsibility to define the best balance: the value of the technology needed to achieve that level of resilience versus the cost of the outages it would prevent.

Resilience Strategies and Architectures

Backups are the minimum strategy any technology service must consider; they are essential insurance against technical failures, data corruption, and even cyberattacks.

Next is high availability. There are various ways to configure it, but fundamentally it means having redundancy for critical technology components. Depending on the desired recovery level, this redundancy can be in the same data center, in the same cloud zone, in another geographically separate data center, in another cloud zone, in another cloud region, or in other clouds.

Depending on our architecture (cloud, on-premise, or hybrid), we may even need all of these at once. It is essential to stay open to every option, keeping in mind that an application does not have to be backed up on the same technology it runs on. The key is to distribute applications and services across multiple platforms, reducing dependence and "not putting all your eggs in one basket."

In cloud-native architectures (microservices and containers), we can take advantage of the fact that components run and scale independently. A failure therefore has a partial impact instead of causing a total service outage, as often happened with monoliths. This makes failures easier to identify and mitigate, and even allows automatic recovery.

Additionally, the resilience strategy should include a Disaster Recovery Plan (DRP) that specifies how to act in each failure or threat scenario: partial service disruptions (e.g., in a single component), complete service disruptions, or the loss of an entire location, such as a cloud or a data center.
The plan must be tested regularly to confirm that it works in each scenario, giving the organization the confidence to activate it when necessary. These tests also reveal process improvements through a careful lessons-learned exercise, the same exercise we apply in postmortems of critical incidents.

What Are We Doing at Davivienda, and What Comes Next?

At Davivienda, we apply all of these strategies. Because our technology is diverse in both platforms and architectures, we use different resilience models for on-premise, cloud, and hybrid environments. The result has been higher availability, keeping our critical systems operational 99.9% of the time. We focus our efforts on the systems that directly affect our customers: we continually evaluate results, identify the root causes of incidents, learn from them, and improve processes, people, and technology to avoid repeating similar events.

We also work to reduce the risk of the changes we introduce into our technology. We test them first, and before moving them to production we verify that they meet the conditions for a fully successful change or, failing that, a minimal impact, with immediate remediation mechanisms such as rollback ready.

This year, we also began chaos engineering tests, intentionally introducing controlled failures to simulate adverse situations.

Another practice we have been working on, and are now complementing with new approaches, is SRE (Site Reliability Engineering), which focuses on keeping systems online reliably and efficiently. One example of SRE in practice is rolling out changes to an application gradually while closely monitoring how they affect performance and availability.
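The gradual-rollout practice described above (expose a change to a small share of traffic, watch availability, roll back if it degrades) can be sketched in Python. This is a minimal illustration under stated assumptions, not Davivienda's tooling: the stage percentages, the `observe_error_rate` and `rollback` hooks, and the use of the 99.9% availability figure as the rollout gate are all hypothetical. It also shows the arithmetic behind that figure: a 99.9% target leaves an error budget of 0.1%, or about 8.76 hours of downtime over a 365-day year.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Assumption: the 99.9% availability figure is treated as the SLO for the rollout gate.
SLO_TARGET = 0.999


def error_budget(slo: float) -> float:
    """Fraction of requests (or time) the SLO allows to fail."""
    return 1.0 - slo


def allowed_downtime_hours(slo: float, period_hours: float = 365 * 24) -> float:
    """Downtime tolerated by the SLO over a period (default: a 365-day year)."""
    return error_budget(slo) * period_hours


@dataclass
class RolloutResult:
    completed: bool
    last_stage_percent: int
    rolled_back: bool


def staged_rollout(
    stages: Sequence[int],
    observe_error_rate: Callable[[int], float],
    rollback: Callable[[], None],
    slo: float = SLO_TARGET,
) -> RolloutResult:
    """Expose a change to a growing share of traffic, rolling back at the first bad stage.

    observe_error_rate(percent) and rollback() are hypothetical hooks standing in for
    the monitoring and deployment tooling of a real platform.
    """
    budget = error_budget(slo)
    for percent in stages:
        # Measure the error rate while `percent`% of traffic runs the new version.
        if observe_error_rate(percent) > budget:
            # The change burns more than the error budget: undo it immediately.
            rollback()
            return RolloutResult(completed=False, last_stage_percent=percent, rolled_back=True)
    last = stages[-1] if stages else 0
    return RolloutResult(completed=True, last_stage_percent=last, rolled_back=False)


if __name__ == "__main__":
    print(f"Allowed downtime at 99.9%: {allowed_downtime_hours(SLO_TARGET):.2f} h/year")

    stages = [1, 5, 25, 100]  # hypothetical traffic percentages per stage

    def undo() -> None:
        print("rolling back")

    # A healthy change: 0.02% errors at every stage stays within the 0.1% budget.
    print(staged_rollout(stages, lambda p: 0.0002, undo))

    # A faulty change that only shows up under load: 1% errors from 25% of traffic on.
    print(staged_rollout(stages, lambda p: 0.0002 if p < 25 else 0.01, undo))
```

With these inputs, the first rollout completes at 100% of traffic, while the second stops and rolls back at the 25% stage, before most users receive the faulty version. Gating on the error budget rather than on an arbitrary threshold is one common SRE choice; real stage sizes and observation windows would come from the organization's own monitoring.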
SRE also relies heavily on automation to manage operations and reduce the risk of human error.

These practices and architectures play a crucial role in building robust, reliable, and adaptive technology systems that meet the growing demands of the digital environment, which is exactly what we aim for at Davivienda. At the Casita Roja, we are confident that these resilience strategies will let us move forward knowing that our technology is prepared to face challenges and provide reliable services to our customers. In technology, everything is possible. We can achieve the highest levels of protection and resilience, but the chosen architecture will be decisive for the cost.