January 16, 2021

How to leverage cloud architecture for high availability

Cloud architectures are widely used today, and clouds offer a broad range of services and solutions. However, with great power comes great responsibility: like any other set of resources, the cloud is a place where failures can occur and unexpected events can happen. A failure can spread quickly across the entire architecture and cause large outages that can bring a business to its knees.

Okay, that’s not a very optimistic outlook, quite the opposite, but don’t be scared. This is the reality of almost every architecture, so why should the cloud be different?

Cloud architects face two different problems at any given time, and must be prepared for the worst possible outcome in order to remedy it. First, if something unexpected happens, how can I keep the business running as if nothing had happened? Second, if something unexpected and unwelcome happens and I cannot continue to work as usual, how do I troubleshoot, restore the original data and bring the architecture back within a reasonable time window, then continue to operate as usual without too big an impact on the overall work?

In these terms, we can frame the issues as follows:

– Continue to work normally in the face of unexpected incidents

– Troubleshoot and get back to work as usual in the shortest possible time in the face of unexpected problems.

The first is covered by high availability and resilience, and the second by constant attention to disaster recovery. Here, we will consider high availability.


Current alternatives on the market


Clouds offer more than expected to cope with the scenarios that might be encountered. Most clouds are distributed geographically and technically to avoid major outages. On a small scale, clouds have what are called Availability Zones (AZs) or Availability Domains (ADs). These are usually different buildings, or clusters of buildings, in the same location, forming a geographically bounded and interconnected area. This looks good and is highly redundant, especially in terms of electricity, cooling and storage.

On a large scale, clouds are divided into regions: large geographic areas around the globe, with a dozen or more regions in the case of giants like Google Cloud and Amazon Web Services. These regions serve different purposes, such as isolation in case of disaster, and performance: customers in different countries and continents are served from the nearest point, with the fastest response, without being routed to a single central location. That is what keeps latency low and services responsive.
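To make the "nearest point" idea concrete, here is a minimal Python sketch that picks the closest region from measured latencies. The region names, the latency figures and the `nearest_region` helper are hypothetical; in practice this decision is usually delegated to latency- or geo-based DNS routing, so treat the sketch as an illustration of the logic rather than of any provider's API.

```python
from typing import Mapping


def nearest_region(latencies_ms: Mapping[str, float]) -> str:
    """Return the region with the lowest measured round-trip latency."""
    if not latencies_ms:
        raise ValueError("no candidate regions")
    return min(latencies_ms, key=latencies_ms.__getitem__)


if __name__ == "__main__":
    # Hypothetical probe results (milliseconds) from one client location.
    probes = {"region-a": 142.0, "region-b": 18.5, "region-c": 87.3}
    print(nearest_region(probes))  # region-b
```

The same routing choice also doubles as a failover mechanism: if a region stops answering probes, it simply drops out of the candidate set.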

Taking all this into consideration, it is the architect’s task to design services with regions and availability zones in mind, always serving the customer properly while taking advantage of the technology and knowledge at hand. Cloud service providers do not replicate the architecture across regions; that is for architects and technical staff to review and resolve. The same goes for availability domains: unless stated otherwise, core services such as storage, compute instances and virtual networks are not replicated across ADs or AZs for the most common parts of the workflow.
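Because spreading copies across zones is up to the architect, a placement plan should never put every instance of a component in one AZ or AD. The following minimal Python sketch assigns instances to zones round-robin; the instance and zone names are hypothetical placeholders, and a real deployment would express the same intent through the provider's own placement settings or IaaC templates.

```python
from collections import Counter
from itertools import cycle
from typing import List, Tuple


def spread_across_zones(instances: List[str], zones: List[str]) -> List[Tuple[str, str]]:
    """Assign instances to zones round-robin so no single zone holds every copy."""
    if len(zones) < 2:
        # A single zone is itself a single point of failure.
        raise ValueError("need at least two zones")
    return list(zip(instances, cycle(zones)))


if __name__ == "__main__":
    # Hypothetical instance and zone names.
    plan = spread_across_zones(["web-1", "web-2", "web-3", "web-4"],
                               ["zone-1", "zone-2", "zone-3"])
    print(plan)
    # How many instances landed in each zone.
    print(Counter(zone for _, zone in plan))
```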

High availability alternatives include avoiding single points of failure, testing the resilience of the architecture prior to deployment, and building master/slave or active/passive solutions, with manual or automated failover, to reduce downtime to a minimum.
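An active/passive setup boils down to a simple rule: route to the active node while it is healthy, and promote a passive node when it is not. A minimal Python sketch of that decision follows; the `Endpoint` type, the endpoint names and the health-check callable are hypothetical, and a real system would also need to fence the failed node and keep data replicated to the standby.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Endpoint:
    name: str
    role: str  # "active" or "passive"


def choose_endpoint(endpoints: List[Endpoint],
                    is_healthy: Callable[[Endpoint], bool]) -> Endpoint:
    """Prefer a healthy active endpoint; otherwise fail over to a healthy passive one."""
    # sorted() is stable and False < True, so active endpoints come first
    # and passive ones keep their original priority order.
    ordered = sorted(endpoints, key=lambda e: e.role != "active")
    for endpoint in ordered:
        if is_healthy(endpoint):
            return endpoint
    raise RuntimeError("no healthy endpoint available")


if __name__ == "__main__":
    primary = Endpoint("primary-db", "active")
    standby = Endpoint("standby-db", "passive")
    down = {"primary-db"}  # pretend the health check failed for the primary
    print(choose_endpoint([primary, standby], lambda e: e.name not in down).name)  # standby-db
```

Automating this check, instead of waiting for someone to notice the outage, is what keeps downtime to a minimum.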


What are the best practices to consider?


The following is a list of best practices believed to be the most effective in delivering HA in the cloud. It is not entirely comprehensive, but it can be adapted to specific needs and, to a large extent, also applied to data center architectures.

• Load balance across ADs and watch out for single points of failure (SPOFs) in the architecture: two is one, and one is none.

• If the cloud provider does not provide redundancy across ADs and at least three copies of the same data, it may be a good idea to reevaluate the choice of supplier, or to consider a service that does so.

• Easy in, easy out: make sure that, if it ever becomes necessary to move or redirect the service, it can be done with minimal effort.

• Implement additional monitoring and data systems where possible, with good integration: off-the-shelf third-party tools can provide rich diagnostic information in a timely manner. Platforms like New Relic, or incident response tools like PagerDuty, can be extremely valuable.

• Keep the architecture versioned, in IaaC (infrastructure as code) form: if an entire region goes away, it will be possible to spawn the entire service in a different region, or even a different cloud, provided data has been replicated and DNS services are elastic.

• Keep DNS services elastic: this goes without saying, especially after the previous point; flexibility is key to pointing records in one direction or another.

• Some clouds do not charge for stopped instances, especially virtual machines; for example, Oracle only charges for stopped instances if they use Dense I/O or High I/O shapes, and not otherwise. It is easy to take advantage of this and keep a duplicate architecture in two regions; with IaaC, this is not unrealistic, and it is also easy to maintain.

• Synchronize essential and critical data onto persistent volumes across ADs, in a way that keeps it readily available and uninterrupted, avoiding local NVMe drives if using them means being charged for the compute resources of the otherwise unused nodes they are attached to.

• Use object storage to replicate data to two or more regions.

• Use cold storage (archive storage such as Glacier) to retain important data in a few separate regions; sometimes paying the price of breaking the minimum retention period, and meeting the retrieval requirements, is worth it to bring the production environment back.

• Use APIs and SDKs to automate: by building HA and failover tools, automation can turn the system into a self-healing one, and combined with playbooks it can be a real game changer. Do not rely too much on the dashboard; almost everything can, and some things must, be done behind the curtain.

• No one needs to stick with a single cloud: with the power of today’s cloud providers, simply having the infrastructure on multiple clouds at once makes it possible to compare them and switch to the suppliers that best fit the needs of the business.

• Use tools to test the resilience of the infrastructure and cluster availability: simulating important failures in the architecture can bring great learning.
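The "two is one and one is none" rule, and the three-copies recommendation, can be quantified. Assuming replica failures are independent (which is exactly what spreading them across ADs or regions tries to achieve), a component with N replicas, each up a fraction a of the time, is available with probability A = 1 - (1 - a)^N. A minimal Python sketch follows; the 99% per-replica figure is an arbitrary example, not a provider SLA.

```python
def combined_availability(per_replica: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent copies is up."""
    if not 0.0 <= per_replica <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    # The component is down only if every replica is down at the same time.
    return 1.0 - (1.0 - per_replica) ** replicas


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, f"{combined_availability(0.99, n):.6f}")
```

With a = 0.99, two replicas give about 99.99% and three about 99.9999%. Correlated failures, such as a whole AZ losing power, break the independence assumption, which is why the copies should not share a zone.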
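Resilience testing tools inject real failures, but the core question of a zone-loss drill can be checked on paper first. The Python sketch below removes each zone in turn from a placement map and checks whether enough instances remain; the placement data, the `survives_zone_loss` helper and the `min_instances` threshold are hypothetical, and this static check complements, rather than replaces, fault injection against the live system.

```python
from typing import Dict, List


def survives_zone_loss(placement: Dict[str, List[str]], min_instances: int) -> bool:
    """True if losing any single zone still leaves at least min_instances running."""
    if not placement:
        raise ValueError("placement is empty")
    total = sum(len(instances) for instances in placement.values())
    # Simulate each zone going down in turn and count what is left.
    return all(total - len(instances) >= min_instances
               for instances in placement.values())


if __name__ == "__main__":
    # Hypothetical placement: four web servers across three zones.
    placement = {"zone-1": ["web-1", "web-4"], "zone-2": ["web-2"], "zone-3": ["web-3"]}
    print(survives_zone_loss(placement, min_instances=2))  # True
    print(survives_zone_loss(placement, min_instances=3))  # False
```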


Although these best practices prove themselves in the field when applied, not all of them can be applied to the same architecture or at the same time. Hence, an architectural assessment and an experienced engineering team are always needed if a business wants to grow sustainably.

That said, most points can be applied carefully without significant effort. It only takes some hard work and initial planning, but the result will be worth its weight in gold in the future.

Congratulations, architect, and keep up the good work.