Congratulations, you have recently migrated a large data center to the cloud and you can now take pride in calling yourself a ‘Digital Organization’. Well, not quite yet. Cloud is not the new data center; it’s a culture, and like every culture shift it comes with its own maturity cycle.
While moving from the data center to the cloud may have been your first step toward becoming a digital organization, it’s certainly not the last. The world of cloud is very different from the data center world. As I keep reiterating, ‘Cloud is not your data center run by someone else, it’s much more than that’. So, let’s take a moment to understand what the next steps could be for maturing in the cloud.
Step – 1 – Understanding Waste
Cloud works on a pay-as-you-go model, i.e. the more resources you have running, the more you pay. So, the first thing you need to do is work out what’s needed and what’s not. Here are a few examples of waste:
- All your virtual machines are running 24x7 (even in dev/test environments).
- All your storage is in the frequently accessed (hot) tier, including snapshots, back-ups and 10-year-old data.
- You have a policy that defines when to take a back-up, but not one that defines when to delete it.
- Everything is running on a virtual machine, literally everything. You are not using the power of containers and other cloud-native services.
- Customer-facing services each have 20 instances running all the time because you need capacity to cater to surges in traffic.
- No distinction is made between the configurations required to run production workloads and the ones required to run dev/test.
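To put a number on the first example, here is a minimal sketch of the pay-as-you-go arithmetic. The hourly rate and the 10-hours-a-day, 5-days-a-week usage pattern are hypothetical figures chosen for illustration, not real provider pricing.

```python
HOURS_PER_WEEK = 24 * 7  # 168 hours in an always-on week


def weekly_cost(hourly_rate: float, hours_running: float, instances: int = 1) -> float:
    """Pay-as-you-go cost: rate x hours running x number of instances."""
    return hourly_rate * hours_running * instances


def wasted_share(hours_needed: float, hours_running: float = HOURS_PER_WEEK) -> float:
    """Fraction of the bill spent on hours when nobody needed the resource."""
    return 1 - hours_needed / hours_running


# Hypothetical dev/test VM at $0.10/hour, actually needed 10 hours a day, 5 days a week.
always_on = weekly_cost(0.10, HOURS_PER_WEEK)
office_hours = weekly_cost(0.10, 10 * 5)

print(f"always on:    ${always_on:.2f} per week")
print(f"office hours: ${office_hours:.2f} per week")
print(f"wasted share: {wasted_share(10 * 5):.0%}")
```

Under these assumed figures, roughly 70% of what the always-on VM costs pays for hours in which it is idle.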
Step – 2 – Deciding Baselines and Governance Policy
Once you have understood where you are bleeding, it’s time to start creating a performance baseline and a governance policy. Some of the things that you can do:
- Create a downsizing policy for the VMs – for example, if a VM’s utilization stays below 45% for a continuous period of 20 days, scale it down.
- Create a lifecycle management policy for your data – for example, any file or folder not accessed by any user over the last 30 days should be moved to the infrequently accessed tier, and anything not accessed for over a year should be moved to the archive.
- Create a deletion policy for your snapshots and back-ups – for example, anything older than 7 days should be deleted. Back-ups are incremental, so it doesn’t matter if you don’t maintain the point-in-time snapshots for very long. If there is a compliance requirement, make sure you at least move older data to archival storage.
- Explore PaaS and auto-scaling – stop running 20 instances for everything (this isn’t a data center), start thinking of PaaS, and create an auto-scaling policy for the web servers that run customer-facing applications.
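The example policies above can be written down as plain rules before they are wired into any cloud tooling. The sketch below is one possible encoding in Python; the thresholds come from the examples in this step, while the function names, tier labels and the `compliance_hold` flag are assumptions made for illustration and do not correspond to any provider's API.

```python
from datetime import date, timedelta
from typing import List

CPU_THRESHOLD = 45.0        # percent utilization, from the downsizing example
LOW_USAGE_DAYS = 20         # consecutive days below the threshold
COOL_AFTER_DAYS = 30        # idle days before moving to infrequent access
ARCHIVE_AFTER_DAYS = 365    # idle days before moving to archive
BACKUP_RETENTION_DAYS = 7   # age after which a back-up is no longer kept


def should_downsize(daily_cpu_percent: List[float]) -> bool:
    """True if each of the last LOW_USAGE_DAYS daily readings is below the threshold."""
    recent = daily_cpu_percent[-LOW_USAGE_DAYS:]
    return len(recent) == LOW_USAGE_DAYS and all(p < CPU_THRESHOLD for p in recent)


def storage_tier(last_accessed: date, today: date) -> str:
    """Pick a storage tier from the number of days since the data was last accessed."""
    idle_days = (today - last_accessed).days
    if idle_days > ARCHIVE_AFTER_DAYS:
        return "archive"
    if idle_days > COOL_AFTER_DAYS:
        return "infrequent"
    return "hot"


def backup_action(created: date, today: date, compliance_hold: bool = False) -> str:
    """Keep recent back-ups; delete older ones, or archive them if compliance requires."""
    if (today - created).days <= BACKUP_RETENTION_DAYS:
        return "keep"
    return "archive" if compliance_hold else "delete"


today = date(2024, 1, 31)
print(should_downsize([30.0] * 20))                       # 20 quiet days in a row
print(should_downsize([30.0] * 19 + [80.0]))              # a busy day breaks the streak
print(storage_tier(today - timedelta(days=61), today))    # idle for two months
print(backup_action(today - timedelta(days=30), today))   # a month-old back-up
```

Keeping the rules as small pure functions like this makes it easy to review the thresholds with project leads before any automation acts on them.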
Step – 3 – Enable continuous monitoring and continuous optimization
Just defining policy is not enough; you also need to enable monitoring. Monitoring is fundamental to achieving operational excellence, and it goes hand in hand with automation.
Monitor and automate whatever you can. Some examples:
- Both AWS and Azure offer lifecycle management policies for data held in their storage services. Ensure you have the right lifecycle management policy in place for your data.
- Ensure you have alerts in place to tell you which VMs are not compliant with your baseline policy. Better still, write a runbook that gets triggered by the alert and downsizes the VM.
- Ensure you have enough monitoring enabled to track older back-ups and delete them.
- Write an automation runbook that keeps non-production instances on during office hours only and turns them off the rest of the time.
- Ensure all provisioning, especially dev/test, is logged and monitored. Create a function that sends the project manager an email with provisioning details and estimated cost.
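As a rough illustration of the last two items, the sketch below shows the decision logic an office-hours runbook might apply, plus a function that drafts the cost notice for a project manager. It only computes the desired state and the message text; actually starting or stopping instances and sending mail would go through your provider's SDK and mail service, which are left out here. The office hours, the 730-hours-per-month figure and all names are assumptions.

```python
from datetime import datetime

OFFICE_START_HOUR = 8    # assumed start of the working day (local time)
OFFICE_END_HOUR = 18     # assumed end of the working day (local time)
HOURS_PER_MONTH = 730    # common approximation: 365 * 24 / 12


def desired_state(environment: str, now: datetime) -> str:
    """Return 'running' or 'stopped' for an instance at the given local time."""
    if environment == "production":
        return "running"  # production is never switched off by this rule
    is_weekday = now.weekday() < 5  # Monday is 0, Saturday is 5
    in_office_hours = OFFICE_START_HOUR <= now.hour < OFFICE_END_HOUR
    return "running" if is_weekday and in_office_hours else "stopped"


def provisioning_notice(owner: str, resource: str, hourly_rate: float) -> str:
    """Draft the text of an email telling the owner what was provisioned and its cost."""
    estimate = hourly_rate * HOURS_PER_MONTH
    return (
        f"To: {owner}\n"
        f"Subject: New resource provisioned: {resource}\n\n"
        f"Estimated monthly cost if left running 24x7: ${estimate:.2f}"
    )


print(desired_state("dev", datetime(2024, 1, 3, 10, 0)))   # Wednesday morning
print(desired_state("dev", datetime(2024, 1, 6, 10, 0)))   # Saturday morning
print(provisioning_notice("pm@example.com", "dev-vm-01", 0.10))
```

A scheduler or alert can call `desired_state` periodically and act only when the actual state differs from the desired one.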
Last Step – Get expert advice
Adapting to the cloud can be both overwhelming and challenging. While most IT teams today have the basic skills necessary to handle cloud landscapes, I personally believe that having an expert from the ‘get-go’ certainly helps. Think of having a partner or a tool to help you enable operational excellence in the cloud, what we typically call an MSP (Managed Service Provider) or a CMP (Cloud Management Platform). Partners and platforms (outside cloud-native) are your trusted advisors in your cloud journey. Some of the key questions you want to answer:
- Which tools give me cost management, governance, security and automation under one umbrella?
- How do I enable my project leads to be more cost-aware and cost-conscious?
- Are we using the right sizes and specifications? Can we think of moving the production workloads to reserved instances?
- Are we in a steady state to move to reserved instances?
- Is there an obvious miss that we are not seeing?
- Do we have the right skills and processes in place to manage this transformation?
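The reserved-instance questions above come down to simple arithmetic: a reservation is paid for whether or not the VM runs, while on-demand is paid only for the hours used. The sketch below works out the break-even point under that assumption; the hourly rates are hypothetical and the function names are illustrative.

```python
def break_even_utilization(on_demand_hourly: float, reserved_effective_hourly: float) -> float:
    """Fraction of hours a VM must run before a reservation costs no more than on-demand."""
    return reserved_effective_hourly / on_demand_hourly


def cheaper_option(on_demand_hourly: float, reserved_effective_hourly: float,
                   expected_utilization: float) -> str:
    """Compare average cost per calendar hour for the expected share of hours running."""
    on_demand_cost = on_demand_hourly * expected_utilization  # pay only while running
    reserved_cost = reserved_effective_hourly                 # pay for every hour
    return "reserved" if reserved_cost < on_demand_cost else "on-demand"


# Hypothetical rates: $0.10/hour on demand, $0.06/hour effective when reserved.
print(f"break-even utilization: {break_even_utilization(0.10, 0.06):.0%}")
print(cheaper_option(0.10, 0.06, expected_utilization=0.90))  # steady production workload
print(cheaper_option(0.10, 0.06, expected_utilization=0.30))  # office-hours dev/test VM
```

With these assumed rates, a workload needs to run more than about 60% of the time before a reservation pays off, which is why steady state matters.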
This article was originally published at https://www.cloudmanagementinsider.com/how-to-achieve-operational-excellence-in-the-cloud/