You’re planning on going virtual with your servers. Everything is going well. You have your hypervisor deployed and you’ve converted your servers from physical to virtual. You’ve “lived the dream” and put your dev and test environment on there as soon as you could and allowed other people to create the virtualised servers they needed. Life is good.
But wait (enter the sound of screeching tyres). Is this really so good, and what can it mean for the business? After all, this is why you virtualised, right? To be able to save costs and deploy servers quickly. Only, without the financial constraints that stopped additional servers being provisioned in the physical world, there was nothing to hold anyone back in the virtual world. If things have got out of hand then, let’s face it, you trusted these people to retire their servers and they’ve let you down. If this sounds familiar then you are no doubt a victim of “Virtual Server Sprawl”.
One of the benefits of virtualisation was that, rather than loading multiple services onto one instance of an operating system (one physical server), you could run multiple instances of the operating system and dedicate each of those instances to specific services. Server virtualisation also promised to make those services more highly available with vMotion, Live Migration, XenMotion, etc. Virtual Server Sprawl is not the creation of additional servers; that was always expected and planned for in any virtual server migration. Virtual Server Sprawl results from “the uncontrolled creation, administration and lifecycle management of virtual servers”. The important word here is uncontrolled. Out-of-hand virtual server sprawl can become a nightmare for the server team, with issues arising around licensing, maintenance, backups and security as well as environment stability. All of this can translate into cost that erodes the very savings server virtualisation was meant to realise. As Thomas Bittman of Gartner put it, “Virtualisation without good management is more dangerous than not using virtualisation in the first place”.
I use the three-stage definition because it highlights the areas that need to be controlled to prevent Virtual Server Sprawl and, as always, these issues reflect those in the physical server world. It’s just that with physical servers, IT departments have had constraints placed on them that prevent physical server sprawl (finance, the physical number of servers, limited power and cooling in the data centre, separation of the dev / test LAN from the main network), and frequently these limitations have not been re-imposed in the virtual world.
Let’s look at each of the three areas in turn:
Creation
Uncontrolled creation of servers arises when servers can be deployed at will with no consequences. Generally this can be handled by process and, if an automated provisioning process is used, by assigning users “credits” that reflect the number of machines they can create. This can also help with lifecycle management, in that users will be more willing to retire servers that are no longer needed in order to reclaim credits. It’s not enough just to assign credits to users, though. Internal IT will not be constrained by credits and, as above, will create additional servers to provision additional services. As there is no capital expenditure (CapEx) procedure to follow, they are more likely to add servers, and if the virtualised environment supports memory overcommit then the reasoning becomes “That’s OK, we have almost unlimited memory”. The truth is more likely to be that the virtualised environment will page out RAM more often, shared disk sub-systems may hit performance issues and shared network connections may be overwhelmed. Having some sort of process that requests, authorises, provisions and reports on virtual server provisioning at least allows these issues to be minimised.
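As a rough illustration of the credit idea, here is a minimal sketch in Python. The class name `CreditLedger`, its methods and the user and VM names are all hypothetical; a real provisioning tool would have its own way of expressing quotas. The point it shows is simply that a request is refused once a user’s credits are used up, and that retiring a server hands a credit back.

```python
from dataclasses import dataclass, field


@dataclass
class CreditLedger:
    """Hypothetical ledger of how many virtual servers each user may run."""
    allowance: dict[str, int]  # user -> total credits granted
    active: dict[str, set[str]] = field(default_factory=dict)  # user -> live VM names

    def remaining(self, user: str) -> int:
        # Credits left = credits granted minus VMs currently in use.
        return self.allowance.get(user, 0) - len(self.active.get(user, set()))

    def request(self, user: str, vm_name: str) -> bool:
        """Authorise a new VM only if the user still has a credit."""
        if self.remaining(user) <= 0:
            return False
        self.active.setdefault(user, set()).add(vm_name)
        return True

    def retire(self, user: str, vm_name: str) -> None:
        """Retiring a VM returns its credit to the user."""
        self.active.get(user, set()).discard(vm_name)


ledger = CreditLedger(allowance={"alice": 2})
ledger.request("alice", "test-01")   # authorised
ledger.request("alice", "test-02")   # authorised
ledger.request("alice", "test-03")   # refused until something is retired
```

A ledger like this only constrains the people who are given credits, which is why the request and authorisation process around it still matters.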
Administration
Once a server is provisioned, who maintains that server? As the server provisioning process is accelerated, is the server deployed in a secure way, and what effect will it have in a production environment? If server deployment is delegated outside of the core server team, will firewalls be turned off “because it’s easier”? Will the servers be placed in the correct OU in Active Directory to have the right policies applied? If not, will the anti-virus server deploy the engine and patterns, and will appropriate exceptions be applied for the role of the server? Indeed, will anyone have made arrangements to back up the server, or will all VMs be backed up by default even if they have no data on them (web servers) or are only used for testing? Who will patch the servers? These issues can be mitigated by the use of virtual machine templates that include all the latest kernel updates, service packs and patches for the operating system; that automatically add the provisioned machine to an OU appropriate to its role (messaging, database, file, web etc.) so that the correct GPOs can be applied; that assign an appropriate amount of hard disk space, CPU, RAM etc.; and that have the basics of the anti-virus solution correctly pre-installed so that engines and patterns can be downloaded and updated on first boot.
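To make the template idea concrete, here is a small Python sketch of role-based templates. Everything in it is an assumption for illustration: the `VmTemplate` fields, the OU paths under `example.com` and the sizing figures are invented, and a real hypervisor or provisioning tool would hold this information in its own template format. What it shows is that the role alone decides the OU, the resources and the security baseline, so nobody has to remember them at deployment time.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VmTemplate:
    role: str
    ou: str        # hypothetical OU path; substitute your own directory layout
    cpus: int
    ram_gb: int
    disk_gb: int
    antivirus_preinstalled: bool = True
    firewall_enabled: bool = True


# Illustrative sizes only; real values depend on the workload.
TEMPLATES = {
    "web": VmTemplate("web", "OU=Web,OU=Servers,DC=example,DC=com", 2, 4, 40),
    "database": VmTemplate("database", "OU=Database,OU=Servers,DC=example,DC=com", 4, 16, 200),
}


def build_spec(name: str, role: str) -> dict:
    """Return a provisioning spec, refusing roles with no approved template."""
    if role not in TEMPLATES:
        raise ValueError(f"no approved template for role {role!r}")
    return {"name": name, **asdict(TEMPLATES[role])}


spec = build_spec("web-07", "web")
print(spec["ou"], spec["ram_gb"], spec["firewall_enabled"])
```

Refusing unknown roles is a deliberate choice in this sketch: a server that does not fit an approved template goes back through the authorisation process rather than being built by hand.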
Lifecycle Management
Let’s assume that the server was authorised for creation and it was provisioned fine. However, it was only meant to be used for 30 days to test out a scenario. If the server is never retired, it continues to consume resources on our physical host: not only memory but possibly also expensive shared storage space. If the virtual server has software installed on it (extremely likely, even if it is just an operating system) then it may be running unlicensed if its license has been transferred or re-assigned to another machine (either its replacement in production or another test machine). In the physical world, servers would be repurposed or retired totally to reclaim space in the server rack, regain network ports or return scarce power sockets to use. These pressures, which force IT teams to self-correct their use of physical servers, don’t exist in our virtualised environment. Basic change control and reporting processes can limit the effect of virtual machines remaining provisioned beyond their useful lifetime.
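The reporting side of this can be very simple. The sketch below, in Python, assumes that each server’s agreed lifetime was recorded when it was authorised; the `VmRecord` structure, the server names and the dates are all made up for illustration. It lists the machines that have outlived the lifetime they were approved for.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class VmRecord:
    name: str
    owner: str
    created: date
    lifetime_days: Optional[int]  # None means a permanent (production) server

    @property
    def expires(self) -> Optional[date]:
        if self.lifetime_days is None:
            return None
        return self.created + timedelta(days=self.lifetime_days)


def overdue(inventory: list[VmRecord], today: date) -> list[VmRecord]:
    """Return the VMs whose agreed lifetime ended before `today`."""
    return [vm for vm in inventory if vm.expires is not None and vm.expires < today]


inventory = [
    VmRecord("test-scenario-01", "alice", date(2024, 1, 1), 30),
    VmRecord("test-scenario-02", "bob", date(2024, 2, 20), 30),
    VmRecord("mail-01", "server-team", date(2023, 6, 1), None),
]

for vm in overdue(inventory, today=date(2024, 3, 1)):
    print(f"{vm.name} (owner {vm.owner}) expired on {vm.expires}")
```

A report like this does not retire anything by itself; it gives the change control process a list of owners to chase.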
The issues around Virtual Server Sprawl are readily identifiable and easy to anticipate:
- Increased paging of RAM due to overcommitment / over usage of physical RAM in host
- Reduction in available storage space which may be expensive in shared storage environments
- Additional network traffic and possibly incorrect assignment of servers to appropriate VLANs
- Security vulnerabilities with machines incorrectly configured or patched
- Incorrect policies assigned due to mis-placement in AD
- Lack of backup, DR or business continuity
- Possibility of licensing issues with applications on redundant servers
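The first item in the list above comes down to simple arithmetic. As a hedged sketch in Python, with a hypothetical function name and invented figures, the ratio of RAM allocated to virtual servers against the physical RAM in the host can be computed like this. A ratio above 1.0 means more memory has been promised than physically exists; whether that actually causes paging depends on how much of the allocation the servers really use and on the hypervisor’s memory management.

```python
def memory_overcommit_ratio(host_ram_gb: float, vm_ram_gb: list[float]) -> float:
    """Ratio of RAM allocated to VMs versus physical RAM in the host."""
    if host_ram_gb <= 0:
        raise ValueError("host RAM must be positive")
    return sum(vm_ram_gb) / host_ram_gb


# Invented example: ten 16 GB virtual servers on a host with 128 GB of RAM.
ratio = memory_overcommit_ratio(128, [16] * 10)
print(f"overcommit ratio: {ratio:.2f}")
```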
There are many software packages you can buy to help, but they are only as good as their configuration and only help if they are used rigorously. For those on a budget, Virtual Server Sprawl is reasonably easy to control with forward planning in three areas:
- Process & Authorisation
- Configuration & Templates
- Monitoring & Reporting
Should Virtual Server Sprawl stop you from virtualising your environment? In my opinion absolutely not, but you definitely need to be aware of its existence and plan accordingly.