In my previous post I explained how I got around issues with joining machines to an EC2 cloud network. This post is about what I was actually doing on that system, and how I used Amazon’s CloudWatch service to monitor the performance of my Application Under Test (AUT).
I work for Red Gate as a Test Engineer, and for some time now I’ve been involved in the creation of the recently released SQL Monitor 2. I used Amazon’s EC2 service to create a network of 50–60 machines that I had full control over. This virtual network had dedicated machines for the domain controller, the base monitor (the machine that runs our data collection service), the web server (which runs our web UI), the data repository (a standard Amazon SQL Server AMI) and a machine running an internal application for generating activity on SQL Server instances. All of the remaining machines were pre-configured as SQL Servers, each containing a hefty database which our SQL activity application would hammer on many threads (each thread = one user connection). The idea was that this would go some way towards simulating the real-world environments in which our monitoring solution would be used.
The 50 remaining virtual machines (all running SQL Server) were added to SQL Monitor, and the software was left to do its work for several hours (long enough to measure performance impact, though this wasn’t soak testing). This is where Amazon’s CloudWatch service comes in handy: we were able to simply toggle CloudWatch on or off for any of the running instances. To avoid information overload, we chose to monitor the base monitor, the SQL Server data repository, the web server and several of the servers being monitored. If you’re wondering why we would monitor the servers that SQL Monitor is itself monitoring (it can get confusing), it’s so that we could see our own impact on those machines. Although we have an idealised zero-impact goal for our customers’ SQL Servers (unlike some of our competitors, we don’t install agents), the perfmon counters, WMI calls, Windows file sharing and SQL queries do add a slight overhead, so this had to be monitored very carefully.
As I stated, CloudWatch can be toggled on or off for any instance. This can be done in start-up code (if you want to automate it) or through the Amazon Management Console. In a previous post I criticised this console for various reasons, but its support for CloudWatch is very nice: you can view graphs for CPU utilization, network in and network out, and get an early indication of issues on any of the machines. This brings me nicely to what I consider to be a huge omission in CloudWatch: for some inexplicable reason it doesn’t monitor memory usage. I can only assume there are technical reasons for this, because it’s a huge oversight.
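If you want to automate the toggling rather than click through the console, it can be scripted against the EC2 API. Here’s a minimal sketch assuming the modern boto3 client (the `toggle_monitoring` helper and the instance IDs are my own illustrative names, not anything from SQL Monitor):

```python
# Hedged sketch: turning CloudWatch (detailed monitoring) on or off for a set
# of EC2 instances. The EC2 client is passed in, so this can be scripted from
# start-up code; `toggle_monitoring` is a hypothetical helper name.

def toggle_monitoring(ec2, instance_ids, enable=True):
    """Enable or disable detailed CloudWatch monitoring for the given
    instances and return each instance's reported monitoring state."""
    call = ec2.monitor_instances if enable else ec2.unmonitor_instances
    resp = call(InstanceIds=instance_ids)
    return {m["InstanceId"]: m["Monitoring"]["State"]
            for m in resp["InstanceMonitorings"]}

# Live usage (assumes boto3 is installed and AWS credentials are configured):
# import boto3
# ec2 = boto3.client("ec2", region_name="us-east-1")
# toggle_monitoring(ec2, ["i-0123456789abcdef0"], enable=True)
```

Passing the client in also makes the helper easy to exercise against a stub before pointing it at a real account.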
Once the test period was finished and the monitoring data was safe on my hard drive (it can be collected using the CloudWatch API), I would analyse it in Excel and, if necessary, create graphs to highlight any performance issues found.
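The collection step can be sketched in code too. This is a rough equivalent of pulling CPU figures for one instance via the CloudWatch API and summarising them, assuming the boto3 `get_metric_statistics` call; `fetch_cpu` and `summarise` are illustrative names of mine, not part of any Red Gate tooling:

```python
# Hedged sketch: fetch CPUUtilization datapoints for an EC2 instance from the
# CloudWatch API, then compute the kind of average/peak summary you might
# otherwise work out in Excel. The CloudWatch client is passed in.
from datetime import datetime, timedelta

def fetch_cpu(cloudwatch, instance_id, hours=4):
    """Return CPUUtilization datapoints for the last `hours` hours,
    sorted by timestamp."""
    end = datetime.utcnow()
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=300,                 # one datapoint per five minutes
        Statistics=["Average"],
    )
    return sorted(resp["Datapoints"], key=lambda d: d["Timestamp"])

def summarise(datapoints):
    """Average and peak CPU across the collected datapoints."""
    values = [d["Average"] for d in datapoints]
    return {"avg": sum(values) / len(values), "peak": max(values)}

# Live usage (assumes boto3 is installed and AWS credentials are configured):
# import boto3
# cw = boto3.client("cloudwatch", region_name="us-east-1")
# print(summarise(fetch_cpu(cw, "i-0123456789abcdef0")))
```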
Base Monitor machine monitoring fifty servers:
This testing ran in parallel with more traditional in-house testing, and went a long way towards letting us say with confidence that the product performs acceptably under stress.