Using PowerShell to extract and report on AWS CloudWatch Metrics

Here at Planet Domain we face a lot of day-to-day challenges in performance and cost management.

We built a fantastic CD pipeline that allows our developers to rapidly spin up new infrastructure, enabling them to quickly build, test and deploy new features with minimal assistance from the Ops team.

Unfortunately, it also allows our developers to rapidly spin up new infrastructure that then lies idle for the greater part of the day, sucking money out of a budget that could otherwise be spent on beer, pizza and Xbox accessories.

Now, CloudWatch is great, but it does tend to involve a lot of clicky-clicky mouse-movey, clicky-clicky stuff in the AWS console. Once you’ve got a lot of points to monitor, the graphs become unreadable, the filtering options can be finicky, and it’s not easy to automate things like CloudWatch Dashboards using CloudFormation.

So, to get a report on average CPU utilization across our AWS Auto Scaling groups (ASGs), I turned to PowerShell.

I don’t know if you’ve ever used CloudWatch metrics from PowerShell before, but let me just say this:

The documentation is a little… sparse. This is what you get if you ask the built-in help for examples:

C:\> Get-Help Get-CWMetrics -examples

NAME
    Get-CWMetrics

SYNOPSIS
    Invokes the ListMetrics operation against Amazon CloudWatch.

Yep. Good examples, huh?

That’s your lot, buddy, and the rest of the CW cmdlets are no better, so at this point you’re on your own. Recently I reserved a little time to have an explore through CloudWatch, and here’s what I came up with.

1. Metrics

In CloudWatch, a metric is, essentially, a measure you can examine to find out what a particular resource is doing over time. Metrics have a Namespace, which is the service area they’re tied to, such as AWS/EC2 or AWS/RDS.

Each Metric has a MetricName and a set of Dimensions. Your MetricName is generally what you’re measuring, such as ‘CPUUtilization’ or ‘NetworkIn’. Dimensions vary from service to service, but may be an InstanceId, an AutoScalingGroupName or a similar service-specific identifier. Each Dimension is a Name/Value pair. The names are case-sensitive, so it’s AutoScalingGroupName with a capital S. For instance:

@{Name = "AutoScalingGroupName"; Value = "prod-main-websiteSOMEIDHERE"}

You grab these with Get-CWMetrics – filtering as required on the dimension, metricname or namespace.

SYNTAX
    Get-CWMetrics [[-Namespace] <System.String>] [[-MetricName] <System.String>] 
    [[-Dimension] <Amazon.CloudWatch.Model.DimensionFilter[]>] 
    [-NextToken <System.String>] [-NoAutoIteration
    <System.Management.Automation.SwitchParameter>] [<CommonParameters>]

So far so good.

As so often with AWS, the cmdlets break the PowerShell convention of using a singular noun, so the key cmdlet here is Get-CWMetrics. Note the ‘S’.

You can use this Cmdlet to discover what Metrics are available to you, but you won’t get much usable, interesting data out of it.
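One useful discovery job is finding out which ASGs actually report CPU. Get-CWMetrics returns one object per metric, and each object carries a Dimensions list of Name/Value pairs. The helper below is a sketch: Get-CWDimensionValue is a hypothetical name of my own, not an AWS cmdlet. It collects the distinct values of one dimension from whatever metric objects are piped into it. It compares names with -ceq because CloudWatch dimension names are case-sensitive.

```powershell
# Hypothetical helper (not part of the AWS tools): list the distinct values of one
# dimension across the metric objects that Get-CWMetrics emits
function Get-CWDimensionValue {
    param(
        [Parameter(Mandatory, ValueFromPipeline)] $Metric,
        [Parameter(Mandatory)] [string] $DimensionName
    )
    begin { $values = @() }
    process {
        # Each metric carries a list of Name/Value dimension pairs
        foreach ($dim in $Metric.Dimensions) {
            # Case-sensitive match, as CloudWatch dimension names are case-sensitive
            if ($dim.Name -ceq $DimensionName) { $values += $dim.Value }
        }
    }
    end {
        # One entry per distinct value, sorted
        $values | Sort-Object -Unique
    }
}
```

Usage would look like `Get-CWMetrics -Namespace "AWS/EC2" -MetricName CPUUtilization | Get-CWDimensionValue -DimensionName AutoScalingGroupName`. That returns the ASG names with CPU data. Get-CWMetrics should page through results on its own unless you pass -NoAutoIteration.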

2. Statistics

Statistics are where the meat lives, but you have to drill down to get to them. Get-CWMetricStatistics works on one metric at a time, so you can’t just demand all your statistics at once.

So, use Get-CWMetrics to extract what’s available, then use Get-CWMetricStatistics to drill down.

Let’s imagine, for instance, that we want to find out what one of our ASGs has been doing in terms of CPU over the last week. Here’s what we do:

$asgName = Get-ASAutoScalingGroup | `
            ? {$_.AutoScalingGroupName -like "p-desktop*"} | `
            select -First 1 -ExpandProperty AutoScalingGroupName

Get-CWMetricStatistics -MetricName CPUUtilization `
                    -Dimension @{Name = "AutoScalingGroupName"; Value = "$asgName"} `
                    -StartTime (Get-Date).AddDays(-7) `
                    -EndTime (Get-Date) `
                    -Namespace "AWS/EC2" `
                    -Period 1200 `
                    -Statistic Average

What does this give you? Well, it gives you a result object whose Datapoints property holds a datapoint for every 20 minutes (1,200 seconds) of the week. Datapoints are another important concept. A datapoint, as you’d expect, is a measurement of a specific thing at a specific time, and each set of statistics includes a whole load of them. So let’s modify our code slightly to expand them:

$asgName = Get-ASAutoScalingGroup | `
            ? {$_.AutoScalingGroupName -like "p-desktop*"} | `
            select -First 1 -ExpandProperty AutoScalingGroupName

Get-CWMetricStatistics -MetricName CPUUtilization `
                    -Dimension @{Name = "AutoScalingGroupName"; Value = "$asgName"} `
                    -StartTime (Get-Date).AddDays(-7) `
                    -EndTime (Get-Date) `
                    -Namespace "AWS/EC2" `
                    -Period 1200 `
                    -Statistic Average | `
                    Select-Object -ExpandProperty DataPoints

Now, what does this give us? Something far more interesting. It gives us the individual data points in a big long list. And this is all very well. It’ll certainly give us an idea of what’s been happening, in a granular sense, over the last week. Can we make it more fine-grained?

Not over a whole week, we can’t:

Get-CWMetricStatistics -MetricName CPUUtilization `
                    -Dimension @{Name = "AutoScalingGroupName"; Value = "$asgName"} `
                    -StartTime (Get-Date).AddDays(-7) `
                    -EndTime (Get-Date) `
                    -Namespace "AWS/EC2" `
                    -Period 60 `
                    -Statistic Average | `
                    Select-Object -ExpandProperty DataPoints
Get-CWMetricStatistics : You have requested up to 10,080 datapoints, which exceeds the limit of 1,440. You may reduce the datapoints requested by increasing Period, or decreasing the time range.
At line:5 char:1
+ Get-CWMetricStatistics -MetricName CPUUtilization `
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (Amazon.PowerShe...tatisticsCmdlet:GetCWMetricStatisticsCmdlet) [Get-CWMetricStatistics], InvalidOperationException
    + FullyQualifiedErrorId : Amazon.CloudWatch.Model.InvalidParameterCombinationException,Amazon.PowerShell.Cmdlets.CW.GetCWMetricStatisticsCmdlet

There’s a limit to what you can extract in a single request: 1,440 datapoints. You can go quite a long way back, but the further back you go, the larger your period has to be to fit inside that limit. Our original 20-minute period over a week only asks for 504.
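The arithmetic behind the limit is simple. A request asks for (EndTime − StartTime) ÷ Period datapoints, so a week at 60 seconds is 604,800 ÷ 60 = 10,080, exactly the figure in the error. The helper below turns that around. Get-MinimumCWPeriod is a hypothetical name of my own, not an AWS cmdlet. It works out the smallest period that keeps a time range within 1,440 datapoints, rounded up to whole minutes as standard-resolution metrics require. It is only a sketch of the datapoint cap. It does not account for CloudWatch’s retention rules, which force coarser periods for older data.

```powershell
# Hypothetical helper (not part of the AWS tools): smallest standard-resolution
# period, in seconds, that keeps a Get-CWMetricStatistics request under the cap
function Get-MinimumCWPeriod {
    param(
        [Parameter(Mandatory)] [datetime] $StartTime,
        [Parameter(Mandatory)] [datetime] $EndTime,
        [int] $MaxDatapoints = 1440
    )
    $seconds = ($EndTime - $StartTime).TotalSeconds
    # Datapoints requested = seconds / period, so period must be at least seconds / limit
    $rawPeriod = [math]::Ceiling($seconds / $MaxDatapoints)
    # Round up to a whole number of minutes, with one minute as the floor
    $minutes = [math]::Max(1, [math]::Ceiling($rawPeriod / 60))
    return [int]($minutes * 60)
}
```

For the week used throughout this post it returns 420 (seven minutes). Any larger multiple of 60, including the 1,200 used earlier, also stays under the limit.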

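Protip: there are 1,440 minutes in a day, so one of these calls can fetch a metric for every minute of a day, which you can then dump into a database for later retrieval. Step StartTime and EndTime forward a day at a time and you can pull out as much minute-by-minute history as you like, one request per day. Just remember that CloudWatch only keeps one-minute data for 15 days. The world is your oyster here.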
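Stepping through a long range like that is easy to script. The helper below splits StartTime..EndTime into consecutive windows, each covering at most Period × 1,440 seconds, so that every window fits in one Get-CWMetricStatistics call. Get-CWQueryWindow is another hypothetical name of my own. This sketch assumes CloudWatch treats StartTime as inclusive and EndTime as exclusive. Under that assumption, windows that share a boundary should not double-count a datapoint.

```powershell
# Hypothetical helper (not part of the AWS tools): split a time range into windows
# that each fit within one Get-CWMetricStatistics request
function Get-CWQueryWindow {
    param(
        [Parameter(Mandatory)] [datetime] $StartTime,
        [Parameter(Mandatory)] [datetime] $EndTime,
        [Parameter(Mandatory)] [int] $Period,
        [int] $MaxDatapoints = 1440
    )
    # Longest span one request can cover at this period
    $span = [timespan]::FromSeconds([double]$Period * $MaxDatapoints)
    $windowStart = $StartTime
    while ($windowStart -lt $EndTime) {
        $windowEnd = $windowStart + $span
        # The last window may be shorter than the rest
        if ($windowEnd -gt $EndTime) { $windowEnd = $EndTime }
        [PSCustomObject]@{ StartTime = $windowStart; EndTime = $windowEnd }
        $windowStart = $windowEnd
    }
}
```

With -Period 60, a week comes back as seven one-day windows. Feed each window’s StartTime and EndTime to Get-CWMetricStatistics in a loop and collect the Datapoints from each call.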

Anyway, let’s tie a couple of things together. I want to know what ALL my ASGs are doing in terms of CPU over a week, and for that purpose a 20-minute period is fine. So here’s what I’m doing:

# get the average CPU over the past week for each production ASG
Function Get-AverageASGCPU
{
    $statsTable = @()

    Get-ASAutoScalingGroup | ? {$_.AutoScalingGroupName -like "production-*"} | % {

        $asgName = $_.AutoScalingGroupName
        $asgMin = $_.MinSize
        $asgMax = $_.MaxSize

        # an ASG with no running instances has nothing to look up
        if($_.Instances.Count -eq 0)
        {
            $instanceType = ""
        }
        else
        {
            # assumes every instance in the group shares the first one's type
            $firstInstance = $_.Instances[0].InstanceId
            $instanceType = (Get-EC2Instance -InstanceId $firstInstance `
                                     | select -ExpandProperty Instances).InstanceType
        }

        # 7 days at 1,200-second periods is 504 datapoints, well under the 1,440 limit
        $statistics = Get-CWMetricStatistics -MetricName CPUUtilization `
                                                -Dimension @{Name = "AutoScalingGroupName"; 
                                                      Value = "$asgName"} `
                                                -StartTime (Get-Date).AddDays(-7) `
                                                -EndTime (Get-Date) `
                                                -Namespace "AWS/EC2" `
                                                -Period 1200 `
                                                -Statistic Average

        # mean of the 20-minute averages; ordering doesn't matter for this
        $ave = $statistics.Datapoints | select -ExpandProperty Average `
                                      | measure -Average `
                                      | select -ExpandProperty Average

        $statsObject = [PSCustomObject]@{"AutoScalingGroupName" = $asgName; 
                                            "AverageCPU" = $ave; 
                                            "MinSize" = $asgMin; 
                                            "MaxSize" = $asgMax; 
                                            "InstanceType" = $instanceType }
        $statsTable += $statsObject
    }

    return $statsTable
}

This returns a collection of objects, one per ASG. I can pipe it to Format-Table or Out-GridView, convert it to CSV and email it to my management layer, or convert it to JSON and throw it into a NoSQL database for later crunching.
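Here is a sketch of the CSV and JSON routes. Export-ASGCpuReport is a hypothetical helper of my own, not part of the AWS tools. It accepts the objects Get-AverageASGCPU emits and sorts them so the idlest groups come first. It then writes BasePath.csv and BasePath.json using only built-in cmdlets. It assumes the AverageCPU property name used in the function above. The email step is left out.

```powershell
# Hypothetical helper (not part of the AWS tools): write the ASG CPU report to
# CSV and JSON, idlest groups first, and pass the sorted rows through
function Export-ASGCpuReport {
    param(
        [Parameter(Mandatory, ValueFromPipeline)] [psobject] $Row,
        [Parameter(Mandatory)] [string] $BasePath
    )
    begin { $rows = @() }
    process { $rows += $Row }
    end {
        # Lowest average CPU first, since idle groups are the ones costing money
        $sorted = $rows | Sort-Object -Property AverageCPU
        # CSV for spreadsheets and email attachments
        $sorted | Export-Csv -Path "$BasePath.csv" -NoTypeInformation
        # JSON for a document database; @() keeps it an array even for a single row
        ConvertTo-Json -InputObject @($sorted) | Set-Content -Path "$BasePath.json"
        # Emit the rows so the caller can still Format-Table them
        $sorted
    }
}
```

Usage would be along the lines of `Get-AverageASGCPU | Export-ASGCpuReport -BasePath .\asg-cpu | Format-Table`.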

And this is just the beginning. In a later post, I’ll delve into some more detailed work, as well as using and discovering custom metrics.

Stay tuned!

 
