Amazon CloudWatch is a monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), and IT managers. CloudWatch provides you with data and actionable insights to monitor your applications, respond to system-wide performance changes, optimise resource utilisation, and get a unified view of operational health.
You can use CloudWatch to detect anomalous behaviour in your AWS cloud environments, set alarms, visualise logs and metrics side by side, take automated actions, troubleshoot issues, and discover insights to keep your applications running smoothly.
Why would you need this service?
Cloud monitoring makes it easier to identify patterns and discover potential security risks in the infrastructure. Some key capabilities of cloud monitoring include:
- Ability to monitor large volumes of cloud data across many distributed locations;
- Visibility into application, user, and file behaviour to identify potential attacks or compromises;
- Continuous monitoring to ensure new and modified files are scanned in real time;
- Auditing and reporting capabilities to manage security compliance;
- Integration of monitoring tools with a range of cloud service providers.
How we deliver this service
CloudWatch collects monitoring and operational data in the form of logs, metrics, and events, and visualises it using automated dashboards, so you get a unified view of your AWS resources, applications, and services that run on AWS and on-premises.
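As a rough illustration of how your own application data can reach CloudWatch alongside the metrics AWS services publish, the Python sketch below builds a request for the PutMetricData API and sends it with boto3, the AWS SDK for Python. The namespace, metric name, dimension values, and the helper names `build_metric_data` and `publish` are hypothetical. Setting `StorageResolution` to 1 requests 1-second (high-resolution) storage; 60 is standard 1-minute storage. boto3 is imported inside `publish` only so that the builder can be tried without the SDK or AWS credentials.

```python
from datetime import datetime, timezone


def build_metric_data(namespace, metric_name, value, unit="Count",
                      dimensions=None, high_resolution=False):
    """Build keyword arguments for a CloudWatch PutMetricData call."""
    datum = {
        "MetricName": metric_name,
        "Value": float(value),
        "Unit": unit,
        "Timestamp": datetime.now(timezone.utc),
        # 1 = high-resolution (1-second) storage; 60 = standard 1-minute storage.
        "StorageResolution": 1 if high_resolution else 60,
    }
    if dimensions:
        # Dimensions identify which resource or environment the value belongs to.
        datum["Dimensions"] = [
            {"Name": name, "Value": val} for name, val in dimensions.items()
        ]
    return {"Namespace": namespace, "MetricData": [datum]}


def publish(request):
    """Send the request with boto3; needs AWS credentials and a region."""
    import boto3  # imported here so build_metric_data works without boto3

    return boto3.client("cloudwatch").put_metric_data(**request)
```

For example, `publish(build_metric_data("MyApp/Checkout", "OrdersPlaced", 1, high_resolution=True))` would record one order in a hypothetical `MyApp/Checkout` namespace at 1-second resolution.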
You can correlate your metrics and logs to better understand the health and performance of your resources. You can also create alarms that watch metric values against thresholds you specify, or that watch for anomalous metric behaviour using machine learning algorithms.
To take action quickly, you can set up automated actions that notify you when an alarm is triggered or that automatically start Amazon EC2 Auto Scaling, for example, to help reduce mean-time-to-resolution. You can also dive deep and analyse your metrics, logs, and traces to better understand how to improve application performance.
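The two kinds of alarm described above can be sketched as PutMetricAlarm requests. This is a minimal sketch assuming an EC2 instance ID and region of your own. The first alarm uses a fixed threshold and the built-in `arn:aws:automate:<region>:ec2:stop` action to stop an instance whose average CPU has stayed below 5% for 30 minutes. The second replaces the fixed threshold with an `ANOMALY_DETECTION_BAND` expression, so the expected range comes from CloudWatch anomaly detection rather than a number you choose. The helper names, alarm names, and threshold values are illustrative assumptions.

```python
def idle_instance_alarm(instance_id, region, cpu_percent=5.0):
    """PutMetricAlarm arguments: stop an EC2 instance that has sat idle."""
    return {
        "AlarmName": f"idle-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,           # one datapoint every 5 minutes
        "EvaluationPeriods": 6,  # look at the last 6 datapoints (30 minutes)
        "DatapointsToAlarm": 6,  # and alarm only if all 6 are below threshold
        "Threshold": cpu_percent,
        "ComparisonOperator": "LessThanThreshold",
        # Built-in EC2 alarm action that stops the instance.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:stop"],
    }


def cpu_anomaly_alarm(instance_id, band_width=2):
    """PutMetricAlarm arguments: alarm when CPU rises above the expected band."""
    return {
        "AlarmName": f"cpu-anomaly-{instance_id}",
        "ComparisonOperator": "GreaterThanUpperThreshold",
        "EvaluationPeriods": 2,
        # The threshold is the anomaly band below, not a fixed number.
        "ThresholdMetricId": "band",
        "Metrics": [
            {
                "Id": "cpu",
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/EC2",
                        "MetricName": "CPUUtilization",
                        "Dimensions": [
                            {"Name": "InstanceId", "Value": instance_id}
                        ],
                    },
                    "Period": 300,
                    "Stat": "Average",
                },
                "ReturnData": True,
            },
            {
                # Expected range from anomaly detection; width in standard deviations.
                "Id": "band",
                "Expression": f"ANOMALY_DETECTION_BAND(cpu, {band_width})",
                "ReturnData": True,
            },
        ],
    }


def create_alarm(request):
    """Create or update the alarm with boto3 (needs AWS credentials)."""
    import boto3  # imported here so the builders work without boto3

    return boto3.client("cloudwatch").put_metric_alarm(**request)
```

To scale out or notify instead of stopping, `AlarmActions` would hold the ARN of an EC2 Auto Scaling policy or an Amazon SNS topic.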
- Infrastructure monitoring and troubleshooting – Monitor key metrics and logs, visualise your application and infrastructure stack, create alarms, and correlate metrics and logs to understand and resolve the root cause of performance issues in your AWS resources. This includes monitoring your container ecosystem across Amazon ECS, AWS Fargate, Amazon EKS, and Kubernetes.
- Mean-time-to-resolution improvement – CloudWatch helps you correlate, visualise, and analyse metrics and logs, so you can act quickly to resolve issues, and combine them with trace data from AWS X-Ray for end-to-end observability. You can also analyse user requests to help speed up troubleshooting and debugging and reduce overall mean-time-to-resolution (MTTR).
- Proactive resource optimisation – CloudWatch alarms watch your metric values against thresholds that either you specify or that CloudWatch creates for you using machine learning models to detect anomalous behaviour. If an alarm is triggered, CloudWatch can act automatically, for example by triggering Amazon EC2 Auto Scaling or stopping an instance, so you can automate capacity and resource planning.
- Application monitoring – Monitor your applications that run on AWS (on Amazon EC2, in containers, and on serverless) or on-premises. CloudWatch collects data at every layer of the performance stack and presents metrics and logs on automatic dashboards.
- Log analytics – Explore, analyse, and visualise your logs to address operational issues and improve application performance. You can run queries to respond quickly and effectively to operational issues: if an issue occurs, you can start querying immediately using a purpose-built query language to identify potential causes rapidly.
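To make the log analytics workflow concrete, here is a minimal sketch of a CloudWatch Logs Insights query run through boto3's `start_query` and `get_query_results` calls. The log group name is a placeholder, the query (the 20 most recent messages containing `ERROR`) is only an example of the query language, and `build_query` and `run_query` are hypothetical helpers.

```python
import time

# Example Logs Insights query: the 20 most recent lines mentioning ERROR.
ERRORS_QUERY = (
    "fields @timestamp, @message "
    "| filter @message like /ERROR/ "
    "| sort @timestamp desc "
    "| limit 20"
)


def build_query(log_group, minutes=60, query=ERRORS_QUERY, now=None):
    """StartQuery arguments covering the last `minutes` minutes."""
    end = int(now if now is not None else time.time())
    return {
        "logGroupName": log_group,
        "startTime": end - minutes * 60,  # epoch seconds
        "endTime": end,
        "queryString": query,
    }


def run_query(request, poll_seconds=1.0):
    """Start a Logs Insights query and poll until it stops running."""
    import boto3  # imported here so build_query works without boto3

    logs = boto3.client("logs")
    query_id = logs.start_query(**request)["queryId"]
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            break
        time.sleep(poll_seconds)
    # Each result row is a list of {"field": ..., "value": ...} cells.
    return [
        {cell["field"]: cell["value"] for cell in row}
        for row in response["results"]
    ]
```

For example, `run_query(build_query("/aws/lambda/checkout", minutes=30))` would return a list of dicts keyed by field name, including `@timestamp` and `@message`. The simple polling loop keeps the sketch short; a fuller version would add a timeout and report a `Failed` status instead of returning an empty list.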
Benefits and typical outcomes
- Observability on a single platform across applications and infrastructure – Modern applications such as those running on microservices architectures generate large volumes of data in the form of metrics, logs, and events. Amazon CloudWatch enables you to collect, access, and correlate this data on a single platform from across all your AWS resources, applications, and services that run on AWS and on-premises servers, helping you break down data silos so you can easily gain system-wide visibility and quickly resolve issues.
- Easiest way to collect metrics in AWS and on-premises – Monitoring your AWS resources and applications is easy with CloudWatch. It natively integrates with more than 70 AWS services, such as Amazon EC2, Amazon DynamoDB, Amazon S3, Amazon ECS, Amazon EKS, and AWS Lambda, automatically publishes detailed 1-minute metrics, and supports custom metrics with up to 1-second granularity, so you can dive deep into your data and use your logs for additional context. You can also use CloudWatch in hybrid cloud architectures by using the CloudWatch Agent or API to monitor your on-premises resources.
- Improve operational performance and resource optimisation – Amazon CloudWatch enables you to set alarms and automate actions based on either predefined thresholds or machine learning algorithms that identify anomalous behaviour in your metrics. For example, it can start Amazon EC2 Auto Scaling automatically, or stop an instance to reduce billing overages. You can also use CloudWatch Events (now part of Amazon EventBridge) to trigger serverless workflows with services like AWS Lambda, Amazon SNS, and AWS CloudFormation.
- Get operational visibility and insight – To optimise performance and resource utilisation, you need a unified operational view, real-time granular data, and historical reference. CloudWatch provides automatic dashboards, data with 1-second granularity, and up to 15 months of metrics storage and retention. You can also apply metric math to your data to derive operational and utilisation insights; for example, you can aggregate usage across an entire fleet of EC2 instances.
- Derive actionable insights from logs – CloudWatch enables you to explore, analyse, and visualise your logs so you can troubleshoot operational problems with ease. With CloudWatch Logs Insights, you pay only for the queries you run. It scales with your log volume and query complexity, giving you answers in seconds. In addition, you can publish log-based metrics, create alarms, and correlate logs and metrics together in CloudWatch Dashboards for complete operational visibility.
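The fleet-wide aggregation mentioned under operational visibility can be expressed with metric math. The sketch below assumes EC2 instances publishing the standard `CPUUtilization` metric. A `SEARCH` expression gathers one time series per instance, and `AVG` and `MAX` collapse them into fleet-wide series, which are retrieved with the GetMetricData API. The query IDs, labels, and helper names are illustrative assumptions, and pagination is left out.

```python
from datetime import datetime, timedelta, timezone


def fleet_cpu_queries(period=300):
    """GetMetricData queries that summarise CPU across every EC2 instance."""
    search = (
        "SEARCH('{AWS/EC2,InstanceId} MetricName=\"CPUUtilization\"', "
        f"'Average', {period})"
    )
    return [
        # One time series per instance; used only as input to the math below.
        {"Id": "per_instance", "Expression": search, "ReturnData": False},
        # Metric math over an array of series yields a single fleet-wide series.
        {"Id": "fleet_avg", "Expression": "AVG(per_instance)",
         "Label": "Fleet average CPU", "ReturnData": True},
        {"Id": "fleet_max", "Expression": "MAX(per_instance)",
         "Label": "Busiest instance CPU", "ReturnData": True},
    ]


def fetch_fleet_cpu(hours=3):
    """Run the queries with boto3; returns {query id: [(timestamp, value)]}."""
    import boto3  # imported here so fleet_cpu_queries works without boto3

    end = datetime.now(timezone.utc)
    response = boto3.client("cloudwatch").get_metric_data(
        MetricDataQueries=fleet_cpu_queries(),
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
    )
    # NextToken pagination is ignored to keep the sketch short.
    return {
        result["Id"]: list(zip(result["Timestamps"], result["Values"]))
        for result in response["MetricDataResults"]
    }
```

Swapping `AVG` for `SUM` over a usage metric such as `NetworkIn` would give a fleet total instead of an average.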