Closed
Bug 1326068
Opened 9 years ago
Closed 8 years ago
Add Datadog Docker container monitoring to Airflow ECS cluster
Categories
(Data Platform and Tools :: Monitoring & Alerting, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bugzilla, Assigned: whd)
References
Details
(Whiteboard: [SvcOps])
I've heard we have a Datadog account connected to the cloud services dev AWS account. It would be very useful for getting container-level metrics on the Airflow ECS instance: our zombie job issues were probably due to OOM in the worker container, the scheduler container, or both, but the built-in CloudWatch metrics only show task-level metrics.
If someone with access could add me to the account, I'd be happy to follow the steps listed here to add monitoring and alerting:
https://www.datadoghq.com/blog/monitor-docker-on-aws-ecs/
Updated•9 years ago
Whiteboard: [SvcOps]
Updated•9 years ago
Points: --- → 1
Priority: -- → P2
Comment 1•9 years ago (Assignee)
I've invited :sunasuh to Datadog per https://www.datadoghq.com/blog/monitor-docker-on-aws-ecs/ and sent the dev API key by GPG-encrypted email.
This API key should not be stored anywhere unencrypted except on the ECS host itself that runs the Datadog agent. I assume this is a one-off and doesn't need to be automated; if that is not the case, then there is considerably more work to do here to set up proper instance provisioning logic (most importantly, the use of SOPS so that we can pull the API key from KMS).
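For reference, if this did need to be automated, the SOPS step could look roughly like the sketch below. It is only an illustration under assumptions, not what is deployed: the file name `secrets.enc.json`, the field name `datadog_api_key`, and both helper functions are hypothetical, and it assumes the `sops` CLI is installed on the host with access to the KMS key. The only real interface used is `sops -d <file>`, which prints the decrypted file to stdout.

```python
import json
import subprocess
from typing import Callable, List, Optional, Sequence


def sops_decrypt_command(path: str) -> List[str]:
    """Build the argv that decrypts a SOPS-managed file to stdout."""
    return ["sops", "-d", path]


def _run_sops(argv: Sequence[str]) -> str:
    """Run the command and return its stdout; raises if sops exits non-zero."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def load_api_key(
    path: str,
    field: str = "datadog_api_key",  # hypothetical field name
    run: Optional[Callable[[Sequence[str]], str]] = None,
) -> str:
    """Decrypt a JSON secrets file with SOPS and return one field from it.

    The decrypted value stays in memory and is never written back to disk.
    `run` can be swapped out so the parsing logic is testable without sops.
    """
    runner = run if run is not None else _run_sops
    secrets = json.loads(runner(sops_decrypt_command(path)))
    return secrets[field]
```

On a provisioned host, `load_api_key("secrets.enc.json")` would return the key for handing to the agent's configuration at startup, so the plaintext key only ever exists on the ECS host.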
Flags: needinfo?(whd)
Updated•8 years ago
Component: Metrics: Pipeline → Monitoring & Alerting
Product: Cloud Services → Data Platform and Tools
Comment 2•8 years ago (Assignee)
Airflow hosts now report stats to Datadog as part of bug #1336975.
Assignee: nobody → whd
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED