Closed Bug 1445613 Opened 8 years ago Closed 8 years ago

WTMO not scheduling jobs at specified times

Categories

(Data Platform and Tools Graveyard :: Operations, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jason, Assigned: hwoo)

References

Details

Attachments

(1 file)

I noticed today that one of my DAGs was still running. On further investigation, it started at 08:16 UTC instead of the scheduled 03:00 UTC. Although the job appears to be marked as failed, the logs do not mention any failures. https://workflow.telemetry.mozilla.org/admin/airflow/log?task_id=amo_create_job_flow&dag_id=mango_log_processing&execution_date=2018-03-13T03:00:00 This also appears to be affecting other DAGs, including main_summary. I know there was a code deployment yesterday; maybe something changed there?
See Also: → 1445110
It looks like the tasks are being scheduled, but we don't have enough Celery workers to service them in a reasonable time, so they stay in the queue until a worker becomes available. We were probably hovering around the limit, and adding the additional backfill job may have used up all the workers. It is currently configured for 16 workers; we could probably safely double or triple that. We may also want to bump the instance size.
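To illustrate why a saturated worker pool shows up as late start times rather than as errors, here is a minimal Python sketch of a fixed-size pool draining a FIFO queue. It is a toy model, not Airflow or Celery code: the function name `simulate_start_delays`, the assumption that all tasks are queued at the same instant, and the task durations in the usage example are all hypothetical.

```python
import heapq
from typing import List


def simulate_start_delays(durations: List[float], workers: int) -> List[float]:
    """Return how long each queued task waits before a worker picks it up.

    Toy model (hypothetical): every task is enqueued at t=0 and tasks are
    served strictly in FIFO order by a fixed number of identical workers.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    # Min-heap of the times at which each worker slot becomes free.
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    delays = []
    for duration in durations:
        # The next task in the queue goes to the earliest-free worker.
        start = heapq.heappop(free_at)
        delays.append(start)
        # That worker is busy until this task finishes.
        heapq.heappush(free_at, start + duration)
    return delays


if __name__ == "__main__":
    one_hour_tasks = [60.0] * 20  # hypothetical: 20 tasks of 60 minutes each
    print(max(simulate_start_delays(one_hour_tasks, 16)))  # worst-case wait, minutes
    print(max(simulate_start_delays(one_hour_tasks, 32)))
```

With 20 one-hour tasks and 16 workers, the last four tasks wait an hour before starting; with 32 workers none of them wait. Real scheduling also depends on Airflow's own concurrency settings, which this sketch ignores.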
Changes have been committed to master and pushed to staging. All we need to do later is find a good time to push to prod. https://ops-master.jenkinsv2.prod.mozaws.net/job/pipelines/job/wtmo/108/console
Assignee: nobody → hwoo
Priority: -- → P2
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Product: Data Platform and Tools → Data Platform and Tools Graveyard
