Closed
Bug 1378460
Opened 8 years ago
Closed 8 years ago
Launch jobs directly from python_mozetl without a run script
Categories
(Data Platform and Tools :: General, enhancement)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: amiyaguchi, Unassigned)
References
Details
python_mozetl is a Python library that contains a large number of PySpark ETL jobs. These jobs can be run using spark-submit. An example of a run script, current as of the time of writing, can be found at [1].
The Airflow EMR operator can chain jobs together, and it could possibly submit a job directly from the python_mozetl module instead of generating a run script. This would reduce the overhead of adding a new job to Airflow and seems to be a more idiomatic pattern.
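As a rough illustration of chaining jobs without per-job run scripts, the sketch below builds EMR step definitions in the shape accepted by boto3's `add_job_flow_steps`, each calling spark-submit through `command-runner.jar`. The helper names, the entry script path `/home/hadoop/run_mozetl.py`, and the second job name are assumptions made for illustration; they are not taken from python_mozetl or telemetry-airflow.

```python
def emr_spark_step(name, script_path, job_args=()):
    """Build one EMR step that runs spark-submit via command-runner.jar.

    script_path is a hypothetical entry script that dispatches to a job
    by name; it is not an actual python_mozetl file.
    """
    return {
        "Name": name,
        # Stop running later steps if this one fails, so the chain is ordered.
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "client",
                     script_path, *job_args],
        },
    }


def chain_steps(job_names, script_path):
    """Return one step per job, in order, all sharing the same entry script."""
    return [emr_spark_step(job, script_path, [job]) for job in job_names]


# Illustrative job names only; "example_job" is made up.
steps = chain_steps(["topline_dashboard", "example_job"],
                    "/home/hadoop/run_mozetl.py")
```

The resulting list could then be handed to whatever mechanism submits steps to the cluster. Adding a job becomes a matter of adding a name to the list rather than writing a new shell script.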
If python_mozetl is not bootstrapped on all of the provisioned EMR machines, the egg or wheel will need to be built and distributed across the cluster.
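One way to distribute the package is spark-submit's `--py-files` option, which ships an egg or zip to the executors alongside the job. The sketch below only assembles such a command line. The egg path, the `runner.py` entry script, and the `build_submit_command` helper are hypothetical, and the egg is assumed to have been built beforehand (for example with `python setup.py bdist_egg`).

```python
import shlex


def build_submit_command(egg_path, runner_path, job_name, master="yarn"):
    """Assemble a spark-submit invocation that ships the package as an egg.

    egg_path and runner_path are assumed locations, not real project paths.
    """
    return [
        "spark-submit",
        "--master", master,
        # --py-files distributes the egg to the executors' Python path.
        "--py-files", egg_path,
        runner_path,
        job_name,
    ]


cmd = build_submit_command("dist/python_mozetl.egg", "runner.py",
                           "topline_dashboard")
# Print a shell-safe rendering of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the command as a list of arguments avoids shell quoting problems if it is later passed to `subprocess.run` or embedded in an EMR step.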
[1] https://github.com/mozilla/telemetry-airflow/blob/master/jobs/topline_dashboard.sh
Reporter
Updated • 8 years ago
Points: --- → 3
Reporter
Comment 1 • 8 years ago
This has been addressed with the creation of a mozetl-submit script.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Assignee
Updated • 3 years ago
Component: Scheduling → General