Closed Bug 1378460 Opened 8 years ago Closed 8 years ago

Launch jobs directly from python_mozetl without a run script

Categories

(Data Platform and Tools :: General, enhancement)

x86
macOS
enhancement
Not set
normal
Points: 3

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: amiyaguchi, Unassigned)

References

Details

python_mozetl is a Python library that contains a large number of PySpark ETL jobs, which can be run using spark-submit. An example of a run script, current as of the time of writing, can be found at [1].

The Airflow EMR operator can chain jobs together, and could possibly submit a job directly from the python_mozetl module instead of generating a run script. This would reduce the overhead of adding a new job to Airflow, and seems to be a more idiomatic pattern.

If python_mozetl is not bootstrapped on all of the provisioned EMR machines, the egg/wheel will need to be built and distributed across the cluster.

[1] https://github.com/mozilla/telemetry-airflow/blob/master/jobs/topline_dashboard.sh
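As a rough illustration of the proposal, the Python sketch below assembles a spark-submit command for a single job instead of generating a per-job shell script. It is only a sketch under stated assumptions: the function name build_spark_submit_command, the runner script path, the egg path, and the job name are hypothetical and do not come from python_mozetl or telemetry-airflow. Only spark-submit and its --master, --deploy-mode, and --py-files options are real.

```python
import shlex


def build_spark_submit_command(runner_path, job_name, egg_path=None, job_args=()):
    """Return the argv list for a spark-submit call that runs one job.

    All parameters are hypothetical placeholders for illustration:
    runner_path is a small generic entry-point script, job_name selects
    the job inside the library, and egg_path is a packaged build of it.
    """
    command = ["spark-submit", "--master", "yarn", "--deploy-mode", "client"]
    if egg_path is not None:
        # --py-files distributes the packaged library to the cluster,
        # which matters when it is not bootstrapped on every machine.
        command += ["--py-files", egg_path]
    # The runner script and its arguments come after the spark-submit options.
    command += [runner_path, job_name]
    command += list(job_args)
    return command


if __name__ == "__main__":
    argv = build_spark_submit_command(
        runner_path="runner.py",           # hypothetical entry point
        job_name="example_job",            # hypothetical job name
        egg_path="dist/example_etl.egg",   # hypothetical build artifact
        job_args=["--date", "20170705"],   # hypothetical job arguments
    )
    # Print a shell-safe rendering of the command rather than running it.
    print(" ".join(shlex.quote(part) for part in argv))
```

Returning an argument list rather than a shell string keeps quoting out of the picture, so a scheduler could pass the same list to whatever step-submission mechanism it uses. One generic runner then serves every job, and adding a job no longer means adding a script.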
Depends on: 1325393
Points: --- → 3
This has been addressed with the creation of a mozetl-submit script.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Component: Scheduling → General