Closed Bug 1370529 Opened 8 years ago Closed 6 years ago

Refactor Budget Dashboard ETL

Categories

(Data Platform and Tools :: General, enhancement, P3)

Platform: x86 macOS
Type: enhancement
Points: 3

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: mreid, Unassigned)

Attachments

(1 file)

The Budget Dashboard[1] is powered by an ETL job that runs as a Jupyter notebook scheduled on ATMO. The job currently reads a dedicated data source of heka-framed protobuf messages. It processes the data and incrementally updates a summary file. The code should be made idempotent, so that rerunning the job for a given day produces the same result.

We should decide on the best path forward for this job. Options include:
- port the code to python_mozetl and add tests
- migrate to telemetry-batch-view
- change the data source to direct-to-parquet output, and then either use SparkSQL for the job or move the dashboard into re:dash
- other possibilities?

I believe the notebook has some embedded assumptions about user counts, so it may not be possible to share it publicly. I can provide the current notebook separately as needed.

[1] https://metrics.services.mozilla.com/telemetry-budget-dashboard/
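The idempotency requirement can be illustrated with a minimal Python sketch. It makes two hypothetical assumptions: the summary file is a mapping keyed by submission date, and the raw records are (date, payload_bytes) pairs. The names `summarize_day`, `incremental_update`, and `idempotent_update` are illustrative only and do not come from the actual notebook. The incremental version adds to whatever is already stored. The idempotent version recomputes that day's partition and overwrites it.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (submission_date, payload_bytes).
Record = Tuple[str, int]
# Hypothetical summary shape: {submission_date: {"pings": n, "bytes": n}}.
Summary = Dict[str, Dict[str, int]]


def summarize_day(records: Iterable[Record], day: str) -> Dict[str, int]:
    """Aggregate one day's raw records into a single summary row."""
    count = 0
    total = 0
    for date, size in records:
        if date == day:
            count += 1
            total += size
    return {"pings": count, "bytes": total}


def incremental_update(summary: Summary, records: Iterable[Record], day: str) -> Summary:
    # Not idempotent: adds the new row onto the stored one,
    # so rerunning the same day double counts.
    row = summarize_day(records, day)
    prev = summary.get(day, {"pings": 0, "bytes": 0})
    summary[day] = {key: prev[key] + row[key] for key in row}
    return summary


def idempotent_update(summary: Summary, records: Iterable[Record], day: str) -> Summary:
    # Idempotent: recompute the whole day from the raw records and overwrite it,
    # so any number of reruns leaves the same result.
    summary[day] = summarize_day(records, day)
    return summary
```

Running `idempotent_update` twice for the same day leaves that day's row unchanged, which makes retries and backfills safe. Running `incremental_update` twice doubles the counts.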
Points: --- → 3
Priority: -- → P2
Assignee: nobody → spenrose
Blocks: 1325390
Attached file budget_report.ipynb
I had a look: the notebook fetches the targets as metadata, so the code seems fine to make public.
Assignee: spenrose → nobody
Priority: P2 → P3
Component: General → Datasets: General
I don't want to take over this bug entirely, but I am taking on the direct-to-parquet (d2p) output for payload-size messages.
Direct-to-parquet output added here: https://github.com/mozilla-services/mozilla-pipeline-schemas/pull/103
Thanks, Frank!
Blocks: 1434094
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Component: Datasets: General → General