Closed
Bug 1357875
Opened 8 years ago
Closed 8 years ago
Add `topline_dashboard` to python_etl
Categories
(Data Platform and Tools :: General, enhancement, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: amiyaguchi, Assigned: amiyaguchi)
References
Details
Attachments
(2 files)
`topline_dashboard` reformats the Topline Summary view to accommodate the topline/executive report. It does the following:
1. Marginalize the dataset to a limited set of countries and Other/ROW
2. Compute `ALL` rows
3. Collect and upload the csv to the dashboard view
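Steps 1 and 2 above can be sketched in plain Python. This is a minimal illustration only, not the merged job: the column names (`geo`, `channel`, `actives`) and the country allow-list are assumptions, and the real job presumably operates on a Spark dataframe rather than a list of dicts.

```python
from collections import defaultdict

# Hypothetical allow-list; the report's actual country set is not given in this bug.
TOP_COUNTRIES = {"US", "DE", "FR"}


def marginalize(rows, countries=TOP_COUNTRIES):
    """Fold countries outside the allow-list into 'Other' and sum the metric."""
    totals = defaultdict(int)
    for row in rows:
        geo = row["geo"] if row["geo"] in countries else "Other"
        totals[(geo, row["channel"])] += row["actives"]
    return dict(totals)


def with_all_rows(totals):
    """Add 'ALL' roll-ups over each dimension and over both together."""
    out = defaultdict(int)
    for (geo, channel), value in totals.items():
        for key in ((geo, channel), ("ALL", channel), (geo, "ALL"), ("ALL", "ALL")):
            out[key] += value
    return dict(out)


# Made-up sample rows, for illustration only.
rows = [
    {"geo": "US", "channel": "release", "actives": 10},
    {"geo": "BR", "channel": "release", "actives": 3},
    {"geo": "IN", "channel": "beta", "actives": 2},
    {"geo": "DE", "channel": "beta", "actives": 5},
]
report = with_all_rows(marginalize(rows))
```

Marginalizing before computing the `ALL` rows keeps the roll-ups consistent with the `Other` bucket, since both are derived from the same aggregated totals.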
Comment 1 (Assignee)•8 years ago
This script will replace both run.sh and v4_reformat.py in the original reporting pipeline. The general approach will be to take the union of historical data and reformatted topline_summary data. For reference, run.sh appends this week/month's data to the end of the csv file.
I plan to put 'v4-monthly.csv' and 'v4-weekly.csv' in a new, read-only location, such as `net-mozaws-prod-us-west-2-pipeline-analysis/topline/historical`. Are there any restrictions on the raw data that would prevent it from being put here?
Alternatively, this data could live next to the new data under a v0 tag if it were imported to telemetry-parquet.
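The union approach described in this comment could look roughly like the following. It is a sketch under assumptions: the column names and sample values are invented, and the historical file is passed in as text instead of being read from the proposed bucket.

```python
import csv
import io


def union_reports(historical_csv, new_rows, fieldnames):
    """Return CSV text: historical rows followed by freshly reformatted rows."""
    historical = list(csv.DictReader(io.StringIO(historical_csv)))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(historical + new_rows)
    return out.getvalue()


# Hypothetical columns and values, for illustration only.
historical_csv = "date,geo,actives\n2017-03-01,US,10\n"
new_rows = [{"date": "2017-04-01", "geo": "US", "actives": 12}]
combined = union_reports(historical_csv, new_rows, ["date", "geo", "actives"])
```

Unlike appending to the output file in place, rebuilding the output from a read-only historical file on each run means a re-run does not duplicate rows.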
Flags: needinfo?(mreid)
Updated•8 years ago
Component: Metrics: Pipeline → Datasets: General
Product: Cloud Services → Data Platform and Tools
Comment 2•8 years ago
There's nothing in the data that would prevent it from being put into the 'analysis' bucket you mentioned.
Flags: needinfo?(mreid)
Comment 3 (Assignee)•8 years ago
Updated (Assignee)•8 years ago
Points: 1 → 2
Comment 4 (Assignee)•8 years ago
This has been merged into python_mozetl.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated (Assignee)•8 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 5 (Assignee)•8 years ago
Comment 6 (Assignee)•8 years ago
The `file://` protocol doesn't work on EMR due to permission issues. The workaround is to collect the dataframe and write directly to disk. For some unknown reason, this causes tests to fail, despite being functionally the same thing. In practice, the above patch successfully collects the dataframe and uploads it.
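A rough sketch of the collect-and-write workaround, not the actual patch: the helper name, columns, and values are hypothetical. The `collected` list stands in for what `[row.asDict() for row in df.collect()]` would return on a Spark dataframe, so the example runs without Spark. The upload step is not shown.

```python
import csv
import os
import tempfile


def write_collected_csv(rows, fieldnames, path):
    """Write already-collected rows (dicts) to a local CSV file on the driver."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


# Stand-in for rows collected from a Spark dataframe (hypothetical values).
collected = [{"geo": "ALL", "actives": 20}, {"geo": "US", "actives": 10}]
local_path = os.path.join(tempfile.mkdtemp(), "topline.csv")
write_collected_csv(collected, ["geo", "actives"], local_path)
```

Collecting pulls every row onto the driver, which is reasonable here only because the report is small after aggregation.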
Updated (Assignee)•8 years ago
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
Updated•3 years ago
Component: Datasets: General → General