Closed Bug 1357875 Opened 8 years ago Closed 8 years ago

Add `topline_dashboard` to python_etl

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: amiyaguchi, Assigned: amiyaguchi)

References

Details

Attachments

(2 files)

Bug 1357875 - Add script for generating topline dashboard data #18, 8 years ago, Anthony Miyaguchi [:amiyaguchi], 48 bytes, text/x-github-pull-request
Bug 1357875 - Use collect instead of writing directly to `file://` via spark #27, 8 years ago, Anthony Miyaguchi [:amiyaguchi], 48 bytes, text/x-github-pull-request

Anthony Miyaguchi [:amiyaguchi]

Assignee

Description

•

8 years ago

`topline_dashboard` reformats the Topline Summary view to accommodate the topline/executive report. It does the following:
1. Marginalizes the dataset to a limited set of countries and Other/ROW
2. Computes `ALL` rows
3. Collects the CSV and uploads it to the dashboard view
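
For illustration, steps 1 and 2 of the description could look roughly like the sketch below. It is plain Python rather than the Spark code in the attached patch, and the column names (`geo`, `channel`, `value`) and the country list are assumptions made up for the example, not taken from the patch.

```python
from collections import defaultdict

# Hypothetical country list; the real list is defined in the patch, not in this bug.
COUNTRIES = frozenset({"US", "DE", "FR"})


def marginalize(rows, countries=COUNTRIES):
    """Sum `value` per (geo, channel).

    Countries outside `countries` are folded into "ROW" (rest of world),
    and an "ALL" total is accumulated along each dimension.
    Assumes no input row already uses "ALL" as a geo or channel.
    """
    totals = defaultdict(int)
    for row in rows:
        geo = row["geo"] if row["geo"] in countries else "ROW"
        channel = row["channel"]
        # Each input row contributes to its own cell and to three ALL cells.
        for key in ((geo, channel), ("ALL", channel), (geo, "ALL"), ("ALL", "ALL")):
            totals[key] += row["value"]
    return dict(totals)
```

With this shape, a row for a country that is not in the list only ever appears under `ROW` and the `ALL` totals, never under its own country code.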

Anthony Miyaguchi [:amiyaguchi]

Assignee

Updated

•

8 years ago

Assignee: nobody → amiyaguchi

Blocks: 1309574

Status: NEW → ASSIGNED

Points: --- → 1

Priority: -- → P1

Anthony Miyaguchi [:amiyaguchi]

Assignee

Updated

•

8 years ago

Blocks: 1329844
No longer blocks: 1309574

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 1

•

8 years ago

This script will replace both run.sh and v4_reformat.py in the original reporting pipeline. The general approach will be to take the union of the historical data and the reformatted topline_summary data. For reference, run.sh appends this week's or month's data to the end of the CSV file. I plan to put 'v4-monthly.csv' and 'v4-weekly.csv' in a new, read-only location, such as `net-mozaws-prod-us-west-2-pipeline-analysis/topline/historical`. Are there any restrictions on the raw data that would prevent it from being put here? Alternatively, this data could live next to the new data under a v0 tag if it were imported to telemetry-parquet.
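
As a rough sketch of the union approach described in this comment, assuming the historical CSV and the reformatted rows share the same columns (the column names used in any call are hypothetical, and this is plain Python, not the Spark code in the patch):

```python
import csv


def union_reports(historical_file, new_rows, fieldnames):
    """Return historical rows followed by newly reformatted rows.

    `historical_file` is an open text file (or any file-like object) holding
    the read-only historical CSV. Values are normalized to strings so both
    sources have the same shape. No deduplication is attempted here.
    """
    historical = [dict(row) for row in csv.DictReader(historical_file)]
    reformatted = [{name: str(row[name]) for name in fieldnames} for row in new_rows]
    return historical + reformatted
```

Unlike appending to the CSV in place, as run.sh does, this leaves the historical file untouched and rebuilds the combined output on every run.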

Flags: needinfo?(mreid)

Thomas Huelbert

Updated

•

8 years ago

Component: Metrics: Pipeline → Datasets: General

Product: Cloud Services → Data Platform and Tools

Mark Reid [:mreid]

Comment 2

•

8 years ago

There's nothing in the data that would prevent it from being put into the 'analysis' bucket you mentioned.

Flags: needinfo?(mreid)

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 3

•

8 years ago

Attached file Bug 1357875 - Add script for generating topline dashboard data #18 — Details

Anthony Miyaguchi [:amiyaguchi]

Assignee

Updated

•

8 years ago

Points: 1 → 2

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 4

•

8 years ago

This has been merged into python_mozetl.

Status: ASSIGNED → RESOLVED

Closed: 8 years ago

Resolution: --- → FIXED

Anthony Miyaguchi [:amiyaguchi]

Assignee

Updated

•

8 years ago

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 5

•

8 years ago

Attached file Bug 1357875 - Use collect instead of writing directly to `file://` via spark #27 — Details

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 6

•

8 years ago

The `file://` protocol doesn't work on EMR, due to permission issues. The workaround is to collect the dataframe and write directly to disk. For some unknown reason, this causes tests to fail, despite functionally being the same thing. In practice, the above patch successfully collects the dataframe and uploads it.
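
A minimal sketch of the workaround, assuming the result is small enough to fit in driver memory. The helper name `write_collected` is hypothetical and is not the function used in the patch; it only shows the local write step that replaces the `file://` write.

```python
import csv


def write_collected(rows, header, path):
    """Write already-collected rows to a local CSV file on the driver.

    `rows` is any iterable of tuples or lists, one per output line.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

With PySpark, `rows` would be the result of `df.collect()` and `header` would be `df.columns`, so the write happens in the driver process instead of through Spark's file writer.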

Anthony Miyaguchi [:amiyaguchi]

Assignee

Updated

•

8 years ago

Status: REOPENED → RESOLVED

Closed: 8 years ago → 8 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

3 years ago

Component: Datasets: General → General

Categories

(Data Platform and Tools :: General, enhancement, P1)

Security

(public)
