1318842 - Automatically populate the "Further Reading:" section on the ATMO cluster startup page with more analytical resources

Reporter: mlopatka

Description

9 years ago

Currently a large number of notebooks exist for working with telemetry data, but they are not kept in any kind of central share. As a result, simple one-off analyses require either a lot of redundant code for recurring tasks such as data selection, filtering, cleaning, and sorting, or a lot of asking around to find out whether someone else has already performed a similar analysis.

The page displayed after launching a new ad-hoc Spark cluster links to rvitillo's awesome notebook with the following: "For a guide of how to use this shiny new cluster, check out :rvitillo's blog post on the topic."

This spot would make an ideal place to link additional existing, well-documented notebooks that aim to accomplish specific tasks. The list could be automatically populated from a GitHub repo designated for this purpose. This would allow users to share notebooks intended for specific analytical tasks and reduce the overhead of getting an actionable subset of the data suitable for a given type of analysis.
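As a rough illustration of the automatically populated list suggested above, here is a minimal Python sketch that queries the GitHub repository contents API (GET /repos/{owner}/{repo}/contents/{path}) and keeps only .ipynb files as (name, link) pairs. The bug does not name a repository, so the owner, repo, and path arguments and the helper names fetch_listing and further_reading_links are hypothetical; authentication, rate limits, and large directories are ignored.

```python
import json
import urllib.request

# GitHub REST endpoint for listing a directory in a repository.
CONTENTS_URL = "https://api.github.com/repos/{owner}/{repo}/contents/{path}"


def fetch_listing(owner, repo, path=""):
    """Return the JSON directory listing for `path` in a (hypothetical) repo."""
    url = CONTENTS_URL.format(owner=owner, repo=repo, path=path)
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def further_reading_links(listing):
    """Keep only notebook files and return (name, html_url) pairs sorted by name."""
    return sorted(
        (item["name"], item["html_url"])
        for item in listing
        if item.get("type") == "file" and item["name"].endswith(".ipynb")
    )
```

For example, further_reading_links(fetch_listing("example-org", "analysis-notebooks")) would yield the links to render on the cluster startup page; keeping the filtering separate from the network call makes it easy to test without hitting GitHub.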

Georg Fritzsche [:gfritzsche]

Comment 1

9 years ago

We could also consider adding more default notebooks to the cluster that cover some common tasks. Pulling these from a visibly linked GitHub repo would make it easy for users to submit quick PRs with fixes.

Roberto Agostino Vitillo (:rvitillo)

Updated

9 years ago

Blocks: 1248688

Jannis Leidel [:jezdez]

Comment 2

9 years ago

Sounds like a great idea, and I agree this should live in a place where we can easily submit and review changes to the notebooks; they are code after all and should go through the same kind of code review process as any other code.

I've done some digging, and it seems the current notebook format does not yet support storing metadata such as author and title, but it will once nbformat 4.2 is out, which is about to be released any day now:

https://github.com/jupyter/nbformat/compare/4.1.0...1ab21fac9dc4dd5d8eb1aacbc4359481da001b38#diff-b4de71ae1a0a055b38eab43f1ac9876c

There is some discussion (and references to older discussion about this) here:

https://github.com/jupyter/nbformat/issues/45

I suggest we use the following structure to store the notebooks *inside the atmo* repo (to keep the content data and the content display in one place and ease code review):

notebooks/
├── fxa-dau-2016-v1.ipynb
└── ...

In the notebooks we can add the metadata, e.g.:

..
"metadata": {
    "title": "Test Notebook",
    "authors": [{"name": "Jean Tester"}]
},
..

If the metadata is missing, we could just leave it out and display the name of the notebook file.
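The fallback rule proposed above (use the notebook's metadata if present, otherwise show the file name) could be sketched in Python as follows. It reads the .ipynb files as plain JSON from a notebooks/ directory laid out as proposed and assumes the "title" and "authors" keys shown in the example; the function names are hypothetical, and a real implementation might use the nbformat library instead.

```python
import json
from pathlib import Path


def notebook_entry(path):
    """Build a display entry for one notebook, falling back to its file name."""
    path = Path(path)
    with path.open(encoding="utf-8") as fp:
        nb = json.load(fp)
    meta = nb.get("metadata", {})
    # Use the metadata title if present, otherwise the notebook's file name.
    title = meta.get("title") or path.name
    # Collect author names, skipping entries without a "name" key.
    authors = [a["name"] for a in meta.get("authors", []) if "name" in a]
    return {"title": title, "authors": authors, "file": path.name}


def list_notebooks(directory):
    """Return entries for every .ipynb file in `directory`, sorted by file name."""
    return [notebook_entry(p) for p in sorted(Path(directory).glob("*.ipynb"))]
```

For example, list_notebooks("notebooks") would return one dict per notebook, ready to be rendered into the "Further Reading:" section.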

Roberto Agostino Vitillo (:rvitillo)

Comment 3

9 years ago

As part of the 2017 plan we wanted to have a look at tools like [1], which support more than just Jupyter notebooks.

[1] https://github.com/airbnb/knowledge-repo

Jannis Leidel [:jezdez]

Comment 4

9 years ago

(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #3)
> As part of the 2017 plan we wanted to have a look at tools like [1], which
> don't support just Jupyter notebooks.
>
> [1] https://github.com/airbnb/knowledge-repo

Okay, so this should *not* be in ATMO?

Roberto Agostino Vitillo (:rvitillo)

Comment 5

9 years ago

We haven't evaluated those tools yet so I would prefer to wait for that to happen first before we implement this functionality in ATMO.

Jannis Leidel [:jezdez]

Comment 6

9 years ago

(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #5)
> We haven't evaluated those tools yet so I would prefer to wait for that to
> happen first before we implement this functionality in ATMO.

Understood.

Thomas Huelbert

Updated

9 years ago

Points: --- → 2

Priority: -- → P2

Roberto Agostino Vitillo (:rvitillo)

Comment 7

9 years ago

See https://mail.mozilla.org/pipermail/fhr-dev/2016-November/001088.html.

Jannis Leidel [:jezdez]

Comment 8

9 years ago

Moved to https://github.com/mozilla/telemetry-analysis-service/issues/210

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → INVALID

BMO Automation

Updated

7 years ago

Product: Cloud Services → Cloud Services Graveyard

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P2)

Tracking

(Not tracked)

People

(Reporter: mlopatka, Unassigned)

Security

(public)
