Bug 1318842
Opened 9 years ago
Closed 9 years ago
Automatically populate the "Further Reading:" section on the ATMO cluster startup page with more analytical resources
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P2)
Tracking
(Not tracked)
RESOLVED
INVALID
People
(Reporter: mlopatka, Unassigned)
Currently, a large number of notebooks exist for working with telemetry data, but they are not collected in any central, shared location. As a result, simple one-off analyses require a lot of redundant code for recurring tasks such as data selection, filtering, cleaning, and sorting, or a lot of asking around to find out whether someone else has already performed a similar analysis.
The page displayed after launching a new ad-hoc Spark cluster links us to rvitillo's awesome notebook with the following:
"For a guide of how to use this shiny new cluster, check out :rvitillo's blog post on the topic."
This spot would be an ideal place to link additional existing, well-documented notebooks that aim to accomplish specific tasks. The list could be automatically populated from a GitHub repo designated for this purpose.
This would allow users to share notebooks intended for specific analytical tasks and decrease the overhead of getting an actionable subset of the data suitable for a specific type of analysis.
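One way the automatically populated list could be built is sketched below. This is only an illustration, not ATMO code: the function name `further_reading_links`, the idea of reading from a local checkout of the designated repo, and the `base_url` parameter are all assumptions made for the example.

```python
from pathlib import Path


def further_reading_links(checkout_dir, base_url):
    """Return (label, url) pairs for every .ipynb file in a local checkout.

    Assumptions (hypothetical, not from ATMO): the designated repo has
    already been cloned to checkout_dir, and base_url is the address
    under which its files can be viewed.
    """
    root = Path(checkout_dir)
    links = []
    for path in sorted(root.rglob("*.ipynb")):
        # Path relative to the checkout root, with forward slashes for the URL.
        relative = path.relative_to(root).as_posix()
        # Use the file name without its extension as the link label.
        links.append((path.stem, f"{base_url.rstrip('/')}/{relative}"))
    return links
```

The startup page could then render each pair as a link under "Further Reading:".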
Comment 1•9 years ago
We could also consider adding more default notebooks to the cluster that cover some common tasks.
Pulling these from a visibly linked GitHub repo would make it easy for users to submit quick PRs with fixes.
Comment 2•9 years ago
Sounds like a great idea, and I agree this should live in a place where we can easily submit and review changes to the notebooks. They are code, after all, and should go through the same kind of code review process as any other code.
I've done some digging, and it seems the current notebook format does not yet support storing metadata such as author and title, but it will once nbformat 4.2 is out, which should be any day now: https://github.com/jupyter/nbformat/compare/4.1.0...1ab21fac9dc4dd5d8eb1aacbc4359481da001b38#diff-b4de71ae1a0a055b38eab43f1ac9876c
There is some discussion (and references to older discussion about this) here: https://github.com/jupyter/nbformat/issues/45
I suggest we store the notebooks *inside the atmo repo*, using the following structure, so that the content and the code that displays it live together and are easier to review:
notebooks/
├── fxa-dau-2016-v1.ipynb
└── ...
In the notebooks we can add the metadata, e.g.:
..
"metadata": {
    "title": "Test Notebook",
    "authors": [{"name": "Jean Tester"}]
},
..
If the metadata is missing, we could just leave it out and display the name of the notebook file.
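The fallback described above could look roughly like this. It is a minimal sketch, not ATMO code: the helper name `describe_notebook` is hypothetical, and it assumes the `title` and `authors` keys sit in the notebook's top-level `metadata` object, as in the example.

```python
import json
from pathlib import Path


def describe_notebook(path):
    """Return (title, author_names) for a notebook file.

    Hypothetical helper: falls back to the file name when the notebook
    has no "title" in its top-level metadata, and to an empty author
    list when no "authors" are given.
    """
    path = Path(path)
    with path.open(encoding="utf-8") as f:
        notebook = json.load(f)
    metadata = notebook.get("metadata", {})
    # A missing or empty title falls back to the notebook's file name.
    title = metadata.get("title") or path.name
    authors = [a["name"] for a in metadata.get("authors", []) if "name" in a]
    return title, authors
```

With the structure proposed above, a notebook without metadata would simply be listed as `fxa-dau-2016-v1.ipynb`.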
Comment 3•9 years ago
As part of the 2017 plan we wanted to have a look at tools like [1], which support more than just Jupyter notebooks.
[1] https://github.com/airbnb/knowledge-repo
Comment 4•9 years ago
(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #3)
> As part of the 2017 plan we wanted to have a look at tools like [1], which
> don't support just Jupyter notebooks.
>
> [1] https://github.com/airbnb/knowledge-repo
Okay, so this should *not* be in ATMO?
Comment 5•9 years ago
We haven't evaluated those tools yet so I would prefer to wait for that to happen first before we implement this functionality in ATMO.
Comment 6•9 years ago
(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #5)
> We haven't evaluated those tools yet so I would prefer to wait for that to
> happen first before we implement this functionality in ATMO.
Understood
Updated•9 years ago
Points: --- → 2
Priority: -- → P2
Comment 7•9 years ago
Comment 8•9 years ago
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INVALID
Updated•7 years ago
Product: Cloud Services → Cloud Services Graveyard