Closed
Bug 1290148
Opened 9 years ago
Closed 8 years ago
Opening a second notebook should not freeze Jupyter
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P3)
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: rvitillo, Unassigned)
References
Details
(Whiteboard: [SvcOps])
User Story
Opening more than one notebook is a recipe for disaster at the moment. We should either disable the functionality entirely or make sure that all notebooks can share the Spark cluster. Dynamic Resource Allocation [1] might solve this issue.
[1] https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
Attachments
(2 files)
No description provided.
Reporter
Updated•9 years ago
Whiteboard: [SvcOps]
Updated•9 years ago
Points: --- → 3
Priority: -- → P3
Reporter
Updated•9 years ago
User Story: (updated)
Comment 1•9 years ago
I have made progress by lowering spark.dynamicAllocation.executorIdleTimeout from its default of 60s to 5s. With that change I've been able to open three notebooks and run simple test code in each of them without a large delay between notebooks. I've also tested with the spark-shell. The Spark allocator now allocates an executor much earlier to notebooks & shells (applications/jobs in general?) that are waiting for executors after another job has finished; that wait is the delay that is observed. I'm not proposing this as the solution yet, as I stumbled upon the setting in the following video (https://www.youtube.com/watch?v=oqWDeC1zmQw), which points out that a lower executorIdleTimeout may prove detrimental to the map and reduce phases of Spark jobs.
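A minimal sketch of this configuration, assuming the notebook creates its own SparkSession (on the real cluster these properties may instead live in spark-defaults.conf or be passed via --conf; the app name is a placeholder):

from pyspark.sql import SparkSession

# Sketch only: enable dynamic allocation and lower the idle timeout as
# described above. The property names are standard Spark settings.
spark = (
    SparkSession.builder
    .appName("notebook-dynamic-allocation-test")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.shuffle.service.enabled", "true")   # needed for dynamic allocation on YARN
    .config("spark.dynamicAllocation.executorIdleTimeout", "5s")  # default is 60s
    .getOrCreate()
)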
Reporter
Comment 2•9 years ago
It would be great if you could run some benchmarks using real-world notebooks, like the ones in [1]. Please make sure to write any output to our test bucket though (s3://telemetry-test-bucket).
[1] https://github.com/mozilla-services/data-pipeline/tree/master/reports
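For example, a hypothetical way to write benchmark output under the test bucket from PySpark (the path and columns below are made up for illustration):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative benchmark output; write under the test bucket, not production.
timings = spark.createDataFrame(
    [("android-addons", 1, 123.4), ("crash-stats", 1, 98.7)],
    ["notebook", "run", "seconds"],
)
timings.write.mode("overwrite").parquet(
    "s3://telemetry-test-bucket/benchmarks/notebook-timings"
)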
Comment 3•9 years ago
I attempted to run a notebook on android-addons, but had major problems with respect to tasks being skipped (skipping nearly all of the job). Unfortunately, I didn't save any of the logs from YARN. I only saved webpages & pdfs from the spark monitoring page. I'll save the logs next time and document any findings I see with other settings.
Reporter
Comment 4•9 years ago
(In reply to cameres from comment #3)
> I attempted to run a notebook on android-addons, but had major problems with
> respect to tasks being skipped (skipping nearly all of the job).
> Unfortunately, I didn't save any of the logs from YARN. I only saved
> webpages & pdfs from the spark monitoring page. I'll save the logs next time
> and document any findings I see with other settings.
I am not sure I understand precisely what you mean by major problems, as skipping is not necessarily a bad thing. It merely means there were stage dependencies that would otherwise have been recomputed but were skipped because their output was already available.
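A toy PySpark example of how skipped stages arise (not from this bug): when two actions share the same shuffle, the second reuses the shuffle output of the first, and the upstream stage is reported as skipped in the UI.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Two actions over the same shuffled RDD: the second reuses the shuffle
# output written by the first, so the UI reports the map stage as "skipped".
pairs = sc.parallelize(range(1000)).map(lambda x: (x % 10, 1))
counts = pairs.reduceByKey(lambda a, b: a + b)
counts.count()    # computes the map + shuffle stages
counts.collect()  # reuses existing shuffle files; earlier stage shows as skipped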
I've had some time to investigate this issue further. My confusion about the skipped jobs comes from the jupyter-spark extension returning statistics from the most recently running application no matter how many applications are running (from what I understand), so I must have been looking at the statistics for another notebook. I've attached a screenshot of the issue I see. I was able to run both android-addons.ipynb and a notebook analyzing crash statistics at the same time, without any lag between notebooks, using the following settings (the jupyter-spark issue still exists; see the configuration sketch after this list):
- removing spark.executor.instances
  - I believe this setting may conflict with dynamic allocation, as requesting 16 executors for one application could delay other jobs when multiple applications are running (http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/)
  - with dynamic allocation, Spark scales the number of executors up to the maximum quite rapidly when no other application is running (loading-executors.png)
- spark.dynamicAllocation.executorIdleTimeout 5s
  - Spark will deallocate an application's executors after 5 seconds of idle time
  - since YARN on EMR runs its own external shuffle service, lowering this from the default of 60s should not be an issue
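A sketch of these final settings as they might be applied per notebook session (on EMR they would more likely go in spark-defaults.conf; the maxExecutors cap below is illustrative, not taken from this bug):

from pyspark.sql import SparkSession

# Configuration sketched from the list above. spark.executor.instances is
# deliberately left unset so that dynamic allocation decides the executor
# count; the maxExecutors value is an illustrative cap only.
conf = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",              # YARN/EMR external shuffle service
    "spark.dynamicAllocation.executorIdleTimeout": "5s",
    "spark.dynamicAllocation.maxExecutors": "16",
}

builder = SparkSession.builder.appName("shared-cluster-notebook")
for key, value in conf.items():
    builder = builder.config(key, value)
spark = builder.getOrCreate()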
screenshot of dynamically allocating 16 executors to a lone spark job
Reporter
Comment 8•9 years ago
Jannis, can the jupyter-spark extension be easily adapted to support dynamic resource allocation?
Flags: needinfo?(jezdez)
Reporter
Comment 9•9 years ago
Mark, as the project's sponsor, maybe you are in a better position to answer that question.
Flags: needinfo?(jezdez) → needinfo?(mreid)
Comment 10•9 years ago
I'm not sure how easy it will be - IIRC the code assumes that only one notebook / kernel is running at a time. I would guess it will involve a fair bit of refactoring.
Flags: needinfo?(mreid)
Comment 11•8 years ago
Closing abandoned bugs in this product per https://bugzilla.mozilla.org/show_bug.cgi?id=1337972
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
Updated•7 years ago
Product: Cloud Services → Cloud Services Graveyard