Closed
Bug 1290148
Opened 9 years ago
Closed 8 years ago
Opening a second notebook should not freeze Jupyter
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P3)
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: rvitillo, Unassigned)
References
Details
(Whiteboard: [SvcOps])
User Story
Opening more than one notebook is a recipe for disaster at the moment. We should either disable the functionality entirely or make sure that all notebooks can share the Spark cluster. Dynamic Resource Allocation [1] might solve this issue.
[1] https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
Attachments
(2 files)
No description provided.
Reporter
Updated•9 years ago
Whiteboard: [SvcOps]
Updated•9 years ago
Points: --- → 3
Priority: -- → P3
Reporter
Updated•9 years ago
User Story: (updated)
Comment 1•9 years ago
I have made progress by lowering spark.dynamicAllocation.executorIdleTimeout from its default of 60s to 5s. With that change I've been able to open three notebooks and run simple test code in each of them without a large delay between notebooks. I've also tested with the spark-shell. The Spark allocator now allocates an executor much earlier to notebooks & shells (applications/jobs in general?) that are waiting for executors after another job has finished; that wait is the delay that is observed. I'm not proposing this as the solution yet, as I stumbled upon the setting in the following video (https://www.youtube.com/watch?v=oqWDeC1zmQw), which points out that a lower executorIdleTimeout may prove detrimental to the map and reduce phases of Spark jobs.
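A minimal sketch of this configuration, assuming the notebook creates its own SparkSession (on the real cluster these properties may instead live in spark-defaults.conf or be passed via --conf; the app name is a placeholder):

from pyspark.sql import SparkSession

# Sketch only: enable dynamic allocation and lower the idle timeout as
# described above. The property names are standard Spark settings.
spark = (
    SparkSession.builder
    .appName("notebook-dynamic-allocation-test")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.shuffle.service.enabled", "true")   # needed for dynamic allocation on YARN
    .config("spark.dynamicAllocation.executorIdleTimeout", "5s")  # default is 60s
    .getOrCreate()
)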
Reporter
Comment 2•9 years ago
It would be great if you could run some benchmarks using real-world notebooks, like the ones in [1]. Please make sure to write any output to our test bucket though (s3://telemetry-test-bucket).
[1] https://github.com/mozilla-services/data-pipeline/tree/master/reports
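For example, a hypothetical way to write benchmark output under the test bucket from PySpark (the path and columns below are made up for illustration):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative benchmark output; write under the test bucket, not production.
timings = spark.createDataFrame(
    [("android-addons", 1, 123.4), ("crash-stats", 1, 98.7)],
    ["notebook", "run", "seconds"],
)
timings.write.mode("overwrite").parquet(
    "s3://telemetry-test-bucket/benchmarks/notebook-timings"
)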
Comment 3•9 years ago
I attempted to run a notebook on android-addons, but had major problems with respect to tasks being skipped (skipping nearly all of the job). Unfortunately, I didn't save any of the logs from YARN. I only saved webpages & pdfs from the spark monitoring page. I'll save the logs next time and document any findings I see with other settings.
Reporter
Comment 4•9 years ago
(In reply to cameres from comment #3)
> I attempted to run a notebook on android-addons, but had major problems with
> respect to tasks being skipped (skipping nearly all of the job).
> Unfortunately, I didn't save any of the logs from YARN. I only saved
> webpages & pdfs from the spark monitoring page. I'll save the logs next time
> and document any findings I see with other settings.
I am not sure I understand precisely what you mean by major problems, as skipping is not necessarily a bad thing. It merely means there were stage dependencies that would otherwise have been recomputed but were skipped because their output was already available.
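A toy PySpark example of how skipped stages arise (not from this bug): when two actions share the same shuffle, the second reuses the shuffle output of the first, and the upstream stage is reported as skipped in the UI.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Two actions over the same shuffled RDD: the second reuses the shuffle
# output written by the first, so the UI reports the map stage as "skipped".
pairs = sc.parallelize(range(1000)).map(lambda x: (x % 10, 1))
counts = pairs.reduceByKey(lambda a, b: a + b)
counts.count()    # computes the map + shuffle stages
counts.collect()  # reuses existing shuffle files; earlier stage shows as skipped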
I've had some time to investigate this issue further. My confusion about the skipped jobs comes from the jupyter-spark extension returning statistics from the most recently running application no matter how many applications are running (from what I understand), so I must have been looking at the statistics for another notebook. I've attached a screenshot of the issue I see. I was able to run both android-addons.ipynb and a notebook analyzing crash statistics at the same time, without any lag between notebooks, using the following settings (the jupyter-spark issue still exists; see the configuration sketch after this list):
- removing spark.executor.instances
  - I believe this setting may conflict with dynamic allocation, as requesting 16 executors for one application could delay other jobs when multiple applications are running (http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/)
  - with dynamic allocation, Spark scales the number of executors up to the maximum quite rapidly when no other application is running (loading-executors.png)
- spark.dynamicAllocation.executorIdleTimeout 5s
  - Spark will deallocate an application's executors after 5 seconds of idle time
  - since YARN on EMR runs its own external shuffle service, lowering this from the default of 60s should not be an issue
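A sketch of these final settings as they might be applied per notebook session (on EMR they would more likely go in spark-defaults.conf; the maxExecutors cap below is illustrative, not taken from this bug):

from pyspark.sql import SparkSession

# Configuration sketched from the list above. spark.executor.instances is
# deliberately left unset so that dynamic allocation decides the executor
# count; the maxExecutors value is an illustrative cap only.
conf = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",              # YARN/EMR external shuffle service
    "spark.dynamicAllocation.executorIdleTimeout": "5s",
    "spark.dynamicAllocation.maxExecutors": "16",
}

builder = SparkSession.builder.appName("shared-cluster-notebook")
for key, value in conf.items():
    builder = builder.config(key, value)
spark = builder.getOrCreate()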
screenshot of dynamically allocating 16 executors to a lone spark job
Reporter
Comment 8•9 years ago
Jannis, can the jupyter-spark extension be easily adapted to support dynamic resource allocation?
Flags: needinfo?(jezdez)
Reporter
Comment 9•9 years ago
Mark, as the project's sponsor, maybe you are in a better position to answer that question.
Flags: needinfo?(jezdez) → needinfo?(mreid)
Comment 10•9 years ago
I'm not sure how easy it will be - IIRC the code assumes that only one notebook / kernel is running at a time. I would guess it will involve a fair bit of refactoring.
Flags: needinfo?(mreid)
Comment 11•8 years ago
Closing abandoned bugs in this product per https://bugzilla.mozilla.org/show_bug.cgi?id=1337972
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
Updated•7 years ago
Product: Cloud Services → Cloud Services Graveyard