Closed
Bug 1256413
Opened 10 years ago
Closed 10 years ago
Increase n_workers to 200 in analysis-service/server.py
Categories
(Webtools Graveyard :: Telemetry Server, defect)
Webtools Graveyard
Telemetry Server
Tracking
(firefox48 affected)
RESOLVED
WONTFIX
| | Tracking | Status |
|---|---|---|
| firefox48 | --- | affected |
People
(Reporter: jjensen, Unassigned)
References
Details
Hi Roberto,
Many of the projects we are working on require scanning the entire dataset. The limit of 20 nodes is too small to conduct this work in a timely fashion. Please increase this to 200.
Thanks
https://github.com/mozilla/telemetry-server/commit/d940e6eec8fc52d73aa6f3fea69628f4aabca887
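The requested change amounts to raising a worker-count cap. A minimal sketch of what such a cap looks like (hypothetical: the constant name `MAX_WORKERS` and the `validate_cluster_request` helper are illustrative assumptions, not the actual code in analysis-service/server.py):

```python
MAX_WORKERS = 20  # the limit this bug asks to raise to 200

def validate_cluster_request(requested_workers):
    """Reject analysis-cluster requests that exceed the configured cap."""
    if requested_workers > MAX_WORKERS:
        raise ValueError(
            "requested %d workers, but the cap is %d"
            % (requested_workers, MAX_WORKERS))
    return requested_workers
```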
Comment 2•10 years ago
I would like to understand what sort of analyses need to run on the entire dataset and if we can solve it in some other way (derived datasets?).
Flags: needinfo?(rvitillo) → needinfo?(jjensen)
Reporter
Comment 3•10 years ago
Hi Roberto,
Dave Zeber is currently working on something for Business Development that requires this type of analysis. It took him a significant amount of effort and hassle to complete a first run: spinning up multiple 20-node clusters, watching for 24-hour overruns, etc. He can fill you in in an offline discussion if you'd like.
I agree that these types of questions, and many others, should be answered by derived datasets, but a) they will never address all needs (ad hoc requests will happen), b) it has been months since the v4 launch and we still don't have many important derived datasets, and c) we need this data to move our organization forward ASAP.
John
Flags: needinfo?(jjensen)
Comment 4•10 years ago
Since search data is crucial to our revenue stream, it is important that we are able to work with the complete data in a convenient way (e.g., computing search volume over some grouping).
This should definitely be done with derived datasets, and we are working on getting some set up. However, as John mentions, there will always be cases where we need to work with the full raw data, and we need to have the ability/permissions to do it without too much hassle. If there's an issue around permissions, maybe we could set something up on a per-user or per-team basis.
Comment 5•10 years ago
I agree that we need to have a way to deal with emergencies. My concern is that by choosing to increase that limit we might choose to become "lazy" in a bad way, i.e. writing slow, inefficient jobs in Python that are extremely expensive ($4k a day per 200-node cluster) instead of trying to improve performance by other means. I would be OK with increasing that limit once we have some way to make teams and users accountable through monitoring.
It would be helpful to know what kind of non-ad-hoc jobs you were running on the unsampled raw data back in the FHR days, so we know which derived datasets are still missing.
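For scale, the cost figure cited above can be turned into a rough per-node estimate (a hypothetical helper, assuming cost scales linearly with node count and taking the $4k/day figure for a 200-node cluster at face value):

```python
def daily_cluster_cost(n_nodes, cost_per_200_nodes=4000.0):
    """Rough daily cost estimate, assuming a linear per-node rate
    derived from the ~$4,000/day figure for a 200-node cluster."""
    return cost_per_200_nodes * n_nodes / 200.0
```

Under that assumption, the 20-node clusters allowed by the current cap run about $400 a day each.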
Reporter
Comment 6•10 years ago
Hi Roberto,
A few points about this issue.
I had a discussion with Benjamin about this yesterday. He indicated that it was not possible to increase the limit because there was no ability to include any accounting, and that there was no plan to add it in the short term.
To that end, here's a pull request that adds the creator's @mozilla.com email ID to the name of any job created. AWS's reporting could easily be used to identify owners, and thus costs, of the resulting jobs.
https://github.com/mozilla/telemetry-server/pull/150 . It's (obviously) untested, but perhaps it could be of use when or if this is chosen as a priority.
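A minimal sketch of the idea behind that pull request (the function name and format are hypothetical; the actual change in PR #150 may differ):

```python
def tag_job_name(job_name, owner_email):
    """Append the owner's @mozilla.com ID to a job name so that AWS
    billing reports can attribute cluster costs to a person."""
    user_id = owner_email.split("@")[0]
    return "%s (%s)" % (job_name, user_id)
```

With owner IDs embedded in job names, a billing report grouped by job name doubles as a per-person cost report.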
I'm disappointed that we seem to be trapped in something of a vicious circle: more than four months after the launch of UT to release users, there are still no useful search-related datasets, and the individual with the ability to let us create these datasets, or to run replacement ad hoc analysis jobs, also refuses to make a one-line change to enable them.
While we found another way last week to build the required derived dataset using 150 nodes, it is clear that I need to find a separate approach to getting around this obstacle in the future. I'll do that outside this bug.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Updated•7 years ago
Product: Webtools → Webtools Graveyard