Closed
Bug 1256413
Opened 10 years ago
Closed 10 years ago
Increase n_workers to 200 in analysis-service/server.py
Categories
(Webtools Graveyard :: Telemetry Server, defect)
Webtools Graveyard
Telemetry Server
Tracking
(firefox48 affected)
RESOLVED
WONTFIX
| | Tracking | Status |
|---|---|---|
| firefox48 | --- | affected |
People
(Reporter: jjensen, Unassigned)
References
Details
Hi Roberto,
Many of the projects we are working on require scanning the entire dataset. The limit of 20 nodes is too small to conduct this work in a timely fashion. Please increase this to 200.
Thanks
https://github.com/mozilla/telemetry-server/commit/d940e6eec8fc52d73aa6f3fea69628f4aabca887
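The requested change amounts to raising a worker-count cap. A minimal sketch of what such a cap looks like (hypothetical: the constant name `MAX_WORKERS` and the `validate_cluster_request` helper are illustrative assumptions, not the actual code in analysis-service/server.py):

```python
MAX_WORKERS = 20  # the limit this bug asks to raise to 200

def validate_cluster_request(requested_workers):
    """Reject analysis-cluster requests that exceed the configured cap."""
    if requested_workers > MAX_WORKERS:
        raise ValueError(
            "requested %d workers, but the cap is %d"
            % (requested_workers, MAX_WORKERS))
    return requested_workers
```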
Comment 2•10 years ago
I would like to understand what sort of analyses need to run on the entire dataset and if we can solve it in some other way (derived datasets?).
Flags: needinfo?(rvitillo) → needinfo?(jjensen)
Reporter
Comment 3•10 years ago
Hi Roberto,
Dave Zeber is currently working on something for Business Development that requires this type of analysis. It took him a significant amount of effort and hassle to complete a first run: spinning up multiple 20-node clusters, watching for 24-hour overruns, etc. He can fill you in in an offline discussion if you'd like.
I agree that these types of questions, and many others, should be answered by derived datasets, but a) they will never address all needs (ad hoc requests will happen), b) it has been months since the v4 launch and we still don't have many important derived datasets, and c) we need this data to move our organization forward ASAP.
John
Flags: needinfo?(jjensen)
Comment 4•10 years ago
Since search data is crucial to our revenue stream, it is important that we are able to work with the complete data in a convenient way (e.g., computing search volume over some grouping).
This should definitely be done with derived datasets, and we are working on getting some set up. However, as John mentions, there will always be cases where we need to work with the full raw data, and we need to have the ability/permissions to do it without too much hassle. If there's an issue around permissions, maybe we could set something up on a per-user or per-team basis.
Comment 5•10 years ago
I agree that we need to have a way to deal with emergencies. My concern is that by choosing to increase that limit we might choose to become "lazy" in a bad way, i.e. writing slow, inefficient jobs in Python that are extremely expensive ($4k a day per 200-node cluster) instead of trying to improve performance by other means. I would be OK with increasing that limit once we have some way to make teams and users accountable through monitoring.
It would be helpful to know what kind of non-ad-hoc jobs you were running on the unsampled raw data back in the FHR days, so we know which derived datasets are still missing.
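For scale, the cost figure cited above can be turned into a rough per-node estimate (a hypothetical helper, assuming cost scales linearly with node count and taking the $4k/day figure for a 200-node cluster at face value):

```python
def daily_cluster_cost(n_nodes, cost_per_200_nodes=4000.0):
    """Rough daily cost estimate, assuming a linear per-node rate
    derived from the ~$4,000/day figure for a 200-node cluster."""
    return cost_per_200_nodes * n_nodes / 200.0
```

Under that assumption, the 20-node clusters allowed by the current cap run about $400 a day each.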
Reporter
Comment 6•10 years ago
Hi Roberto,
A few points about this issue.
I had a discussion with Benjamin about this yesterday. He indicated that it was not possible to increase the limit because there was no ability to include any accounting, and that there was no plan to add it in the short term.
To that end, here's a pull request that adds the creator's @mozilla.com email ID to the name of any job created. AWS's reporting could easily be used to identify owners, and thus costs, of the resulting jobs.
https://github.com/mozilla/telemetry-server/pull/150 . It's (obviously) untested, but perhaps it could be of use when or if this is chosen as a priority.
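A minimal sketch of the idea behind that pull request (the function name and format are hypothetical; the actual change in PR #150 may differ):

```python
def tag_job_name(job_name, owner_email):
    """Append the owner's @mozilla.com ID to a job name so that AWS
    billing reports can attribute cluster costs to a person."""
    user_id = owner_email.split("@")[0]
    return "%s (%s)" % (job_name, user_id)
```

With owner IDs embedded in job names, a billing report grouped by job name doubles as a per-person cost report.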
I'm disappointed that we seem to be trapped in something of a vicious circle: more than four months after the launch of UT to release users, there are still no useful search-related datasets, and the individual with the ability to let us create these datasets, or to run replacement ad hoc analysis jobs, also refuses to make a one-line change to enable them.
While we found another way last week to build the required derived dataset using 150 nodes, it is clear that I need to find a separate approach to getting around this obstacle in the future. I'll do that outside this bug.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Updated•7 years ago
Product: Webtools → Webtools Graveyard