Bug 1392197
Opened 8 years ago
Closed 8 years ago
Perform analysis to determine ideal addon donor set (for new clients)
Categories
(Data Platform and Tools :: General, enhancement, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mlopatka, Assigned: mlopatka)
Addon donors for the similarity recommender should be chosen to give good coverage of the addon ecosystem at any given time.
The Airflow job that scrapes the current AMO database should complete before this.
Addon donors should be selected to guarantee coverage of 90-95% of the total installed addons in the population while keeping the number of donors below the threshold required for speedy performance.
Diversity of addons in the donor list should be a driving factor.
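To make the coverage target concrete, here is a minimal sketch of how coverage could be measured for a candidate donor set. It assumes "coverage" means the share of all addon installations whose addon appears in at least one donor profile; the data layout (client id mapped to a set of addon names) and the function name `install_coverage` are hypothetical and not taken from the actual job.

```python
from collections import Counter


def install_coverage(population, donor_ids):
    """Share of all addon installations whose addon appears in a donor profile.

    population: dict mapping client id -> set of installed addon names
    (hypothetical layout). donor_ids: iterable of client ids used as donors.
    """
    # Count how many clients have each addon installed.
    install_counts = Counter(a for addons in population.values() for a in addons)
    total = sum(install_counts.values())
    if total == 0:
        return 0.0

    # Union of all addons seen in the donor profiles.
    donor_addons = set()
    for cid in donor_ids:
        donor_addons |= population[cid]

    covered = sum(n for addon, n in install_counts.items() if addon in donor_addons)
    return covered / total


# Toy population, for illustration only.
population = {
    "c1": {"adblock", "darkmode"},
    "c2": {"adblock"},
    "c3": {"adblock", "vimkeys"},
    "c4": {"darkmode"},
}
one_donor = install_coverage(population, ["c1"])
two_donors = install_coverage(population, ["c1", "c3"])
```

A donor set would then be considered acceptable once this value reaches the 0.90-0.95 range while the number of donors stays under the performance threshold.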
Assignee: nobody → mlopatka
Additional thoughts on sampling the donor list.
Random (within-strata) stratified sampling based on addons installed would remain biased towards the top-N installed addons, assuming a relationship between telemetry variables and addon installations. Stratified sampling is most effective when 3 conditions are met:
1- high inter-strata variability (not necessarily guaranteed when strata are defined by addons installed)
2- low intra-strata variability (also not certain given our current problem space)
3- strata definitions (inclusion criteria) are highly correlated with the outcome measurement of interest when sampling (perhaps holds for this case, though we expect a weak correlation at best)
Since these conditions are not clearly met, I am exploring clustered sampling instead. First, potential donors will be clustered based on the vector-space representation of their installed addons. Then each cluster will be sampled (proportionately and randomly) until the desired number of candidate donors is selected for the similarity-based recommendations.
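The sampling step could look roughly like the sketch below. It assumes cluster labels have already been assigned to each potential donor. The function name `sample_donors`, the input layout, and the rule of taking at least one donor per cluster (to favour diversity) are illustrative assumptions, not the implemented method.

```python
import random
from collections import defaultdict


def sample_donors(cluster_of, n_donors, seed=0):
    """Randomly sample donors from each cluster in proportion to cluster size.

    cluster_of: dict mapping client id -> cluster label (hypothetical layout).
    Because quotas are rounded, the result may differ slightly from n_donors.
    """
    rng = random.Random(seed)

    # Group client ids by cluster label.
    members = defaultdict(list)
    for cid, label in cluster_of.items():
        members[label].append(cid)

    total = len(cluster_of)
    donors = []
    for label in sorted(members):
        ids = sorted(members[label])
        # Proportional allocation; the floor of one donor per cluster is an
        # assumption meant to keep small clusters represented.
        quota = max(1, round(n_donors * len(ids) / total))
        donors.extend(rng.sample(ids, min(quota, len(ids))))
    return donors


# Toy example: one large cluster and one small cluster.
cluster_of = {f"a{i}": 0 for i in range(80)}
cluster_of.update({f"b{i}": 1 for i in range(20)})
donors = sample_donors(cluster_of, 10)
```

With an 80/20 split and 10 requested donors, this draws 8 donors from the large cluster and 2 from the small one.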
Divisive hierarchical clustering captures the relationships that we are interested in here and has a nice implementation in Spark MLlib (spark.mllib). Good results were achieved, and the addon vector-space model lends itself nicely to this approach. Marking this as closed.
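For readers unfamiliar with divisive clustering, the sketch below shows the general idea in plain Python using bisecting k-means: start with one cluster and repeatedly split the cluster with the largest within-cluster squared error. This is a toy illustration, not the Spark implementation, and treating bisecting k-means as the divisive method used here is an assumption. The helper names and the deterministic initialisation are made up for the example.

```python
def _mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _sse(cluster):
    center = _mean(cluster)
    return sum(_sqdist(v, center) for v in cluster)


def _two_means(vectors, iters=20):
    """Split vectors into two groups with plain 2-means.

    Deterministic init (an assumption for reproducibility): the first
    vector and the vector farthest from it.
    """
    c0 = vectors[0]
    c1 = max(vectors, key=lambda v: _sqdist(v, c0))
    left, right = [], []
    for _ in range(iters):
        new_left = [v for v in vectors if _sqdist(v, c0) <= _sqdist(v, c1)]
        new_right = [v for v in vectors if _sqdist(v, c0) > _sqdist(v, c1)]
        # Stop on a degenerate split or when the assignment no longer changes.
        if not new_left or not new_right:
            break
        if new_left == left and new_right == right:
            break
        left, right = new_left, new_right
        c0, c1 = _mean(left), _mean(right)
    return left, right


def bisecting_kmeans(vectors, k):
    """Divisive clustering: keep splitting the cluster with the largest SSE."""
    clusters = [list(vectors)]
    while len(clusters) < k:
        # Only clusters with some internal spread can be split further.
        candidates = [c for c in clusters if len(c) > 1 and _sse(c) > 0]
        if not candidates:
            break
        target = max(candidates, key=_sse)
        left, right = _two_means(target)
        if not left or not right:
            break
        clusters.remove(target)
        clusters.extend([left, right])
    return clusters


# Toy 2-D points standing in for addon vectors (illustration only).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0), (21, 0)]
clusters = bisecting_kmeans(points, 3)
```

In the addon setting each vector would typically be a binary indicator over the addon vocabulary, and the resulting cluster labels would feed the per-cluster sampling step.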
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Comment 4•8 years ago
Moved to new component, per bug 1425844.
Component: General → Add-on Recommender
Updated•3 years ago
Component: Add-on Recommender → General