Bug 1400186
Opened 8 years ago
Closed 8 years ago
TAAR: Prototype the ensemble method
Categories
(Data Platform and Tools :: General, enhancement, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: fhartmann, Assigned: fhartmann, Mentored)
References
Details
Implement the chosen ensemble method [1] in a Jupyter notebook and evaluate how well it works. New datasets need to be created for validation and testing.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1400184
Comment 1•8 years ago (Assignee)
We have decided to start by implementing and fitting a linear stacking model, as described in the document about ensemble methods [1].
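As a rough illustration of what linear stacking means here: each base recommender assigns a score to every candidate add-on, and the ensemble ranks candidates by a weighted sum of those scores. The sketch below is a generic version of that idea, not the model from the linked document (which is access-limited). The names `stack_scores` and `recommend`, the dictionary input format, and the example weights are all assumptions made for illustration, and fitting the weights is not shown.

```python
from typing import Dict, List, Sequence

# Hypothetical input format: add-on id -> score from one base recommender.
Scores = Dict[str, float]


def stack_scores(base_scores: Sequence[Scores], weights: Sequence[float]) -> Scores:
    """Combine base recommender scores as a weighted sum per add-on."""
    if len(base_scores) != len(weights):
        raise ValueError("need exactly one weight per base recommender")
    combined: Scores = {}
    for scores, weight in zip(base_scores, weights):
        for addon, score in scores.items():
            # A recommender that does not score an add-on contributes 0 for it.
            combined[addon] = combined.get(addon, 0.0) + weight * score
    return combined


def recommend(base_scores: Sequence[Scores], weights: Sequence[float],
              limit: int = 10) -> List[str]:
    """Return the top `limit` add-on ids by combined score."""
    combined = stack_scores(base_scores, weights)
    # Sort by descending score; ties are broken by id so the output is deterministic.
    return sorted(combined, key=lambda addon: (-combined[addon], addon))[:limit]


# Usage with made-up scores from two base recommenders and made-up weights.
base = [{"a": 1.0, "b": 0.5}, {"b": 1.0, "c": 0.2}]
top = recommend(base, weights=[0.5, 2.0], limit=2)
```

In a fitted model the weights would be learned from training data rather than chosen by hand; the weighted sum itself stays the same.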
As an evaluation metric, we plan to use mean average precision (MAP). A good description of it can be found in the Stanford information retrieval book [2].
[1] https://docs.google.com/a/mozilla.com/document/d/1EAjrPn1FP_em7bLP24V9Bys-HHv5U-VauuLhzOXU4OE/edit?usp=sharing (access limited to Mozilla accounts)
[2] https://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-ranked-retrieval-results-1.html (Equations 43 and 44)
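For reference, MAP as described in [2] can be computed roughly as follows: for each query (here, presumably each user), average the precision at the rank of every relevant item, counting relevant items that were never retrieved as precision 0, then take the mean over all queries. This is a minimal sketch under those assumptions; the function names are hypothetical, and it assumes a ranked list contains no duplicate items.

```python
from typing import Hashable, Sequence, Set


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision of one ranked list against its set of relevant items."""
    if not relevant:
        return 0.0  # assumption: a query without relevant items scores 0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant item's rank
    # Dividing by the number of relevant items means that relevant items
    # missing from the ranking contribute a precision of 0.
    return precision_sum / len(relevant)


def mean_average_precision(rankings: Sequence[Sequence[Hashable]],
                           relevant_sets: Sequence[Set[Hashable]]) -> float:
    """Mean of the per-query average precision values."""
    pairs = list(zip(rankings, relevant_sets))
    if not pairs:
        return 0.0
    return sum(average_precision(r, s) for r, s in pairs) / len(pairs)


# Usage with made-up data: two ranked recommendation lists and the
# add-ons that turned out to be relevant for each.
score = mean_average_precision(
    [["a", "b", "c", "d"], ["x", "y"]],
    [{"a", "c"}, {"y"}],
)
```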
Comment 2•8 years ago
Hey Florian, are you currently working on this bug? Are you planning to work on it this quarter?
Flags: needinfo?(fhartmann)
Comment 3•8 years ago (Assignee)
Yep, I'm currently working on this. My internship ends Friday next week, so I plan to finish it this quarter. See [1] for more details on the general project.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1400187
Flags: needinfo?(fhartmann)
Updated•8 years ago
Points: --- → 3
Priority: -- → P1
Comment 4•8 years ago (Assignee)
I finished the prototype for the ensemble model. It outperforms the previous approach by about 2% on the MAP metric. MAP might not be the ideal metric to optimize for, but the result at least shows that the model generally works fairly well.
My internship ends here, but there are still some additional things that could be done to extend the prototype:
- More feature normalization / preprocessing
- Optimizing for a different metric
- Generally choosing a more stable optimization process
The code and notebooks for the prototype can be found on GitHub [1, 2].
The changes I made to the existing TAAR modules currently live in a branch of my TAAR fork [3].
[1] https://github.com/florian/taar-prototyping
[2] http://github.com/mozilla/addon_recommender_driving_analyses (access limited to a few contributors)
[3] https://github.com/florian/taar/tree/ensemble_changes
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Comment 5•8 years ago
Moved to new component, per bug 1425844.
Component: General → Add-on Recommender
Updated•3 years ago
Component: Add-on Recommender → General