Closed Bug 1386629 Opened 8 years ago Closed 7 years ago

Perform analysis of recommendation strategies for legacy addon replacement

Categories

(Data Platform and Tools :: General, enhancement, P2)


Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: mlopatka, Assigned: mlopatka)

References

Details

No description provided.
Assignee: nobody → mlopatka
Using features from the AMO database exported to a local JSON blob. Similarity is computed between vectors containing: ['guid', 'legacy', 'ratings', 'installs', 'languages', 'summary', 'tags', 'title', 'categories']. New addons are recommended based on per-variable similarity scoring:
- modified hamming distance for: 'languages', 'tags', 'categories'
- cosine similarity in TF/IDF space for: 'summary', 'title'
Similarities are combined by weighted mean.

The current dump (August 3, 2017) shows counts of 16247 legacy addons and 3520 web extensions.

A prototype is available here: https://gist.github.com/mlopatka/ac2f98b33229ec126f2c8930ffb9f126#file-gen_legacy_addon_substitution_suggestions-py
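For reference, a minimal sketch (not the actual prototype from the gist above) of combining per-feature similarities by weighted mean. The set-based measure below, a Hamming distance over membership-indicator vectors normalized by the union size, is one plausible reading of the "modified hamming distance" mentioned above; the exact formulation in the prototype may differ, and fitting TF/IDF per pair (rather than once over the whole corpus) is only for brevity.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

SET_FEATURES = ["languages", "tags", "categories"]
TEXT_FEATURES = ["summary", "title"]

def set_similarity(a, b):
    # 1 - Hamming distance over membership indicators, normalized by the
    # union size (equivalently, Jaccard similarity of the two sets).
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a ^ b) / len(union)

def text_similarity(doc_a, doc_b):
    # Cosine similarity between two documents in a shared TF/IDF space.
    # (A real pipeline would fit the vectorizer once over the full corpus.)
    tfidf = TfidfVectorizer(stop_words="english").fit_transform([doc_a, doc_b])
    return cosine_similarity(tfidf[0], tfidf[1])[0, 0]

def combined_similarity(legacy, candidate, weights):
    # Weighted mean of per-feature similarities between two addon records.
    sims = {f: set_similarity(legacy[f], candidate[f]) for f in SET_FEATURES}
    sims.update({f: text_similarity(legacy[f], candidate[f]) for f in TEXT_FEATURES})
    total = sum(weights[f] for f in sims)
    return sum(sims[f] * weights[f] for f in sims) / total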
Current recommendations are unsatisfactory. The weight vector for combining similarity scores must be optimized against human feedback. I plan to implement a (less greedy) hill-climber via the simulated annealing algorithm: https://en.wikipedia.org/wiki/Simulated_annealing

With aggressive reinforcement I think I can perform semi-supervised training of the weight vector.
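A minimal sketch of the planned simulated-annealing loop. Here score_recommendations is a hypothetical stand-in for the human-feedback objective (e.g. a mean quality rating of the top suggestions produced under a given weight vector); the actual objective, perturbation scheme, and cooling schedule are still to be decided.

import math
import random

def anneal_weights(features, score_recommendations,
                   steps=200, t0=1.0, cooling=0.97, step_size=0.1):
    weights = {f: 1.0 for f in features}
    best = dict(weights)
    current_score = best_score = score_recommendations(weights)
    temperature = t0
    for _ in range(steps):
        # Perturb one randomly chosen weight (keeping it non-negative).
        candidate = dict(weights)
        f = random.choice(features)
        candidate[f] = max(0.0, candidate[f] + random.uniform(-step_size, step_size))
        candidate_score = score_recommendations(candidate)
        # Always accept improvements; accept regressions with a probability
        # that shrinks as the temperature cools (the "less greedy" part).
        delta = candidate_score - current_score
        if delta > 0 or random.random() < math.exp(delta / temperature):
            weights, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = dict(weights), current_score
        temperature *= cooling
    return best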
Language-based features are now refined using better stripping and cleaning of the reference vocabulary for TF/IDF. Recommendations are still not satisfactory. I've introduced a dummy variable to weight the features when computing scalar similarity for ranking; this will be varied by the simulated annealing algorithm in the future.

repo: https://github.com/mozilla/addon_recommender_driving_analyses/tree/master/legacy_swap
@Dexter is it possible (very difficult?) to get the 'category' field from the AMO database included in this feature space? The TF/IDF model suffers when comparing summaries/descriptions of very different lengths. So I can either do more aggressive preprocessing on the text (probably pretty expensive) or introduce an additional text feature with a more bounded vocabulary (i.e. categories).
Flags: needinfo?(aplacitelli)
(In reply to mlopatka from comment #4)
> @Dexter is it possible (very difficult?) to get the 'category' field from
> the AMO database included in this feature space?
> The TF/IDF model suffers when comparing summaries/descriptions of very
> different lengths. So I can either do more aggressive preprocessing on the
> text (probably pretty expensive) or introduce an additional text feature
> with a more bounded vocabulary (i.e. categories).

Yes, it is possible to get the category field from AMO. I sent you an email with the full dump.
Flags: needinfo?(aplacitelli)
(In reply to mlopatka from comment #4)
> @Dexter is it possible (very difficult?) to get the 'category' field from
> the AMO database included in this feature space?
> The TF/IDF model suffers when comparing summaries/descriptions of very
> different lengths. So I can either do more aggressive preprocessing on the
> text (probably pretty expensive) or introduce an additional text feature
> with a more bounded vocabulary (i.e. categories).

After discussing this with Martin, it turns out he really meant 'permission', not 'category' (which was already provided). The problem with the addon permissions is that they are only available for webextension addons, which makes this hardly useful for recommendations. We synced up over IRC for this :)
Simulated annealing approach implemented. I'll begin collecting some labels for the parameter weights. https://github.com/mozilla/addon_recommender_driving_analyses/tree/master/legacy_swap
Simulated annealing run to get new parameter weights using a *very* limited number of cycles. New feature weights are hard-coded on line 27 of addon_recommender_driving_analyses.py. In addition to some tweaks to the language-processing similarity over free-text features, recommendations seem to have improved a bit.

code audit request: Dexter
https://github.com/mozilla/addon_recommender_driving_analyses/tree/master/legacy_swap

Perhaps (time permitting) it would be worthwhile to get some people to run a few cycles of the simulated annealing and aggregate the weight vector data, compared against manual recommendation scoring? (One possible aggregation is sketched below.)
Flags: needinfo?(alessio.placitelli)
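If several people each run a few annealing cycles, one simple (hypothetical) aggregation scheme is to average the per-feature weights across runs and renormalize so they sum to 1; whether the repo adopts this or something more robust is an open question.

def aggregate_weight_vectors(runs):
    # runs: list of {feature: weight} dicts from independent annealing runs.
    features = runs[0].keys()
    mean = {f: sum(r[f] for r in runs) / len(runs) for f in features}
    total = sum(mean.values()) or 1.0
    return {f: w / total for f, w in mean.items()}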
We had a conversation with Martin while going through the design of the simulated annealing model: we identified a few areas of improvement (he'll comment/get to that later) and documented some "use cases" for the legacy recommender.

Use cases:
- a legacy addon was disabled, and the same addon is already available using the webextension technology;
- a legacy addon was disabled, the same addon is not available using webextension but a similar addon (by category, tags, description) is available using webextensions (exactly same functionalities);
- a legacy addon was disabled, but no match is available; other webextension addons with comparable (but not the same) features are available and can be recommended;
- a very rare legacy addon was disabled; there's no comparable, similar or related addon implemented using webextensions.

We should keep these use cases in mind when reasoning about the recommender and evaluating it (a possible labeling for these tiers is sketched below).
Flags: needinfo?(alessio.placitelli)
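A sketch of how the four use cases above could be turned into labels when hand-scoring recommendations during evaluation; the tier names are illustrative, not from the repo.

from enum import Enum

class MatchTier(Enum):
    SAME_ADDON = 1        # same addon already available as a webextension
    EQUIVALENT_ADDON = 2  # different addon, exactly the same functionality
    COMPARABLE_ADDON = 3  # related addon with comparable (not identical) features
    NO_MATCH = 4          # rare legacy addon with no reasonable substitute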
Martin, is this actively being worked on? Do you think you will work on this in Q4?
Flags: needinfo?(mlopatka)
Yes, we currently use a curated list provided by the AMO folks, but that is also being used to train a more automated method. Legacy-based recommendations are only going to be useful in Q4 and drop off in Q1/Q2 2018. So this is still on the Q4 agenda.
Flags: needinfo?(mlopatka)
As requested, here are some thoughts on performance which could be interesting to investigate if we're looking to move this forward to a wider audience in due course.

Currently the page (in disco pane) and the recommendations are keyed by the clientId. This means that from a caching perspective we can only cache per-user; for the disco pane, if the cache of the page is only fresh for 1 hour, every repeat visit after that generally has a stale cache, which means the entire chain of API calls is made all the way to TAAR and back.

It would be interesting to know, across the study, whether there are clusters of users that get the same list of recommendations. If we found that the clusters were large enough in size, we could look at alternative ways to key our requests so that we can cache the content for more users; this may apply to both AMO's requests and the TAAR engine itself. Being able to do more caching could potentially make a big difference to the overall performance of the service and help with handling the load.

Other factors to note: for the disco pane the other variables in the URL are locale (browser UI locale, not accept-language), firefox version, OS platform (e.g. Darwin) and compatibility mode (I'm not quite sure what the last one represents; it's often set to "normal"). E.g.:

https://discovery.addons.mozilla.org/%LOCALE%/firefox/discovery/pane/%VERSION%/%OS%/%COMPATIBILITY_MODE%

From AMO's perspective we would need to know what size a cluster of recommendations would be, taking those parameters into account, to be able to improve on the whole-page caching we're currently doing. Of course AMO is only one part of this; increased caching at any level will likely improve the overall performance.
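As a rough illustration of the cluster-size question, the sketch below counts how many clients share an identical recommendation list; recommendations_by_client is a hypothetical mapping from clientId to its ordered list of recommended GUIDs.

from collections import Counter

def recommendation_cluster_sizes(recommendations_by_client):
    # Count how many clients receive each exact (ordered) recommendation list.
    clusters = Counter(tuple(recs) for recs in recommendations_by_client.values())
    return clusters.most_common()

# If the top clusters cover most clients, keying the cache by something like
# (locale, version, OS, list-hash) instead of clientId could serve far more
# requests from cache, at both the AMO and TAAR layers.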
Priority: -- → P2
Changed to new component, per bug 1425844.
Component: General → Add-on Recommender

no more legacy addons.

Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
Component: Add-on Recommender → General