Closed Bug 1198191 Opened 10 years ago Closed 10 years ago

[e10s] Confirm existing tab-switch telemetry measurements are meaningful

Categories

(Core :: General, defect, P1)
42 Branch

Tracking


RESOLVED FIXED
Tracking Status
tracking-e10s: +

People

(Reporter: jimm, Assigned: gkrizsanits)

References

(Blocks 1 open bug)

Details

+++ This bug was initially created as a clone of Bug #1198188 +++

Part of the e10s release criteria work.
tracking-e10s: --- → +
Assignee: nobody → gwright
Blocks: e10s-perf
Priority: -- → P1
Vladan, you had a line item here about this - https://docs.google.com/document/d/1TyE0BehzYhii3qfmcrfjXlRJL64CcJk0B4Voup4Q0Pg/edit# Do you have any more specifics?

We have a few tab switch performance probes and the talos tps test. We're currently blocking on regressions in the talos test. From t.m.o:

- FX_TAB_CLICK_MS
- FX_TAB_SWITCH_SPINNER_VISIBLE_MS
- FX_TAB_SWITCH_TOTAL_E10S_MS
- FX_TAB_SWITCH_TOTAL_MS
- FX_TAB_SWITCH_UPDATE_MS

The two we might be interested in are FX_TAB_SWITCH_TOTAL_*. I did some analysis of these back in June; from what I remember, FX_TAB_SWITCH_TOTAL_MS might not be a reliable probe anymore. Maybe we should look into that at least.
Flags: needinfo?(vladan.bugzilla)
> Do you have any more specifics?

It's basically what you described: I'd like to know which of the measurements you listed are meaningful and which can be used for an e10s vs. non-e10s comparison.
Flags: needinfo?(vladan.bugzilla)
And by meaningful, I mean "corresponds to the tab switch time seen by a user"
Assignee: gwright → gkrizsanits
(In reply to Jim Mathies [:jimm] from comment #1)
> The two we might be interested in are FX_TAB_SWITCH_TOTAL_*. I did some
> analysis of these back in June, from what I remember FX_TAB_SWITCH_TOTAL_MS
> might not be a reliable probe anymore. Maybe we should look into that at
> least.

Why is it not a reliable probe, exactly? Anyway, what worries me a lot here is that in non-e10s, when content hangs for whatever reason, tab switching is simply not possible, while in e10s mode it is possible but will show the spinner, so we report it as a long tab switch. In general it's not trivial to decide, when the spinner is shown, whether it's a tab-switching issue or not. If I read the code right, we also count page load time; do we do that for non-e10s too? If so, maybe we should look at these numbers separately. Finally, in e10s it's not yet clear to me what we measure if we use the tab switcher to load multiple tabs simultaneously. I think we should start with the simplest case, measure only that scenario, and filter out every other case.
One thing that is weird is that FX_TAB_SWITCH_TOTAL_MS does not seem to work at all any longer. In telemetry from Aurora 46 and above we don't collect any data from it. In tabbrowser.xml we call

window.QueryInterface(Ci.nsIInterfaceRequestor)
      .getInterface(Ci.nsIDOMWindowUtils)
      .beginTabSwitch();

but PostPresent on that LayerManager is never called afterwards, so we never report anything in non-e10s mode.
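The begin/end shape of this probe can be sketched as follows. This is an illustrative toy model, not the Gecko API (TabSwitchProbe and onPostPresent are made-up names): a sample is only accumulated when the "end" notification fires, so if PostPresent never runs, no data is ever reported, which matches the symptom above.

```javascript
// Toy model of a begin/end timing probe like FX_TAB_SWITCH_TOTAL_MS.
// The names here are illustrative, not the real Gecko interfaces.
class TabSwitchProbe {
  constructor() {
    this.start = null;   // timestamp of the in-progress switch, if any
    this.samples = [];   // accumulated switch durations (ms)
  }
  beginTabSwitch(now) {
    this.start = now;    // user clicked a tab; start timing
  }
  onPostPresent(now) {   // stand-in for the missing PostPresent callback
    if (this.start === null) return; // no switch in progress
    this.samples.push(now - this.start);
    this.start = null;
  }
}

const probe = new TabSwitchProbe();
probe.beginTabSwitch(1000);
// If onPostPresent never fires, samples stays empty forever:
console.log(probe.samples.length); // 0
probe.onPostPresent(1045);
console.log(probe.samples); // [ 45 ]
```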
Depends on: 1244684
(In reply to Vladan Djeric (:vladan) -- last day at Moz Friday Jan 29 from comment #3)
> And by meaningful, I mean "corresponds to the tab switch time seen by a user"

One thing that is not clear to me: when the browser hangs in non-e10s, or the content process hangs in e10s, there is a fundamental difference between the two. In non-e10s mode you cannot start a tab switch in this state, but in e10s mode you can, and it will be a long tab switch. How should we handle this difference? It might show up as a longer average tab switch time in e10s mode, yet I would argue it's still a better user experience. What do you think? How can we do a fair comparison here?
Flags: needinfo?(vladan.bugzilla)
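The sampling bias described in the comment above can be illustrated with a toy model (the numbers are invented for illustration): during a content hang, non-e10s records nothing because the switch cannot start, while e10s records one long sample, so the e10s median looks worse even though the user experience is arguably better.

```javascript
// Toy model of the hang-related sampling bias. All numbers are
// illustrative, not real telemetry data.
function median(xs) {
  const s = [...xs].sort((a, b) => a - b);
  return s[Math.floor(s.length / 2)];
}

const fastSwitches = [40, 45, 50, 55, 60]; // ms, switches with no hang
const hangSwitch = 2000;                   // ms, switch started during a hang

// non-e10s: the hung switch never starts, so it is never sampled.
const nonE10sSamples = fastSwitches;
// e10s: the switch starts, shows the spinner, and records a long sample.
const e10sSamples = [...fastSwitches, hangSwitch];

console.log(median(nonE10sSamples)); // 50
console.log(median(e10sSamples));    // 55  (looks worse, despite better UX)
```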
Jeff, I think it's not fair to compare the tab-switching time of the e10s and non-e10s versions because of the issue I described in comment 6, especially since we don't even have the data for the e10s version right now (see comment 5). I would focus on getting a reliable measurement up and running for the e10s case (or on making sure that the current one really measures what it should) so we can make sure we don't regress from here. How does that sound to you?
Flags: needinfo?(vladan.bugzilla) → needinfo?(jgriffiths)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #7)
> Jeff, I think it's not fair to compare the tab switching time of the e10s
> and the non-e10s version because the issue I've described in Comment 6.
> [...] How does that sound to you?

Focusing on something more reliable seems reasonable to me. I'm assuming the following:

1. this new measurement allows us to compare e10s and non-e10s right now
2. this new measurement will allow us to detect regressions going forward for e10s, and ideally for non-e10s for as long as we'd care to continue supporting that
Flags: needinfo?(jgriffiths)
(In reply to Jeff Griffiths (:canuckistani) (:⚡︎) from comment #8)
> 1. this new measurement allows us to compare e10s and non-e10s right now

Again, that will never be a fair comparison. If JS is busy or the process hangs for some other reason, in non-e10s mode you cannot start a tab switch; in e10s mode you can, and typically you will, and we will report a longer tab-switching time because of it, since we have to wait for the content process to send us the layers, and probably show the spinner too. The e10s UX will be better in this case, since at least the browser does not hang/jank, yet the tab-switch numbers will look worse for e10s. I would not rely on a comparison like that. If there is a noticeable issue with the e10s tab switch we have to work on it, but comparing times against non-e10s is not the way to go, imo.
Flags: needinfo?(jgriffiths)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #9)
> Again, that will never be a fair comparison. [...] If there is a noticeable
> issue with the e10s tab switch we have to work on that but comparing times
> against non-e10s is not the way to go imo.

Ok, thanks for the reiteration. I defer to your judgement.
Flags: needinfo?(jgriffiths)
I think the Win32 API can tell us when an event was actually dispatched to us. The MSG structure [1] has a time field, which is millisecond resolution. It might be a bit tricky to actually use this, but I suspect we could make it work. I think it's definitely worth it if it allows us to measure tab switch time meaningfully. [1] https://msdn.microsoft.com/en-us/library/windows/desktop/ms644958%28v=vs.85%29.aspx
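The idea behind using the MSG time field, stamping each queued event with the tick count at which the OS posted it and computing the dispatch delay on arrival, can be sketched platform-neutrally. This is an illustrative model only: postEvent and dispatchLatency are made-up names, and the time property mimics MSG.time.

```javascript
// Platform-neutral sketch of measuring event dispatch latency the way
// the Win32 MSG.time field would allow. Names are illustrative.
const queue = [];

function postEvent(kind, time) {
  // 'time' plays the role of MSG.time: the millisecond tick count
  // at which the OS posted the event to our queue.
  queue.push({ kind, time });
}

function dispatchLatency(event, now) {
  // How long the event sat in the queue before we got to it; this is
  // the part of tab-switch time a busy main loop adds before handling.
  return now - event.time;
}

postEvent("tabClick", 1000);
// ...main loop busy for 120 ms before dispatching...
const ev = queue.shift();
console.log(dispatchLatency(ev, 1120)); // 120
```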
Jim pointed out that we have _some_ data from the non-e10s builds. Probably not a sample we want to rely on too much, because as Mike figured out, these samples must come from users with off-main-thread compositing turned off (so it's not a representative sample). But here is the data we have right now:

Beta 45:
========
- non-e10s: sample 130.64k, median 82.5, 95th percentile 297.03
- e10s: sample 140.36M, median 51.85, 95th percentile 455.65

Aurora 46:
==========
- non-e10s: sample 55.43k, median 49.25, 95th percentile 173.69
- e10s: sample 112M, median 54.92, 95th percentile 383.91

Nightly 47:
===========
- non-e10s: sample 86.69k, median 124.17, 95th percentile 334.43
- e10s: sample 50.81M, median 53.08, 95th percentile 377.71

In general the median is lower with e10s, but looking at the e10s graph there are significantly more long switch times. Some of these longer times must come from the hangs I talked about earlier, but it would be worth comparing these e10s samples against the jank samples to see if my theory checks out or if there is some other factor too. Anyway, once bug 1244684 lands we will have more reliable samples for the non-e10s case.
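For reference, summaries like the medians and 95th percentiles above can be reproduced from raw samples with a simple nearest-rank computation. This is an illustrative sketch only; the real telemetry pipeline aggregates over histogram buckets, not raw values.

```javascript
// Nearest-rank percentile, as a toy stand-in for how summaries like
// "median" and "95th percentile" are derived from tab-switch samples.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

const switchTimesMs = [30, 42, 55, 61, 78, 90, 120, 310, 455, 500];
console.log(percentile(switchTimesMs, 50)); // 78
console.log(percentile(switchTimesMs, 95)); // 500
```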
Thanks for the update. I don't think it's useful at this stage to bother considering this data, given the obvious bias and your opinion that once bug 1244684 lands we'll have better data. As an aside: do we know if there is any reason for the seeming slide in performance for non-e10s in Nightly 47? The median is over 2x slower than Aurora 46. Can we even meaningfully compare those two numbers?
(In reply to Jeff Griffiths (:canuckistani) (:⚡︎) from comment #13)
> Thanks for the update. I don't like it's useful at this stage to bother
> considering this data given the obvious bias and your opinion that once bug
> 1244684 lands we'll have better data.

Yeah, I agree. I've checked the new data and it looks pretty good:

non-e10s
========
Sample count: 1.17M
Selected dates: 2016/02/24 to 2016/02/27
5th percentile: 27.57
25th percentile: 51.1
Median: 78.69
75th percentile: 128.46
95th percentile: 309.82

e10s
====
Sample count: 62.51M
Selected dates: 2016/01/25 to 2016/02/27
5th percentile: 14.44
25th percentile: 30.98
Median: 53.11
75th percentile: 98
95th percentile: 378.15

I'd say we can close this bug. Maybe we could open a follow-up bug to investigate the long tail in the e10s case: whether it's all related to content-process hangs or there is something else too. But that should have much lower priority, since even with the tail, e10s seems to improve tab switch times. An interesting fact is that we're investigating a quite heavy tps regression (bug 1186585), so these results are better than I hoped for. The tps test measures a very specific scenario, though, so that's not necessarily a contradiction.

> As an aside: do we know if there is any reason for the seeming slide in
> performance for non-e10s in Nightly 47? The median is over 2x slower than
> Aurora 46. Can we even meaningfully compare those two numbers?

That was quite surprising to me too. I'm leaning toward ignoring it, though, since it can easily be explained by the change in the population that gets off-main-thread compositing. I think we should just uplift bug 1244684 instead and examine those numbers; if we see the same shift, then we have a problem. The thing is, because this probe was broken and we didn't get enough data from it, it was filtered out entirely, so I would assume we could have taken patches that caused regressions for quite a long time.
Anyway, I vouch for the uplift in the other bug, and we'll see.
Filed bug 1252030 and bug 1252031 as follow-ups.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED