Linux webrender tsan opt xpcshell frequent retries that end up as exception
Categories
(Core :: Sanitizers, defect)
Tracking
()
Version | Tracking | Status |
---|---|---|
firefox-esr102 | --- | unaffected |
firefox107 | --- | wontfix |
firefox108 | --- | wontfix |
firefox109 | --- | fixed |
People
(Reporter: CosminS, Assigned: jstutte)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: intermittent-failure, regression)
Attachments
(1 file)
There are frequent retries of TSan xpcshell tasks that often end up as an exception.
These test groups run in the tasks where this occurs:
browser/components/tests/unit/xpcshell.ini
browser/extensions/formautofill/test/unit/heuristics/third_party/xpcshell.ini
chrome/test/unit/xpcshell.ini
devtools/client/application/test/xpcshell/xpcshell.ini
devtools/client/shared/remote-debugging/test/xpcshell/xpcshell.ini
devtools/client/webconsole/test/xpcshell/xpcshell.ini
dom/encoding/test/unit/xpcshell.ini
dom/indexedDB/test/unit/xpcshell-child-process.ini
dom/media/webvtt/test/xpcshell/xpcshell.ini
dom/promise/tests/unit/xpcshell.ini
intl/strres/tests/unit/xpcshell.ini
modules/libmar/tests/unit/xpcshell.ini
modules/libpref/test/unit_ipc/xpcshell.ini
netwerk/dns/tests/unit/xpcshell.ini
remote/shared/messagehandler/test/xpcshell/xpcshell.ini
services/sync/tests/unit/xpcshell.ini
storage/test/unit/xpcshell.ini
toolkit/components/autocomplete/tests/unit/xpcshell.ini
toolkit/components/backgroundtasks/tests/xpcshell/xpcshell.ini
toolkit/components/contentprefs/tests/unit_cps2/xpcshell.ini
toolkit/components/contextualidentity/tests/unit/xpcshell.ini
toolkit/components/crashes/tests/xpcshell/xpcshell.ini
toolkit/components/crashmonitor/test/unit/xpcshell.ini
toolkit/components/ctypes/tests/unit/xpcshell.ini
toolkit/components/messaging-system/targeting/test/unit/xpcshell.ini
toolkit/components/places/tests/queries/xpcshell.ini
toolkit/components/places/tests/sync/xpcshell.ini
toolkit/components/thumbnails/test/xpcshell.ini
toolkit/components/url-classifier/tests/unit/xpcshell.ini
toolkit/components/windowcreator/tests/unit/xpcshell.ini
toolkit/content/tests/unit/xpcshell.ini
toolkit/crashreporter/test/unit_ipc/xpcshell-phc.ini
toolkit/mozapps/extensions/test/xpcshell/rs-blocklist/xpcshell.ini
toolkit/mozapps/update/tests/unit_background_update/xpcshell.ini
widget/tests/unit/xpcshell.ini
decoder, is this something you could have a look over? Thank you.
Comment 5•3 years ago
When did this start? Since we still don't upload log artifacts for jobs that fail with an exception, this is difficult to debug. The most likely cause is that the machine now swaps or runs out of memory for some reason, which it did not do before. Did we change anything about machine configurations? Are the failing test groups new? What changed around the time when this started?
Comment 6•3 years ago
Started with this push on October 14th. The test groups are the same as for the X6 task on the previous push, which succeeded on the first attempt.
Comment 7•3 years ago
That push has a lot of AWS -> GCP commits in it. Did this job move as well? Did the machine configuration change in any way?
Comment 10•3 years ago
(In reply to Christian Holler (:decoder) from comment #7)
That push has a lot of AWS -> GCP commits in it. Did this job move as well? Did the machine configuration change in any way?
These jobs were not part of the changes. I will bisect on Try.
Comment 11•3 years ago
Backfills point to bug 1774462 as the first push affected by frequent automatic retries of Linux TSan xpcshell tasks. The affected task runs the tests in dom/indexedDB/test/unit/xpcshell-child-process.ini, among others.
Assignee
Comment 12•3 years ago
Hmm, it is not easy to look at anything here, as none of the tasks I looked at finished the log parsing, and trying to access the log through the task itself gives me a network error like this. In comment 0 I see dom/indexedDB/test/unit/xpcshell-child-process.ini, too, and that runs dom/indexedDB/test/test_keys.html, IIUC.
I wonder if we just end up with an OOM here, given that test_keys.html allocates a huge array which bites us for some reason only in this constellation in tsan. We could try to avoid running this test in tsan (and other memory sensitive constellations).
Comment 13•3 years ago
(In reply to Jens Stutte [:jstutte] from comment #12)
I wonder if we just end up with an OOM here given that test_keys.html allocates a huge array which bites us for some reason only in this constellation in tsan. We could try to avoid running this test in tsan (and other memory sensitive constellations).
I think this is exactly what happens and I agree, we should not run this test in configurations that require more memory (e.g. sanitizers).
Comment 14•3 years ago
Set release status flags based on info from the regressing bug 1774462
Assignee
Comment 15•3 years ago
(In reply to Christian Holler (:decoder) from comment #13)
I think this is exactly what happens and I agree, we should not run this test in configurations that require more memory (e.g. sanitizers).
I assume we do some different OOM handling in those builds? The test wants to account for a fallible allocation of that array, mostly to exclude 32-bit systems, but that seems not to help here.
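As an illustration of what "accounting for a fallible allocation" can look like in plain JavaScript, here is a minimal sketch. It is not the code from test_keys.js; the helper name `tryAllocate` is hypothetical, and it assumes the engine reports an impossible typed-array allocation as a catchable `RangeError`. As discussed in this bug, a guard like this may still not help if the process runs out of memory somewhere else, for example inside sanitizer internals.

```javascript
// Hypothetical helper (not from test_keys.js): attempt an allocation and
// return null instead of throwing when the engine refuses the request.
function tryAllocate(byteLength) {
  try {
    return new Uint8Array(byteLength);
  } catch (e) {
    if (e instanceof RangeError) {
      // The engine could not (or would not) allocate this many bytes.
      return null;
    }
    // Anything else is unexpected and should not be swallowed.
    throw e;
  }
}

// A small request succeeds; an absurdly large one is refused.
const smallBuffer = tryAllocate(16);
const hugeBuffer = tryAllocate(Number.MAX_SAFE_INTEGER);

if (hugeBuffer === null) {
  console.log("large allocation refused, skipping the large-key case");
}
```

A test built this way can skip the memory-hungry case when the allocation is refused up front, but it has no protection when the allocation succeeds and memory runs out later.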
Assignee
Comment 16•3 years ago
Hmm, could that specific test just check for AppConstants.TSAN || AppConstants.ASAN? We could then just skip the one key with the large allocation, somehow.
Comment 17•3 years ago
(In reply to Jens Stutte [:jstutte] from comment #15)
(In reply to Christian Holler (:decoder) from comment #13)
I think this is exactly what happens and I agree, we should not run this test in configurations that require more memory (e.g. sanitizers).
I assume we do some different OOM handling in those builds? As the test wants to account for a fallible allocation of that array, mostly to exclude 32Bit systems, but that seems not to help here.
Our sanitizers are configured to allow for fallible allocations (for TSan in particular here). But it is possible that this fails in edge cases. We have seen many times in fuzzing with ASan that, if we hit exactly the right spot to OOM, we might hit an infallible allocation in the ASan internals.
(In reply to Jens Stutte [:jstutte] from comment #16)
Hmm, could that specific test just check for AppConstants.TSAN || AppConstants.ASAN? We could make us just skip the one key with large allocation, somehow.
I don't know if this is available inside this particular test, but if it is, then I'd also prefer that option.
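The check proposed in comment 16 could be sketched roughly as follows. This is only an illustration, not the patch that was attached: the `appConstants` parameter stands in for Firefox's AppConstants object (only the `TSAN` and `ASAN` flags named above are read), and the helper names and the test-case list are hypothetical.

```javascript
// Hypothetical helper: true when the build flags indicate a sanitizer build.
// `appConstants` is a stand-in for AppConstants, passed in so the sketch
// runs outside Firefox.
function isSanitizerBuild(appConstants) {
  return Boolean(appConstants.TSAN || appConstants.ASAN);
}

// Hypothetical filter: drop the cases tagged as needing a huge allocation
// when running under a sanitizer, and keep everything otherwise.
function selectTestCases(cases, appConstants) {
  if (!isSanitizerBuild(appConstants)) {
    return cases;
  }
  return cases.filter(testCase => !testCase.largeAllocation);
}

// Made-up case list for illustration only.
const keyCases = [
  { name: "small key", largeAllocation: false },
  { name: "huge array key", largeAllocation: true },
];

const tsanCases = selectTestCases(keyCases, { TSAN: true, ASAN: false });
const regularCases = selectTestCases(keyCases, { TSAN: false, ASAN: false });
```

Passing the flags in as an argument keeps the decision logic independent of how the object is obtained, which matters when the same test file runs in several harnesses.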
Assignee
Comment 19•3 years ago
Comment 20•3 years ago
Comment 21•3 years ago
Backed out for causing failures at test_keys.html.
Backout link: https://hg.mozilla.org/integration/autoland/rev/42a01ff3077c18e976e57d973e20f91ab412ae3a
Failure log: https://treeherder.mozilla.org/logviewer?job_id=398331914&repo=autoland&lineNumber=2067
Assignee
Comment 22•3 years ago
Ah, it also runs as a mochitest, and there we do not have AppConstants?
Comment 23•3 years ago
I think the problem is that ChromeUtils is not available in mochitests.
Maybe you can get AppConstants using SpecialPowers in that case.
https://searchfox.org/mozilla-central/rev/2fc2ccf960c2f7c419262ac7215715c5235948db/dom/animation/test/document-timeline/test_document-timeline.html#30
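One way to picture the problem of a shared test file needing AppConstants in more than one harness is a small resolver that tries environment-specific loaders in turn. The sketch below is hypothetical throughout: `resolveAppConstants` and both loader functions are stand-ins that simulate the environments, and none of them are real ChromeUtils or SpecialPowers calls.

```javascript
// Hypothetical resolver: call each loader in order and return the first
// AppConstants-like object obtained, or null if none of them works.
function resolveAppConstants(loaders) {
  for (const load of loaders) {
    try {
      const constants = load();
      if (constants) {
        return constants;
      }
    } catch (e) {
      // This access path does not exist in the current environment;
      // fall through to the next loader.
    }
  }
  return null;
}

// Stand-ins that simulate a mochitest: the ChromeUtils path fails,
// the SpecialPowers path succeeds. Neither is a real Firefox API call.
const viaChromeUtils = () => {
  throw new ReferenceError("ChromeUtils is not defined");
};
const viaSpecialPowers = () => ({ TSAN: true, ASAN: false });

const resolvedConstants = resolveAppConstants([viaChromeUtils, viaSpecialPowers]);
const unresolvedConstants = resolveAppConstants([viaChromeUtils]);
```

Each additional harness would need its own loader, which is the kind of per-environment plumbing that makes a single skip in the manifest attractive by comparison.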
Assignee
Comment 24•3 years ago
OK, so the same test_keys.js is run in three different contexts: twice in the mochitest test_keys.html (once normally, once as a worker) and once standalone as an xpcshell test. Unfortunately, all three environments need different ways of accessing AppConstants. But as we have that triple coverage, and after talking with :janv and :asuth, we think we can just disable the entire test for xpcshell with ASAN/TSAN, unless we know that we should also expect similar OOM problems from mochitests. :decoder?
Comment 25•3 years ago
In general it would be favorable to somewhat separate OOM-like tests from regular tests, but I am not aware of other tests right now causing problems and if the quickest way to move forward is to disable this test, we should just do that :)
Comment 26•3 years ago
Reporter
Comment 27•3 years ago
bugherder
Comment 29•3 years ago
The patch landed in nightly and beta is affected.
:jstutte, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set status-firefox108 to wontfix.
For more information, please visit auto_nag documentation.
Assignee
Comment 30•3 years ago
Do we run TSAN tests frequently on beta/release versions?
Comment 31•3 years ago
I believe we run them as often as other tests, there is nothing special about sanitizers there afaik.
Comment 32•3 years ago
TSan XPCshell doesn't run on beta and release.