Closed Bug 1141568 Opened 10 years ago Closed 5 years ago

TSan: data race js/src/jsgc.cpp:1794 prepareForIncrementalGC

Categories

(Core :: JavaScript: GC, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: froydnj, Unassigned)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [tsan])

Attachments

(2 files, 1 obsolete file)

Attached file prepare-for-incremental-gc-race.txt (obsolete) —
The attached logfile shows a thread/data race detected by TSan (ThreadSanitizer). * Specific information about this bug I see this race a fair amount (handful of times per small mochitest run), but I've never been able to get satisfactory stacks for the reading side of the race. (I assume the read comes from JIT code?) Posting this here in case somebody who's more acquainted with the GC understands what's going on...Terrence, any ideas? * General information about TSan, data races, etc. Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even in this case though, we should try to come up with a fix unless this would cause unacceptable performance issues. Also note that seemingly benign races can possibly be harmful (also depending on the compiler and the architecture) [1][2]. If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist. [1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong [2] _How to miscompile programs with "benign" data races_: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf
Flags: needinfo?(terrence)
Sorry it's taken me so long to look at this new round of races! Since we only have one stack here, it's really hard to tell what's going on. Nathan, do you have any other reports that end in prepareForIncrementalGC that have the other stack present? What can we infer from the single stack? The fact that the updated field has size 2 means this is probably not a pointer, so it is probably the bit-field write of allocatedDuringIncremental. This bit field includes allocKind and a ton of other useful things that we read all over the place. The background sweep thread is indeed running here; however, the main thread is only looking at arenas from the emptyLists, which the background sweep thread should not be looking at -- it has its own list. I guess we don't halt parse threads for GC, so maybe some path there can call thing->allocKind()? I think that's about as much as we can tell without the second stack.
Flags: needinfo?(terrence) → needinfo?(nfroyd)
(In reply to Terrence Cole [:terrence] from comment #1) > Sorry it's taken me so long to look at this new round of races! Since we > only have one stack here, it's really hard to tell what's going on. Nathan, > do you have any other reports that end in prepareForIncrementalGC that have > the other stack present? No problem, thanks for looking at these! I haven't been able to get a two-sided stack report for this race yet. I will be looking for one with great eagerness. :) The report says the read is coming from the "Analysis Helper" thread; I assume that's the "parse thread" you're talking about...or is that some JIT thread running?
Flags: needinfo?(nfroyd) → needinfo?(terrence)
"Analysis Helper" is one of the JS thread pool threads. We run all of our background work via this threadpool, so it could be absolutely anything.
Flags: needinfo?(terrence)
Finally got two-sided stacks for this. Terrence, any insight based on this?
Attachment #8575321 - Attachment is obsolete: true
Flags: needinfo?(terrence)
Uhg. This one is going to be hard to fix, but at least I don't think there is going to be any observable side-effects from any arbitrary ordering. So what's going on here: On the background thread, the compiler backend is looking at its template objects in order to generate code to handle objects like those kinds of objects. Some of that information (e.g. the allocKind) is stored in arena headers for sharing. Simultaneously, we are doing a GC that is traversing the arena headers. They are reading and writing on different fields; however, these fields are stored in a bit field. The compiler thread is doing reads of allocKind and the gc threads is doing reads and writes of allocatedDuringIncremental. So, allocKind is fixed for the duration of the GC and the compilation, because we cannot sweep the arena, because the objects are held by the compiler. Therefore, *unless* the gc thread temporarily writes invalid allocKinds into that word, I think this is "safe". This code is crazily inlined, so I wasn't able to trivially pinpoint the relevant writes -- I'll do a build with MOZ_NEVER_INLINE later and see if I can at least get some confidence that I'm not lying at least for clang and gcc.
Flags: needinfo?(terrence)
I learned today that I wasn't using llvm-symbolizer to print stacks, which does a much better job of dealing with inlined code. I can see if I can reproduce the race with llvm-symbolizer active. There's no such thing as a "safe" race. :p
And thank you for the quick response!
(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #6) > I learned today that I wasn't using llvm-symbolizer to print stacks, which > does a much better job of dealing with inlined code. I can see if I can > reproduce the race with llvm-symbolizer active. > > There's no such thing as a "safe" race. :p Hey, I used appropriate scare quotes around "safe". ;-) Specifically, by "safe", I mean that I do not see any way that a RAR, RAW, WAR, or WAW reordering could result in dynamically unhandled behavior, and spent a good 15 minutes thinking about the possibilities. Of course, the caveat here is that this is only an analysis at the C++ level -- whether that holds for any particular combination of CPU, compiler, and optimization level is anyone's guess. I, for one, would guess: not safe.
(In reply to Terrence Cole [:terrence] from comment #8) > Specifically, by "safe", I mean that I do not see any way that a RAR, RAW, > WAR, or WAW reordering could result in dynamically unhandled behavior, and > spent a good 15 minutes thinking about the possibilities. Of course, the > caveat here is that this is only an analysis at the C++ level -- whether > that holds for any particular combination of CPU, compiler, and optimization > level is anyone's guess. I, for one, would guess: not safe. Yes. Proving that it's harmless at the source level is not enough because the standard (C++11) specifically allows implementations to reorder delivery of stores between cores. That mean both compilers and hardware are free to do such reordering. Also, since C++11 declares racey code to constitute undefined behaviour, if a compiler can ever prove the code racey, it can transform it however it likes.
Attached file SM(tsan) race
I'm skimming through the current races reported while running SM(tsan), and I think the attached report matches -- or at least, is very similar to -- the one discussed here. Jan, I don't know how space-sensitive MIR nodes are. Would it be possible to store a template object's allocKind alongside the object pointer? Or at least, I'm assuming we could record that information on the main thread when creating the node; perhaps that is incorrect. Alternatively, we could bloat up Arenas by splitting out the allocKind field into its own word. It'd be a shame to waste the space, although on 64 bits I'm not sure it would change anything. (We have 4056 bytes of data in a 4096 byte Arena, which is enough to fit 253 16-byte Cells with 8 bytes of slop. And a js::ShapedObject is 16 bytes. Though even a NativeObject is 32 bytes.)
Flags: needinfo?(jdemooij)
(In reply to Steve Fink [:sfink] [:s:] from comment #10) > Jan, I don't know how space-sensitive MIR nodes are. Would it be possible to > store a template object's allocKind alongside the object pointer? Or at > least, I'm assuming we could record that information on the main thread when > creating the node; perhaps that is incorrect. That should be fine. (We keep running into these issues with template objects where the background thread reads some immutable field that happens to share a word with something else :/ Eventually I'd like to encapsulate template objects better, or store all info we need in a separate data structure, to avoid this.)
Flags: needinfo?(jdemooij)
Blocks: 1367103

(In reply to Terrence Cole [:terrence] from comment #5)

The gc threads is doing reads and writes of allocatedDuringIncremental

This field has been removed, so closing this bug.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: