Closed
Bug 848949
Opened 12 years ago
Closed 12 years ago
ThreadLink::SendClose isn't sync
Categories
(Core :: IPC, defect)
Tracking
()
RESOLVED
FIXED
mozilla22
People
(Reporter: BenWa, Assigned: BenWa)
References
Details
Attachments
(2 files, 3 obsolete files)
No description provided.
Attachment #722454 -
Flags: review?(nmalkin)
Assignee | ||
Updated•12 years ago
|
Attachment #722454 -
Attachment is patch: true
![]() |
||
Comment 1•12 years ago
|
||
Comment on attachment 722454 [details] [diff] [review]
patch
Hi there,
I think I was asked to review this patch by accident.
Cheers,
-nmalkin
Comment 2•12 years ago
|
||
Why is this a good thing? With a process link, SendClose isn't a synchronous call. This doesn't feel right.
Assignee | ||
Updated•12 years ago
|
Attachment #722454 -
Flags: review?(nmalkin) → review?(nmatsakis)
Assignee | ||
Comment 3•12 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #2)
> Why is this a good thing? With a process link, SendClose isn't a synchronous
> call. This doesn't feel right.
This is to solve a general problem with having IPDL actors over a ThreadLink. This is based on the assumption that both actors have to be allocated and released on the same thread which is what CompositorParent/Child does. If that assumption doesn't hold then we need another fix.
Here's the sequence for PCompositor with patch from bug 845982 applied:
First we call 'PCompositorChild::Close()', which calls 'AsyncChannel::SynchronouslyClose()' and that calls 'AsyncChannel::ThreadLink::SendClose()'. This function marks the channel as closed and then when returning to 'AsyncChannel::SynchronouslyClose()' we wait on 'ChannelClosed == mChannelState' which we already set in SendClose().
'AsyncChannel::SynchronouslyClose()' doesn't really implement a sync close. But rather the close message is enqueued by SendClose via 'mTargetChan->OnChannelErrorFromLink();' and is still pending.
The main thread will then continue on to delete PCompositorParent which may still have 'AsyncChannel::OnNotifyMaybeChannelError' pending. Under DEBUG we crash because PCompositorParent has been scribbled by jemalloc.
![]() |
||
Comment 4•12 years ago
|
||
I am also not sure if this is the right approach or not. I am somewhat worried about deadlocks, where both ends try to close and wind up waiting for each other. I have to spend a bit of time with the code in gdb to better understand precisely what is going on here, but what I think should be happening is that each side closes independently (as now) and releases hold on some common shared resource such that it is ultimately freed by which runs second. But as I said I need to spend more time reading into the code to refresh my memory of how this was intended to work.
Comment 5•12 years ago
|
||
Some details that I learned on IRC:
* "parent" thread is the compositor thread in this scheme
* "child" thread is the main thread
> Here's the sequence for PCompositor with patch from bug 845982 applied:
> First we call 'PCompositorChild::Close()', which calls
> 'AsyncChannel::SynchronouslyClose()' and that calls
> 'AsyncChannel::ThreadLink::SendClose()'.
On the child AsyncChannel.
> This function marks the channel as
> closed and then when returning to 'AsyncChannel::SynchronouslyClose()' we
> wait on 'ChannelClosed == mChannelState' which we already set in SendClose().
>
> 'AsyncChannel::SynchronouslyClose()' doesn't really implement a sync close.
As documented, it waits for "the I/O thread" to acknowledge the close (in order to synchronize state and make sure that pending messages are discarded). The guarantee it's trying to provide is that no further messages will be delivered to the channel on this side of the conversation. It is not making any guarantees about messages delivered to the *other* side of the conversation.
> The main thread will then continue on to delete PCompositorParent which may
> still have 'AsyncChannel::OnNotifyMaybeChannelError' pending. Under DEBUG we
> crash because PCompositorParent has been scribbled by jemalloc.
Yeah, so you were right on IRC. The actors (CompositorParent objects) should really only be allocated and deleted by the thread that owns them (the compositor thread in this case). For plugins, the channel which receives the close is the child, and it ends up calling PPluginModuleChild::OnChannelClose which does:
DestroySubtree(NormalShutdown);
DeallocSubtree();
DeallocShmems();
But the actual deletion of PluginModuleChild happens externally (in XRE_InitChildProcess).
I think what you want to do here is to add a CompositorParent::ActorDestroy method and post a deletion task, similar to how CrossProcessCompositorParent::ActorDestroy works currently.
Assignee | ||
Comment 6•12 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #5)
> Yeah, so you were right on IRC. The actors (CompositorParent objects) should
> really only be allocated and deleted by the thread that owns them (the
> compositor thread in this case).
Alright in that case I'm not sure what the proper way is to create the CompositorParent in the compositor thread. Currently we create both actors and call Open():
http://mxr.mozilla.org/mozilla-central/source/widget/xpwidgets/nsBaseWidget.cpp#860
Do I get the compositor thread's MessageLoop, post a task to create the CompositorParent and then call the open. Meanwhile I'd have to block the main thread on waiting for this to happen presumably because we need the channel to be open to continue. Is there a better way to bootstrap this?
Assignee | ||
Updated•12 years ago
|
Attachment #722454 -
Attachment is obsolete: true
Attachment #722454 -
Flags: review?(nmatsakis)
Comment 7•12 years ago
|
||
Hrm, I guess the way we doing the creation is ok... we're basically creating them and then transferring "ownership" to the target thread.
Assignee | ||
Comment 8•12 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #7)
> Hrm, I guess the way we doing the creation is ok... we're basically creating
> them and then transferring "ownership" to the target thread.
I tried that the IPDL code gen asserts that the actor is allocated/destroyed from the same thread.
Assignee | ||
Comment 9•12 years ago
|
||
Here's the trace for mismatched CxxStackFrame ctor/dtors. Trying to PostTask instead now.
Assignee | ||
Comment 10•12 years ago
|
||
Assignee | ||
Comment 11•12 years ago
|
||
Alright with this patch I get a racy crash. Here's what I think is happening:
In 'AsyncChannel::Close()':
SendSpecialMessage(new GoodbyeMessage()); sends an async message to the other side to close the channel. When the remote side processes the message it will release the other AsyncChannel which will scribble it's memory. The main thread continues on to 'SynchronouslyClose();' which tries to call 'mTargetChan->OnChannelErrorFromLink();' but mTargetChan may have gotten released as a result of GoodbyeMessage.
Assignee | ||
Comment 12•12 years ago
|
||
I think the ownership of ThreadLink.mTargetChan is to blame here. ThreadLink does not own the other channel which can be released at any point.
![]() |
||
Comment 13•12 years ago
|
||
I just had a brief discussion with Chris Jones on this issue. Certainly the goal should be to emulate the process-level semantics as closely as possible. In the process case, when one side closes, it closes its file descriptor, which shows up later as an empty read or error on the other side, thus triggering the other side to shut down. This was the original intention of the threaded code but clearly the logic isn't quite right.
The ownership of the async channels is a bit of a trick case. You can't just use reference counting since both sides reference one another, so there's a clear cycle. It seems like when the first side closes it needs to reach over to the other and NULL out its target channel pointer. This means of course that we have to watch out for the mTargetChan pointer being NULL, which would be the equivalent of a "half-closed pipe" in the process case.
Does this sound reasonable?
Comment 14•12 years ago
|
||
Yes. I suspect that means we'll need synchronization on the ThreadLink? Or is there a way for the ThreadLink to reuse the monitor of its channel?
Assignee | ||
Comment 15•12 years ago
|
||
Here's a quick summary of the classes:
* Both sides have an AsynChannel
* Both sides have an ThreadLink which know about BOTH AsyncChannel (aChan/aTargetChan)
* Both sides SHARE a single REFCOUNTED mMonitor which we assert is held when sending messages but not when destroying.
The problem right now is that AsyncChannel gets released when the top level actor is destroyed. After which the opposite ThreadLink is holding a reference to released memory as its aTargetChan. The monitor is NOT held when this happens.
Here's how I think we need to fix this:
* Destroying the channel needs to hold the monitor to prevent the state of the channel changing unexpectedly from the point of view of the opposite ThreadLink.
* The ThreadLink needs to hold a reference to the channel to prevent its pointer from ever pointing to release memory. This does create a cycle but this cycle is broken when the actors are destroyed during a normal sequence. CTOR counts can be used to prevent this leak.
If we agree I'll write up this patch. I'd REALLY like to have this solve when OMTC refactor merges on central.
![]() |
||
Comment 16•12 years ago
|
||
That sounds about right to me.
Assignee | ||
Comment 17•12 years ago
|
||
The refcounting solution doesn't work because CompositorParent is refcounted and likely many IPDL subclasses also are. Refcounting + multiple inference is not something I want to explore. Jeff also convinced me that reference counting was a crutch when you don't have clear ownership.
I modified the code to use the monitor to null out mTargetChan when it's released so we no longer have a pointer to release memory. This is stritly better then what we do now which is use a freed pointer.
Assignee: nobody → bgirard
Attachment #722883 -
Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #725130 -
Flags: review?(nmatsakis)
![]() |
||
Comment 18•12 years ago
|
||
Comment on attachment 725130 [details] [diff] [review]
Use monitor to remove dead pointer
This looks about right to me. Can you add a comment into the destructor explaining what's going on? Something like:
// Bug 725130: We need to prevent the other side
// from sending us any more messages. The setup
// here is as shown:
//
// (Us) (Them)
// AsyncChannel AsyncChannel
// | ^ \ / ^ |
// | | X | |
// v | / \ | v
// ThreadLink ThreadLink
//
// We want to null out the diagonal link from their ThreadLink
// to our AsyncChannel. Note that we must hold the monitor so
// that we do this atomically with respect to them trying to send
// us a message.
I had to draw the ASCII art to make sure I actually understood what was going on here. =)
Attachment #725130 -
Flags: review?(nmatsakis) → review+
![]() |
||
Comment 19•12 years ago
|
||
Of course, this is bug 848949, not 725130...
Assignee | ||
Comment 20•12 years ago
|
||
Assignee | ||
Comment 21•12 years ago
|
||
Push pending. Try is closed.
Attachment #725130 -
Attachment is obsolete: true
Assignee | ||
Comment 22•12 years ago
|
||
Comment 23•12 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla22
You need to log in
before you can comment on or make changes to this bug.
Description
•