[Ace-users] [ace-bugs] trying to fix an ACE hang
Greg Popovitch
gpy at altair.com
Thu Nov 29 20:35:17 CST 2007
Yes, you are right Doug... here is some history:
Sun May 5 19:14:34 2002 Douglas C. Schmidt <schmidt at macarena.cs.wustl.edu>

  * ace/Message_Queue_T.cpp: Modified all the enqueue*() methods so that
    their calls to notify() occur *outside* of the monitor lock.
    This change prevents deadlock from occurring when a reactor's
    notification pipe is full. Thanks to Sasha Agranov
    <sagranov at COMGATES.co.il> for reporting this.

Thu Aug 15 10:43:51 2002 Steve Huston <shuston at riverace.com>

  * ace/Message_Queue_T.cpp (enqueue_tail): Moved notify() call outside
    lock scope, as Sun May 5 19:14:34 2002 Douglas C. Schmidt suggested.

Sat Mar 22 11:58:12 2003 Douglas C. Schmidt <schmidt at tango.doc.wustl.edu>

  * ace/Message_Queue_T.cpp: Moved the notify() hook calls within
    the protection of the guard lock critical section to prevent
    race conditions on cleanup. Thanks to Ron Muck <rlm at sdiusa.com>
    for this suggestion.

I am pretty sure there is a deadlock occurring with the current code.
Basically, when a message is enqueued for another thread on a message
queue, the other thread is notified through the reactor. In my case, the
thread processing these messages runs at a lower priority, so the
reactor's notification message queue fills up. Once it is full, the
thread enqueuing messages hangs in notify() while holding the lock of my
message queue. The low-priority thread is in handle_input(), but can't
dequeue a message from its message queue because the queue is locked by
the sender thread. As a result it stops processing notifications, hence
the deadlock.

So my thread 1 locks resource A (my message queue) and then blocks on
resource B (the reactor's notification queue) while A is still locked.
The only way resource B will be released is when thread 2 processes
handle_input(), but to do that it needs to get a message from resource A
(my message queue), hence the deadlock.
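
To make the lock ordering concrete, here is a minimal sketch in plain standard C++ (not ACE code). BoundedNotifier, NotifyingQueue and run_demo are hypothetical names standing in for the reactor's notification queue (resource B), my message queue (resource A) and a small test driver. It shows the ordering from the May 2002 ChangeLog entry: the guard is released before the call that may block.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the reactor's bounded notification queue (B).
class BoundedNotifier {
public:
  explicit BoundedNotifier(std::size_t capacity) : capacity_(capacity) {}

  // Blocks while the notification queue is full.
  void notify() {
    std::unique_lock<std::mutex> lk(m_);
    not_full_.wait(lk, [this] { return pending_ < capacity_; });
    ++pending_;
    not_empty_.notify_one();
  }

  // Blocks until a notification is pending, then consumes it.
  void wait_for_notification() {
    std::unique_lock<std::mutex> lk(m_);
    not_empty_.wait(lk, [this] { return pending_ > 0; });
    --pending_;
    not_full_.notify_one();
  }

private:
  std::mutex m_;
  std::condition_variable not_full_, not_empty_;
  std::size_t pending_ = 0;
  const std::size_t capacity_;
};

// Hypothetical stand-in for the application message queue (A).
class NotifyingQueue {
public:
  explicit NotifyingQueue(BoundedNotifier& n) : notifier_(n) {}

  void enqueue(int msg) {
    {
      std::lock_guard<std::mutex> guard(lock_);
      items_.push_back(msg);
    }                    // queue lock released here
    notifier_.notify();  // may block, but never while holding lock_
  }

  int dequeue() {
    std::lock_guard<std::mutex> guard(lock_);
    assert(!items_.empty());  // every notification follows an enqueue
    int msg = items_.front();
    items_.pop_front();
    return msg;
  }

private:
  BoundedNotifier& notifier_;
  std::mutex lock_;
  std::deque<int> items_;
};

// Runs one producer and one consumer; returns the sum of consumed messages.
int run_demo(int message_count, std::size_t notify_capacity) {
  BoundedNotifier notifier(notify_capacity);
  NotifyingQueue queue(notifier);
  int sum = 0;
  std::thread consumer([&] {
    for (int i = 0; i < message_count; ++i) {
      notifier.wait_for_notification();  // plays the role of handle_input()
      sum += queue.dequeue();
    }
  });
  for (int i = 1; i <= message_count; ++i) queue.enqueue(i);
  consumer.join();
  return sum;
}
```

Calling run_demo(100, 1) pushes 100 messages through a notification queue that holds only one entry. The producer blocks in notify() often, but since it no longer holds the queue lock at that point, the consumer can always drain the queue and free a slot.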
Maybe we could detect that the call "this->notify ();" will block, and
exit the message queue guard only in that case?
Any other suggestion to avoid the deadlock?
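
A rough sketch of the "only leave the guard if notify() would block" idea, again in plain standard C++ rather than ACE. try_notify() is a hypothetical non-blocking call that I am assuming the notification side could offer; all class and function names here are made up for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical bounded notification channel with a non-blocking attempt.
class BoundedNotifier {
public:
  explicit BoundedNotifier(std::size_t capacity) : capacity_(capacity) {}

  // Returns false instead of blocking when the channel is full.
  bool try_notify() {
    std::lock_guard<std::mutex> lk(m_);
    if (pending_ >= capacity_) return false;
    ++pending_;
    not_empty_.notify_one();
    return true;
  }

  // Blocking variant, used only after the queue guard has been released.
  void notify() {
    std::unique_lock<std::mutex> lk(m_);
    not_full_.wait(lk, [this] { return pending_ < capacity_; });
    ++pending_;
    not_empty_.notify_one();
  }

  void wait_for_notification() {
    std::unique_lock<std::mutex> lk(m_);
    not_empty_.wait(lk, [this] { return pending_ > 0; });
    --pending_;
    not_full_.notify_one();
  }

private:
  std::mutex m_;
  std::condition_variable not_full_, not_empty_;
  std::size_t pending_ = 0;
  const std::size_t capacity_;
};

class NotifyingQueue {
public:
  explicit NotifyingQueue(BoundedNotifier& n) : notifier_(n) {}

  // Returns true if the notification was sent while the guard was held.
  bool enqueue(int msg) {
    {
      std::lock_guard<std::mutex> guard(lock_);
      items_.push_back(msg);
      if (notifier_.try_notify()) return true;  // common case, no blocking
    }  // guard released only because notify would have blocked
    notifier_.notify();
    return false;
  }

  int dequeue() {
    std::lock_guard<std::mutex> guard(lock_);
    assert(!items_.empty());  // every notification follows an enqueue
    int msg = items_.front();
    items_.pop_front();
    return msg;
  }

private:
  BoundedNotifier& notifier_;
  std::mutex lock_;
  std::deque<int> items_;
};

struct Stats {
  int consumed_sum = 0;  // written by the consumer thread only
  int under_lock = 0;    // notifications sent while holding the guard
  int after_unlock = 0;  // notifications sent after releasing the guard
};

Stats run_demo(int message_count, std::size_t capacity) {
  BoundedNotifier notifier(capacity);
  NotifyingQueue queue(notifier);
  Stats stats;
  std::thread consumer([&] {
    for (int i = 0; i < message_count; ++i) {
      notifier.wait_for_notification();
      stats.consumed_sum += queue.dequeue();
    }
  });
  for (int i = 1; i <= message_count; ++i) {
    if (queue.enqueue(i)) ++stats.under_lock;
    else ++stats.after_unlock;
  }
  consumer.join();
  return stats;
}
```

With this shape the notification stays inside the guard whenever there is room, and the guard is dropped only on the path that would otherwise hang. Whether that is enough to avoid the cleanup race mentioned in the March 2003 entry is something I have not verified.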
Gregory
PS: my low priority thread does I/O for each message. I saw that in
Vista the I/O priority is now controlled by the priority of the thread
doing the I/O, unlike in previous versions. Maybe this is why I have
seen this problem only on Vista as the processing of these messages
would be slower. Still, it is not acceptable that ACE deadlocks in this
normal condition.
-----Original Message-----
From: Douglas C. Schmidt [mailto:schmidt at dre.vanderbilt.edu]
Sent: Thursday, November 29, 2007 4:44 PM
To: Greg Popovitch; ace-bugs at cse.wustl.edu
Subject: Re: [ace-bugs] trying to fix an ACE hang
Hi Greg,
>YES! I think I found exactly this issue:
>
>Sun May 5 19:14:34 2002 Douglas C. Schmidt
><schmidt at macarena.cs.wustl.edu>
>
> * ace/Message_Queue_T.cpp: Modified all the enqueue*() methods so that
>   their calls to notify() occur *outside* of the monitor lock.
>   This change prevents deadlock from occurring when a reactor's
>   notification pipe is full. Thanks to Sasha Agranov
>   <sagranov at COMGATES.co.il> for reporting this.
>
>Did it creep back in somehow?
I suspect if you look further along in the changelog entries you'll
find that this change was reverted since it broke other things.
Thanks,
Doug
>greg
>
>From: Steve Huston [mailto:shuston at riverace.com]
>Sent: Thursday, November 29, 2007 4:15 PM
>To: Greg Popovitch; ace-bugs at cse.wustl.edu
>Subject: RE: [ace-bugs] trying to fix an ACE hang
>
>Hi Greg,
>
>Thanks for the PROBLEM-REPORT-FORM.
>
>Doug poked me to see if I remembered this type of problem. Your
>description looks familiar - I've a feeling we've been down this road
>before, but can't recall the details. Could you please scan through the
>ACE_wrappers/ChangeLogs files for changes in this area to see if we've
>tried this fix before?
>
>Thanks!
>-Steve
>
>--
>Steve Huston, Riverace Corporation
>Want to take ACE training on YOUR schedule?
>See http://www.riverace.com/training.htm
>
> -----Original Message-----
> From: ace-bugs-bounces at cse.wustl.edu
> [mailto:ace-bugs-bounces at cse.wustl.edu] On Behalf Of Greg Popovitch
> Sent: Thursday, November 29, 2007 2:23 PM
> To: ace-bugs at cse.wustl.edu
> Subject: [ace-bugs] trying to fix an ACE hang
>
> Hi,
>
> I have a hang on Windows Vista 64 and also on Linux 64 (RHEL 4).
>
> One thread is enqueuing a message on an ACE_Message_Queue for
> another thread and hangs on the notification to the WFMO reactor. The
> ACE_WFMO_Reactor_Notify::notify() hangs because its internal message
> queue is full.
>
> The other thread hangs while checking if the message_queue is
> empty because it can't grab its ACE_GUARD. Therefore it can't dequeue
> messages from the WFMO_Reactor.
>
> My question:
>
> In file Message_Queue_T.cpp, would it be OK to move the
> "this->notify ()" outside of the scope of the ACE_GUARD in:
>
> template <ACE_SYNCH_DECL> int
> ACE_Message_Queue<ACE_SYNCH_USE>::enqueue_head (ACE_Message_Block *new_item,
>                                                 ACE_Time_Value *timeout)
> {
>   ACE_TRACE ("ACE_Message_Queue<ACE_SYNCH_USE>::enqueue_head");
>   int queue_count = 0;
>   {
>     ACE_GUARD_RETURN (ACE_SYNCH_MUTEX_T, ace_mon, this->lock_, -1);
>
>     if (this->state_ == ACE_Message_Queue_Base::DEACTIVATED)
>       {
>         errno = ESHUTDOWN;
>         return -1;
>       }
>
>     if (this->wait_not_full_cond (ace_mon, timeout) == -1)
>       return -1;
>
>     queue_count = this->enqueue_head_i (new_item);
>
>     if (queue_count == -1)
>       return -1;
>
>     this->notify (); //******** move after closing brace below ??? ******
>   }
>   return queue_count;
> }
>
> ACE VERSION: 5.6
>
> HOST MACHINE and OPERATING SYSTEM:
> Windows Vista
>
> TARGET MACHINE and OPERATING SYSTEM, if different from HOST:
>
> COMPILER NAME AND VERSION (AND PATCHLEVEL):
> Visual Studio 2005
>
> Thanks,
>
> greg
>
--
Dr. Douglas C. Schmidt                       Professor and Associate Chair
Electrical Engineering and Computer Science  TEL: (615) 343-8197
Vanderbilt University                        WEB: www.dre.vanderbilt.edu/~schmidt
Nashville, TN 37203                          NET: d.schmidt at vanderbilt.edu