[Ace-users] [ace-bugs] [ACE_Message_Queue] notify PIPE block causes Select_Reactor deadlock.
Johnny Willemsen
jwillemsen at remedy.nl
Wed Feb 27 06:08:07 CST 2008
Hi,
Thanks for using the PRF form. I found the change below in SVN:
Johnny
Sat Mar 22 11:58:12 2003 Douglas C. Schmidt <schmidt at tango.doc.wustl.edu>
* ace/Message_Queue_T.cpp: Moved the notify() hook calls within
the protection of the guard lock critical section to prevent
race conditions on cleanup. Thanks to Ron Muck <rlm at sdiusa.com>
for this suggestion.
From: ace-bugs-bounces at cse.wustl.edu [mailto:ace-bugs-bounces at cse.wustl.edu]
On Behalf Of Rudy Pot
Sent: Wednesday, February 27, 2008 11:29 AM
To: ace-bugs at cs.wustl.edu
Subject: [ace-bugs] [ACE_Message_Queue] notify PIPE block causes Select_Reactor deadlock.
ACE VERSION: 5.5.6
HOST MACHINE and OPERATING SYSTEM: i386, Linux 2.6.20-1.2933.fc6
TARGET MACHINE and OPERATING SYSTEM, if different from HOST: same.
COMPILER NAME AND VERSION (AND PATCHLEVEL): gcc 4.1.1
CONTENTS OF $ACE_ROOT/ace/config.h: config-linux.h
CONTENTS OF $ACE_ROOT/include/makeinclude/platform_macros.GNU (unless this isn't used in this case, e.g., with Microsoft Visual C++): platform-linux.GNU
AREA/CLASS/EXAMPLE AFFECTED:
Message_Queue_T.cpp / ACE_Message_Queue<ACE_SYNCH_USE> / enqueue_prio
(But I think also ::enqueue_head, ::enqueue_deadline and ::enqueue_tail.)
See DESCRIPTION.
DOES THE PROBLEM AFFECT:
COMPILATION? NO
LINKING? NO
EXECUTION? YES
OTHER (please specify)? NO
SYNOPSIS:
The application sometimes hangs when the main Reactor gets busier (e.g., due to a significant increase in external events).
The core dump shows:
1) The main Reactor thread is blocked waiting to acquire the ACE_Message_Queue lock.
2) Multiple threads are blocked in ACE_Select_Reactor_Notify::notify(), which ends up in ACE::send() -> write().
The result is a deadlock; we have to kill the application to shut it down.
I've read Sidebar 17 of C++ Network Programming, Volume 2, but our situation is different from the one described there: notify() is not called from a handle_*() method of an event handler; this is a producer-consumer deadlock.
DESCRIPTION:
The application in short.
-------------------------
The part of our program where this deadlock appears processes messages from an embedded (CAN) network.
There is one thread per CAN message center (hardware communication channel), which puts each received message into the ACE_Message_Queues of whoever wants to observe the messages. These are the producer threads.
The message queues belong to observers, which have all registered to be notified by one Reactor running in a single main Reactor thread (the consumer). (They have also registered themselves with the message center threads as being interested in the messages.)
Problem cause
--------------
What can happen is that the main Reactor, which is used for many other things in our application, temporarily has other work to do, and in the meantime the message center threads may fill up the ACE_Select_Reactor notification PIPE. This causes the message center threads (producers) to block in write() on the (FULL) PIPE.
When the main Reactor thread (consumer) is done with the other work, it wants to proceed with handling the pending notifications, and thereby empty the PIPE. It cannot do this, because the producers call notify() while still holding the message QUEUE lock, and the notification handler (handle_output(), which calls is_empty()) needs that same lock!
See the code description below:
In Message_Queue_T.cpp
======================
ACE_Message_Queue<ACE_SYNCH_USE>::enqueue_prio
(the code shows the same pattern in ::enqueue_head, ::enqueue_deadline and ::enqueue_tail)
-------------------------------
DEADLOCK CODE (revisions after r46096 up to HEAD):
template <ACE_SYNCH_DECL> int
ACE_Message_Queue<ACE_SYNCH_USE>::enqueue_prio (ACE_Message_Block *new_item,
                                                ACE_Time_Value *timeout)
{
  ACE_TRACE ("ACE_Message_Queue<ACE_SYNCH_USE>::enqueue_prio");
  int queue_count = 0;
  {
    ACE_GUARD_RETURN (ACE_SYNCH_MUTEX_T, ace_mon, this->lock_, -1);

    if (this->state_ == ACE_Message_Queue_Base::DEACTIVATED)
      {
        errno = ESHUTDOWN;
        return -1;
      }

    if (this->wait_not_full_cond (ace_mon, timeout) == -1)
      return -1;

    queue_count = this->enqueue_i (new_item);

    if (queue_count == -1)
      return -1;

    this->notify ();  // <<<< ERROR: DEADLOCK when notify() blocks while
                      //      still in the scope of the buffer lock
  }
  return queue_count;
}
In the code snippet above, this->notify () causes a DEADLOCK when it blocks on a full notification pipe, because notify() is now called within the scope of the ACE_GUARD_RETURN.
In older versions of ACE, e.g. 5.3.1, notify() was called outside the scope of the GUARD, and we never had this deadlock.
In the ACE Subversion repository I can see that this was changed after revision r46096 of Message_Queue_T.cpp (ChangeLogTag: Sat Mar 22 11:58:12 2003), but I don't know why.
See Message_Queue_T.cpp: enqueue_prio, enqueue_head, enqueue_deadline, enqueue_tail.
-------------------------------
NON-DEADLOCK CODE (rev. 46096):
  ...
  ...
    if (queue_count == -1)
      return -1;
  } // end of the ACE_GUARD_RETURN scope for the queue lock

  this->notify ();  // NO deadlock here: notify() unblocks as soon as
                    // the PIPE is emptied.
  return queue_count;
}
-------------------------------
Relevant parts of the core dump:
[ consumer ]
Thread 1 (process 16785):
#0 0x009bc5d9 in __lll_mutex_lock_wait () from /lib/libpthread.so.0
#1 0x009b8636 in _L_mutex_lock_85 () from /lib/libpthread.so.0
#2 0x009b817d in pthread_mutex_lock () from /lib/libpthread.so.0
#3 0x00a9f922 in ACE_OS::mutex_lock () from /usr/lib/libACE.so.5.5.6
#4 0x00fd83c8 in ACE_Message_Queue<ACE_MT_SYNCH>::is_empty ()
from /opt/lib/libgcp_datadump.so
#5 0x0142d881 in can::CCanSvcDriverObserver::handle_output ()
from /opt/lib/libcanResourceManager.so.2
#6 0x00ac03f6 in ACE_Select_Reactor_Notify::dispatch_notify ()
from /usr/lib/libACE.so.5.5.6
#7 0x00ac057a in ACE_Select_Reactor_Notify::handle_input ()
from /usr/lib/libACE.so.5.5.6
#8 0x00ac14da in ACE_Select_Reactor_Notify::dispatch_notifications ()
from /usr/lib/libACE.so.5.5.6
#9 0x00a5ef8e in ACE_Asynch_Pseudo_Task::ACE_Asynch_Pseudo_Task$base ()
from /usr/lib/libACE.so.5.5.6
#10 0x00a5f5fd in ACE_Asynch_Pseudo_Task::ACE_Asynch_Pseudo_Task$base ()
from /usr/lib/libACE.so.5.5.6
#11 0x00a65b03 in ACE_OS::gettimeofday () from /usr/lib/libACE.so.5.5.6
#12 0x00abd0d3 in ACE_Reactor::run_reactor_event_loop ()
from /usr/lib/libACE.so.5.5.6
#13 0x0804ae36 in main ()
[ producer ]
Thread 28 (process 17023):
#0 0x009bc8f1 in write () from /lib/libpthread.so.0
#1 0x00a55a3a in ACE::send () from /usr/lib/libACE.so.5.5.6
#2 0x00ac082e in ACE_Select_Reactor_Notify::notify ()
from /usr/lib/libACE.so.5.5.6
#3 0x00a5edcc in ACE_Asynch_Pseudo_Task::ACE_Asynch_Pseudo_Task$base ()
from /usr/lib/libACE.so.5.5.6
#4 0x00abdce6 in ACE_Reactor::notify () from /usr/lib/libACE.so.5.5.6
#5 0x00abeace in ACE_Reactor_Notification_Strategy::notify ()
from /usr/lib/libACE.so.5.5.6
#6 0x0804b42a in ACE_Message_Queue<ACE_MT_SYNCH>::notify ()
#7 0x00fd7fa2 in ACE_Message_Queue<ACE_MT_SYNCH>::enqueue_prio ()
from /opt/lib/libgcp_datadump.so
#8 0x0142d6f3 in can::CCanSvcDriverObserver::update ()
from /opt/lib/libcanResourceManager.so.2
#9 0x0143078d in can::CCanSvcDriver_Base::svc_Read ()
from /opt/lib/libcanResourceManager.so.2
#10 0x0143087f in can::CCanSvcDriverRemoteRequestImpl::svc ()
from /opt/lib/libcanResourceManager.so.2
#11 0x00ad3026 in ACE_Task_Base::svc_run () from /usr/lib/libACE.so.5.5.6
#12 0x00ad39e8 in ACE_Thread_Adapter::invoke_i ()
from /usr/lib/libACE.so.5.5.6
#13 0x00ad3bb6 in ACE_Thread_Adapter::invoke ()
from /usr/lib/libACE.so.5.5.6
#14 0x00a67511 in ace_thread_adapter () from /usr/lib/libACE.so.5.5.6
#15 0x009b626a in start_thread () from /lib/libpthread.so.0
#16 0x92fff470 in ?? ()
#17 0x92fff470 in ?? ()
#18 0x92fff470 in ?? ()
#19 0x92fff470 in ?? ()
#20 0x00000000 in ?? ()
[ producer ]
Thread 34 (process 17012):
#0 0x009bc8f1 in write () from /lib/libpthread.so.0
#1 0x00a55a3a in ACE::send () from /usr/lib/libACE.so.5.5.6
#2 0x00ac082e in ACE_Select_Reactor_Notify::notify ()
from /usr/lib/libACE.so.5.5.6
#3 0x00a5edcc in ACE_Asynch_Pseudo_Task::ACE_Asynch_Pseudo_Task$base ()
from /usr/lib/libACE.so.5.5.6
#4 0x00abdce6 in ACE_Reactor::notify () from /usr/lib/libACE.so.5.5.6
#5 0x00abeace in ACE_Reactor_Notification_Strategy::notify ()
from /usr/lib/libACE.so.5.5.6
#6 0x0804b42a in ACE_Message_Queue<ACE_MT_SYNCH>::notify ()
#7 0x00fd7fa2 in ACE_Message_Queue<ACE_MT_SYNCH>::enqueue_prio ()
from /opt/lib/libgcp_datadump.so
#8 0x0142d6f3 in can::CCanSvcDriverObserver::update ()
from /opt/lib/libcanResourceManager.so.2
#9 0x0143062b in can::CCanSvcDriver_Base::svc_Read ()
from /opt/lib/libcanResourceManager.so.2
#10 0x014308cd in can::CCanSvcDriverReadImpl::svc ()
from /opt/lib/libcanResourceManager.so.2
#11 0x00ad3026 in ACE_Task_Base::svc_run () from /usr/lib/libACE.so.5.5.6
#12 0x00ad39e8 in ACE_Thread_Adapter::invoke_i ()
from /usr/lib/libACE.so.5.5.6
#13 0x00ad3bb6 in ACE_Thread_Adapter::invoke ()
from /usr/lib/libACE.so.5.5.6
#14 0x00a67511 in ace_thread_adapter () from /usr/lib/libACE.so.5.5.6
#15 0x009b626a in start_thread () from /lib/libpthread.so.0
#16 0x96bff470 in ?? ()
#17 0x96bff470 in ?? ()
#18 0x96bff470 in ?? ()
#19 0x96bff470 in ?? ()
#20 0x00000000 in ?? ()
[ producer ]
Thread 26 (process 17025):
#0 0x009bc8f1 in write () from /lib/libpthread.so.0
#1 0x00a55a3a in ACE::send () from /usr/lib/libACE.so.5.5.6
#2 0x00ac082e in ACE_Select_Reactor_Notify::notify ()
from /usr/lib/libACE.so.5.5.6
#3 0x00a5edcc in ACE_Asynch_Pseudo_Task::ACE_Asynch_Pseudo_Task$base ()
from /usr/lib/libACE.so.5.5.6
#4 0x00abdce6 in ACE_Reactor::notify () from /usr/lib/libACE.so.5.5.6
#5 0x00abeace in ACE_Reactor_Notification_Strategy::notify ()
from /usr/lib/libACE.so.5.5.6
#6 0x0804b42a in ACE_Message_Queue<ACE_MT_SYNCH>::notify ()
#7 0x00fd7fa2 in ACE_Message_Queue<ACE_MT_SYNCH>::enqueue_prio ()
from /opt/lib/libgcp_datadump.so
#8 0x0142d6f3 in can::CCanSvcDriverObserver::update ()
from /opt/lib/libcanResourceManager.so.2
#9 0x0143078d in can::CCanSvcDriver_Base::svc_Read ()
from /opt/lib/libcanResourceManager.so.2
#10 0x0143087f in can::CCanSvcDriverRemoteRequestImpl::svc ()
from /opt/lib/libcanResourceManager.so.2
#11 0x00ad3026 in ACE_Task_Base::svc_run () from /usr/lib/libACE.so.5.5.6
#12 0x00ad39e8 in ACE_Thread_Adapter::invoke_i ()
from /usr/lib/libACE.so.5.5.6
#13 0x00ad3bb6 in ACE_Thread_Adapter::invoke ()
from /usr/lib/libACE.so.5.5.6
#14 0x00a67511 in ace_thread_adapter () from /usr/lib/libACE.so.5.5.6
#15 0x009b626a in start_thread () from /lib/libpthread.so.0
#16 0x91bff470 in ?? ()
#17 0x91bff470 in ?? ()
#18 0x91bff470 in ?? ()
#19 0x91bff470 in ?? ()
#20 0x00000000 in ?? ()
REPEAT BY:
See description
SAMPLE FIX/WORKAROUND:
Change the ACE code so that this->notify () is called outside the GUARD scope, as it was in rev. 46096?
[* end of PRF *]
Aside from this deadlock problem, another question comes up: when I want to increase the ACE_Message_Queue size, do I also have to take the notification pipe buffer length into account (since enqueue will block if the notification pipe is full)?
Or do I have to #define ACE_HAS_REACTOR_NOTIFICATION_QUEUE and recompile ACE?
Best regards,
Rudy Pot
Embedded Computer Systems
AWETA G&P b.v
Postbox 17
NL-2630 AA Nootdorp
tel +31 (0)15 3109961
fax +31 (0)15 3107321
mail rpot at aweta.nl
web www.aweta.com