[Ace-users] [ace-bugs] trying to fix an ACE hang
Greg Popovitch
gpy at altair.com
Fri Nov 30 09:49:38 CST 2007
Thanks Steve for the answer. I understand all too well that this is
sensitive code.
I'll try to describe more precisely what happens:
In my enqueuing thread:
----------------------
1/ enqueue a message for the other thread on my own message queue
------------------------------------------------------------------
if (_msg_queue) _msg_queue->enqueue_head(mb);
2/ in ACE DLL, enqueue and notify reactor
-----------------------------------------
template <ACE_SYNCH_DECL> int
ACE_Message_Queue<ACE_SYNCH_USE>::enqueue_head (ACE_Message_Block *new_item,
                                                ACE_Time_Value *timeout)
{
  ACE_TRACE ("ACE_Message_Queue<ACE_SYNCH_USE>::enqueue_head");
  int queue_count = 0;
  {
    ACE_GUARD_RETURN (ACE_SYNCH_MUTEX_T, ace_mon, this->lock_, -1);
    if (this->state_ == ACE_Message_Queue_Base::DEACTIVATED)
      {
        errno = ESHUTDOWN;
        return -1;
      }
    if (this->wait_not_full_cond (ace_mon, timeout) == -1)
      return -1;
    queue_count = this->enqueue_head_i (new_item);
    if (queue_count == -1)
      return -1;
=>  this->notify ();
  }
  return queue_count;
}
3/ notification proceeds through, nothing interesting there
-----------------------------------------------------------
ACEd.dll!ACE_Reactor_Notification_Strategy::notify() Line 29 C++
ACEd.dll!ACE_Reactor::notify(ACE_Event_Handler * event_handler=0x000000002971bc70, unsigned long mask=1, ACE_Time_Value * tv=0x00
ACEd.dll!ACE_WFMO_Reactor::notify(ACE_Event_Handler * event_handler=0x000000002971bc70, unsigned long mask=1, ACE_Time_Value * timeo
4/ getting into WFMO reactor which enqueues on its own message queue
--------------------------------------------------------------------
int
ACE_WFMO_Reactor_Notify::notify (ACE_Event_Handler *event_handler,
                                 ACE_Reactor_Mask mask,
                                 ACE_Time_Value *timeout)
{
  if (event_handler != 0)
    {
      ACE_Message_Block *mb = 0;
      ACE_NEW_RETURN (mb,
                      ACE_Message_Block (sizeof (ACE_Notification_Buffer)),
                      -1);
      ACE_Notification_Buffer *buffer =
        (ACE_Notification_Buffer *) mb->base ();
      buffer->eh_ = event_handler;
      buffer->mask_ = mask;
      // Convert from relative time to absolute time by adding the
      // current time of day. This is what <ACE_Message_Queue>
      // expects.
      if (timeout != 0)
        *timeout += timer_queue_->gettimeofday ();
=>    if (this->message_queue_.enqueue_tail (mb, timeout) == -1)
        {
          mb->release ();
          return -1;
        }
      event_handler->add_reference ();
    }
  return this->wakeup_one_thread_.signal ();
}
5/ enqueue blocks because queue is full
---------------------------------------
message_queue_          {head_=0x000000006deaf1d0 tail_=0x000000007a5826b0 low_water_mark_=16384 ...}  ACE_Message_Queue<ACE_MT_SYNCH>
ACE_Message_Queue_Base  {state_=1 }                                               ACE_Message_Queue_Base
head_                   0x000000006deaf1d0 {rd_ptr_=0 wr_ptr_=0 priority_=0 ...}  ACE_Message_Block *
tail_                   0x000000007a5826b0 {rd_ptr_=0 wr_ptr_=0 priority_=0 ...}  ACE_Message_Block *
low_water_mark_         16384                                                     unsigned __int64
high_water_mark_        16384                                                     unsigned __int64
cur_bytes_              16384                                                     unsigned __int64
cur_length_             0                                                         unsigned __int64
cur_count_              1024                                                      unsigned __int64
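
For anyone checking the numbers: the dump is consistent with a queue sitting exactly at its high water mark, holding 1024 blocks of 16 bytes each (1024 x 16 = 16384). The sketch below redoes that arithmetic in standard C++. QueueSnapshot and both helper functions are hypothetical names, not ACE types, and the rule "full once cur_bytes_ reaches high_water_mark_" is an assumption about how the queue decides it is full.

```cpp
#include <cassert>
#include <cstddef>

// Values copied from the debugger dump above. The struct is a
// hypothetical stand-in for illustration, not an ACE type.
struct QueueSnapshot {
  std::size_t high_water_mark;
  std::size_t cur_bytes;
  std::size_t cur_count;
};

// Assumption: the queue counts as full once the byte count reaches
// the high water mark.
constexpr bool looks_full(const QueueSnapshot& q) {
  return q.cur_bytes >= q.high_water_mark;
}

// Average number of bytes accounted for per queued block.
constexpr std::size_t bytes_per_block(const QueueSnapshot& q) {
  return q.cur_count == 0 ? 0 : q.cur_bytes / q.cur_count;
}

// high_water_mark_, cur_bytes_, cur_count_ from the dump.
constexpr QueueSnapshot dump{16384, 16384, 1024};
```

With these values, looks_full(dump) is true and bytes_per_block(dump) is 16, which fits a block sized for one pointer plus one mask on a 64-bit build.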
template <ACE_SYNCH_DECL> int
ACE_Message_Queue<ACE_SYNCH_USE>::enqueue_tail (ACE_Message_Block *new_item,
                                                ACE_Time_Value *timeout)
{
  ACE_TRACE ("ACE_Message_Queue<ACE_SYNCH_USE>::enqueue_tail");
  int queue_count = 0;
  {
    ACE_GUARD_RETURN (ACE_SYNCH_MUTEX_T, ace_mon, this->lock_, -1);
    if (this->state_ == ACE_Message_Queue_Base::DEACTIVATED)
      {
        errno = ESHUTDOWN;
        return -1;
      }
=>  if (this->wait_not_full_cond (ace_mon, timeout) == -1)
      return -1;
    queue_count = this->enqueue_tail_i (new_item);
    if (queue_count == -1)
      return -1;
    this->notify ();
  }
  return queue_count;
}
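
The blocking behaviour in step 5 can be reproduced outside ACE with a small standard C++ model. This is only a sketch under stated assumptions: BoundedQueue and its member names are hypothetical, it counts items rather than bytes, and it takes a relative timeout where the ACE code above works with an absolute one. A null timeout waits indefinitely, which is the case that hangs when nothing drains the queue.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical stand-in for a bounded message queue; not ACE code.
class BoundedQueue {
public:
  explicit BoundedQueue(std::size_t high_water_mark)
    : high_water_mark_(high_water_mark) {}

  // Waits for room before appending. A null timeout means "wait
  // forever"; otherwise returns false if the queue is still full
  // when the timeout expires.
  bool enqueue_tail(int item, const std::chrono::milliseconds* timeout) {
    std::unique_lock<std::mutex> guard(lock_);
    auto has_room = [this] { return items_.size() < high_water_mark_; };
    if (timeout == nullptr)
      not_full_.wait(guard, has_room);
    else if (!not_full_.wait_for(guard, *timeout, has_room))
      return false;
    items_.push_back(item);
    return true;
  }

  // Removes the oldest item and wakes one waiting producer.
  bool dequeue_head(int& item) {
    std::lock_guard<std::mutex> guard(lock_);
    if (items_.empty())
      return false;
    item = items_.front();
    items_.pop_front();
    not_full_.notify_one();
    return true;
  }

private:
  std::mutex lock_;
  std::condition_variable not_full_;
  std::deque<int> items_;
  std::size_t high_water_mark_;
};
```

Filling a BoundedQueue to its mark and calling enqueue_tail with a short timeout returns false; the same call with a null timeout would not return until some other thread calls dequeue_head.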
greg
-----Original Message-----
From: Steve Huston [mailto:shuston at riverace.com]
Sent: Friday, November 30, 2007 10:24 AM
To: Greg Popovitch; 'Douglas C. Schmidt'
Cc: ace-bugs at cse.wustl.edu
Subject: RE: [ace-bugs] trying to fix an ACE hang
Hi Greg,
> Steve, I have not spent much time to really understand ACE
> internals, so what follows is to be taken with a grain of salt.
Ok.
> 1/ <it's safe to just drop the write if the pipe is full>
>
> I would have thought that this would cause the loss of a message.
> If not then this really means that the messages in the
> WFMO_Reactor's message queue don't really matter, and as long as
> this queue is not empty, there is no need to enqueue another
> message, doesn't it?
I said it's ok to drop the write to the pipe if the notifications are
queued externally to the pipe. I'd have to check the code again, but I
believe that's how the ACE_WFMO_Reactor notification mechanism works.
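
The idea of dropping a pipe write while keeping the notification itself can be modelled in a few lines of standard C++. This is a single-threaded sketch, not ACE's implementation: Notifier, its members, and the integer "notifications" are all hypothetical, and the pipe is reduced to a counter of wake-up tokens.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical model: notifications live in a queue of their own, and
// a small "pipe" only carries wake-up tokens. Not ACE code.
class Notifier {
public:
  explicit Notifier(std::size_t pipe_capacity)
    : pipe_capacity_(pipe_capacity), pipe_tokens_(0), dropped_writes_(0) {}

  void notify(int notification) {
    pending_.push_back(notification);  // the notification is always kept
    if (pipe_tokens_ < pipe_capacity_)
      ++pipe_tokens_;                  // wake-up token written
    else
      ++dropped_writes_;               // pipe full: skip the write
  }

  // Called by the event loop once the pipe is readable: consume the
  // tokens and drain every pending notification.
  std::vector<int> handle_wakeup() {
    pipe_tokens_ = 0;
    std::vector<int> drained(pending_.begin(), pending_.end());
    pending_.clear();
    return drained;
  }

  std::size_t dropped_writes() const { return dropped_writes_; }

private:
  std::deque<int> pending_;
  std::size_t pipe_capacity_;
  std::size_t pipe_tokens_;
  std::size_t dropped_writes_;
};
```

In this model three notify() calls against a one-token pipe drop two writes, yet a single handle_wakeup() still returns all three notifications. That only holds because the consumer drains the whole queue on each wake-up, which is the property the real code would need to be checked for.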
> 2/ it still seems to me illogical that we have to hold a lock on a
> message queue to execute a notify on the reactor. Is it because we
> are worried that someone will delete the message queue before the
> notify is executed?
The notify() was moved back under the guard protection because of some
sort of shutdown race (the ChangeLog doesn't describe more than that).
It does seem illogical (and wrong) on the face, but without studying
and analysing the situation it's hard to say what the whole story is.
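
To make the two orderings concrete, here is a standard C++ sketch that calls a notification hook either while the queue guard is held or after it is released. It is not a proposed ACE patch and says nothing about the shutdown race mentioned above. NotifyingQueue, Guard, and the hook signature are hypothetical, and the lock_held_ flag exists only so a single-threaded demo can observe the lock state.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical queue that reports to its hook whether the queue lock
// was held at the time of the call. Not ACE code.
class NotifyingQueue {
public:
  explicit NotifyingQueue(std::function<void(bool)> hook)
    : hook_(std::move(hook)), lock_held_(false) {}

  // Ordering shown in the listings above: notify inside the guard.
  int enqueue_notify_under_guard(int item) {
    Guard guard(*this);
    items_.push_back(item);
    hook_(lock_held_);
    return static_cast<int>(items_.size());
  }

  // Alternative ordering: release the guard, then notify.
  int enqueue_notify_after_guard(int item) {
    int count = 0;
    {
      Guard guard(*this);
      items_.push_back(item);
      count = static_cast<int>(items_.size());
    }
    hook_(lock_held_);
    return count;
  }

private:
  // RAII guard that also records the lock state for the demo.
  struct Guard {
    explicit Guard(NotifyingQueue& q) : queue(q), held(q.lock_) {
      queue.lock_held_ = true;
    }
    ~Guard() { queue.lock_held_ = false; }
    NotifyingQueue& queue;
    std::lock_guard<std::mutex> held;
  };

  std::function<void(bool)> hook_;
  std::mutex lock_;
  std::deque<int> items_;
  bool lock_held_;
};
```

In the first ordering a hook that blocks keeps the queue lock held for as long as it blocks; in the second it does not, but the queue could in principle be torn down between the guard release and the hook call, so neither ordering is free.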
> I have had termination problems as well and I think it is a
> weakness of ACE. Maybe there could be a reference counting
> mechanism on objects, so they only go away when no-one references
> them anymore?
On some objects, that's already available. Probably the best way to
proceed is to get hard details on a particular termination problem
scenario and then ask the list for more assistance. Or take advantage
of support services ;-)
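
The notify() listing earlier in the thread already calls event_handler->add_reference () after queueing a notification. The general idea, that a pending notification keeps its handler alive, can be sketched with std::shared_ptr. This is an analogy in standard C++, not ACE's reference-counting API; Handler and PendingNotification are hypothetical types.

```cpp
#include <cassert>
#include <memory>

// Hypothetical handler; the flag records destruction for the demo.
struct Handler {
  explicit Handler(bool& destroyed) : destroyed_(destroyed) {}
  ~Handler() { destroyed_ = true; }
  bool& destroyed_;
};

// A queued notification holds its own reference to the handler, so
// the handler cannot go away while the notification is pending.
struct PendingNotification {
  std::shared_ptr<Handler> handler;
};
```

If the application drops its own shared_ptr while a PendingNotification still exists, the Handler destructor does not run until that notification releases its reference too.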
Best regards,
-Steve
--
Steve Huston, Riverace Corporation
Want to take ACE training on YOUR schedule?
See http://www.riverace.com/training.htm