[Ace-users] [ace-bugs] [ACE_Message_Queue] notify PIPE block causes Select_Reactor deadlock.
gpy at altair.com
Thu Feb 28 13:09:26 CST 2008
Doug, thanks for the answer. In my experience the deadlock was not
simple to reproduce, at least with my application. For the PC, it
seemed to occur only on Vista (not on Windows XP) and while doing
something specific. The whole thing worked fine 99% of the time.
It did occur on one unix platform as well, I forgot which one.
My main app has two threads, and talks to a remote server. The socket
i/o to the remote server is done in thread2. It seemed that the problem
occurred when thread1 was enqueuing lots of messages to thread2, and
when thread2 was too busy with socket i/o with the remote server to
dequeue the messages from thread1.
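To make the scenario above concrete, here is a minimal single-threaded model in plain C++ (no ACE calls). It assumes the notify pipe behaves like a fixed-capacity buffer; `BoundedPipe`, `try_write`, and `writes_before_stall` are hypothetical names used only for illustration. The producer keeps writing notifications while the consumer never drains them, and the sketch reports how many writes succeed before the pipe is full, which is the point where a blocking write would hang.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical stand-in for a fixed-capacity notification pipe.
// This is NOT an ACE class; it only models "the pipe can fill up".
class BoundedPipe {
public:
    explicit BoundedPipe(std::size_t capacity) : capacity_(capacity) {}

    // Non-blocking write: returns false when the pipe is full.
    // A blocking write would wait here until a reader drains the pipe.
    bool try_write(int token) {
        if (items_.size() >= capacity_) return false;
        items_.push_back(token);
        return true;
    }

    // Non-blocking read: returns false when nothing is pending.
    bool try_read(int& token) {
        if (items_.empty()) return false;
        token = items_.front();
        items_.pop_front();
        return true;
    }

private:
    std::size_t capacity_;
    std::deque<int> items_;
};

// Models thread1 sending `attempts` notifications while thread2 is too
// busy to read any of them. Returns the number of successful writes
// before the pipe filled up (or `attempts` if it never filled).
std::size_t writes_before_stall(std::size_t capacity, std::size_t attempts) {
    BoundedPipe pipe(capacity);
    std::size_t ok = 0;
    for (std::size_t i = 0; i < attempts; ++i) {
        if (!pipe.try_write(static_cast<int>(i))) break;  // would block here
        ++ok;
    }
    return ok;
}
```

In this model the stall only needs the producer to outrun the consumer by the pipe's capacity, which would be consistent with the problem showing up only under heavy enqueuing.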
I did try changing my code slightly to avoid the deadlock, but to no avail.
In the end I patched ACE (as you suggested).
I don't think it is trivial to create a test case reproducing
the deadlock. Surely doable but might require some time, which I am
lacking at the moment.
From: schmidt at dre.vanderbilt.edu [mailto:schmidt at dre.vanderbilt.edu]
Sent: Wednesday, February 27, 2008 5:17 PM
To: Greg Popovitch
Cc: Johnny Willemsen; Rudy Pot; ace-bugs at cs.wustl.edu
Subject: Re: [ace-bugs] [ACE_Message_Queue] notify PIPE block
> Thanks for the answer, and thanks for the great work with ACE!
You are very welcome!
> I agree that having a non-regression test case for the deadlock
> problem is useful. If there had been one, the 2003 fix restoring the
> deadlock problem would probably not have happened.
> However, having a test case for the deadlock issue would not help you
> create a fix avoiding the race conditions. If it was me, I'd do the
> "obvious" fix for the deadlock (after running all the available tests)
> and deal with the race conditions correctly if and when they
The problem, of course, is that race conditions are hard to detect, so a
program may appear to work during simple smoke testing, when in fact it
has a bug that appears in production. In contrast, the deadlock
situation manifests itself fairly easily, so it can be detected and
addressed, e.g., by designing the program to avoid a deadlock.
> Otherwise, we put ourselves in a deadlock. We are willing to fix this
> only if we can make sure we don't create race conditions, but we are
> not able to verify this because we don't have a test case for those
> race conditions.
Right, but hopefully you can understand why we don't want to apply a fix
that is known to cause problems since it will break a lot of production
code in subtle and pernicious ways.
> IMHO, the immediate problem is that we don't have a test case for
> reproducing the "race conditions"
How about we do the following:
1. Add a regression test for the current deadlock case.
2. Add suggested fixes for the current problem to bugzilla so they don't
3. Try to create a regression test that demonstrates the race conditions.
4. If we get #3 then we can enhance the solutions for #2 to avoid both
problems #1 and #3.
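One common way to turn a deadlock into a failing test, rather than a hung test run, is to execute the scenario on a worker thread and give up after a time limit. The following is only a sketch in standard C++ under that assumption; `completes_within` is a hypothetical helper, not part of ACE or its test suite, and the scenario body that would exercise the message queue and reactor is left to the caller.

```cpp
#include <chrono>
#include <future>
#include <thread>
#include <utility>

// Runs `body` on a worker thread and reports whether it finished within
// `limit`. A deadlocked body shows up as a `false` return instead of a
// test process that hangs forever.
template <class F>
bool completes_within(F&& body, std::chrono::milliseconds limit) {
    std::packaged_task<void()> task(std::forward<F>(body));
    std::future<void> done = task.get_future();
    std::thread worker(std::move(task));

    const bool finished = done.wait_for(limit) == std::future_status::ready;
    if (finished) {
        worker.join();    // normal completion: clean up the thread
    } else {
        worker.detach();  // presumed stuck: abandon it and report failure
    }
    return finished;
}
```

A test would pass the enqueue and dequeue scenario as `body` and fail when the helper returns `false`. The time limit is a judgment call: too short gives false alarms on slow machines, too long slows down every failing run.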
In the meantime, you can patch your code locally to avoid the deadlock -
such is the power of open-source ;-)