[Ace-users] [ace-bugs] POSIX Proactor destructor deadlocks when there are pending events (fix included)

David Faure dfaure at klaralvdalens-datakonsult.se
Thu Jun 21 06:01:33 CDT 2007


ACE VERSION: 5.5.6

HOST MACHINE and OPERATING SYSTEM: Dual amd64, Linux Kubuntu 7.04

COMPILER NAME AND VERSION (AND PATCHLEVEL): gcc-4.1.2

THE $ACE_ROOT/ace/config.h FILE: symlink to config-linux.h

THE $ACE_ROOT/include/makeinclude/platform_macros.GNU FILE: symlink to platform_linux.GNU

AREA/CLASS/EXAMPLE AFFECTED: examples/Reactor/Proactor/test_end_event_loop.cpp

DOES THE PROBLEM AFFECT: Execution

SYNOPSIS:
  Closing the POSIX Proactor while it has pending events leads to a deadlock.

DESCRIPTION:
  Trying to quit a program that runs proactor event loops in threads seems difficult.
  If no events come in the program doesn't exit (see separate bug report), but if events
  come in then the proactor fails to close properly and the program doesn't terminate either.

The output from the program when trying to exit it, is:
ACE_POSIX_AIOCB_Proactor::delete_result_aiocb_list
 number pending AIO=19
Proactor.cpp:610:(7940 | 132929856):ACE_Proactor::close:implementation couldnt be closed: Invalid argument
[and then it blocks]

Clearly it is intended by the code that close fails:
Breakpoint 7, ACE_POSIX_AIOCB_Proactor::delete_result_aiocb_list (this=0x6fd4c0) at POSIX_Proactor.cpp:940
935       // If it is not possible cancel some operation (num_pending > 0 ),
936       // we can do only one thing -report about this
937       // and complain about POSIX implementation.
938       // We know that we have memory leaks, but it is better than
939       // segmentation fault!
940       ACE_DEBUG
941         ((LM_DEBUG,
942           ACE_LIB_TEXT("ACE_POSIX_AIOCB_Proactor::delete_result_aiocb_list\n")
943           ACE_LIB_TEXT(" number pending AIO=%d\n"),
944           num_pending));
            [...]
952       return (num_pending == 0 ? 0 : -1);
So this returns -1 due to pending operations.

602     int
603     ACE_Proactor::close (void)
604     {
605       // Close the implementation.
606       if (this->implementation ()->close () == -1)
607         ACE_ERROR_RETURN ((LM_ERROR,
608                            ACE_LIB_TEXT ("%N:%l:(%P | %t):%p\n"),
609                            ACE_LIB_TEXT ("ACE_Proactor::close:implementation couldnt be closed")),
610                           -1);
611
And this return prematurely from close(), without deleting the implementation nor the timer_handler.
A memleak would be acceptable, but in this case this leads to a deadlock:
when the ACE_Proactor destructor finishes, the member ACE_Thread_Manager instance 
(documented to "manage the thread in the Timer_Handler") is deleted, which waits for that thread.
It waits for ever since that thread hasn't been terminated by the "delete this->timer_handler_"
statement that was skipped in close().

#0  0x00002adfeb11f796 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x00002adfea14b247 in ACE_Condition_Thread_Mutex::wait (this=0x71247c, mutex=@0x0, abstime=0x1)
    at /d/kdab/src/ACE+TAO-svn/ACE_wrappers/ace/OS_NS_Thread.inl:410
#2  0x00002adfea1ae48c in ACE_Thread_Manager::wait (this=0x7123f8, timeout=0x0, abandon_detached_threads=40,
    use_absolute_time=<value optimized out>) at Thread_Manager.cpp:1694
#3  0x00002adfea1af419 in ACE_Thread_Manager::close (this=0x7123f8) at Thread_Manager.cpp:446
#4  0x00002adfea1af482 in ~ACE_Thread_Manager (this=0x71247c) at Thread_Manager.cpp:460


SAMPLE FIX:

--- Proactor.cpp        (revision 76931)
+++ Proactor.cpp        (working copy)
@@ -356,6 +356,30 @@ ACE_Proactor::ACE_Proactor (ACE_Proactor
 ACE_Proactor::~ACE_Proactor (void)
 {
   this->close ();
+  // Even if close failed, we need to delete everything to avoid a deadlock
+
+  // Delete the implementation.
+  if (this->delete_implementation_)
+    {
+      delete this->implementation ();
+      this->implementation_ = 0;
+    }
+
+  // Delete the timer handler.
+  if (this->timer_handler_)
+    {
+      delete this->timer_handler_;
+      this->timer_handler_ = 0;
+    }
+
+  // Delete the timer queue.
+  if (this->delete_timer_queue_)
+    {
+      delete this->timer_queue_;
+      this->timer_queue_ = 0;
+      this->delete_timer_queue_ = 0;
+    }
+
 }

Now the timer_handler thread exits, the proactor can be deleted and the execution continues, and the program does exit.

This fix duplicates code however, a better fix would probably be to move this code to a new private method
(didn't do that for my tests to reduce recompiles).

I suppose that moving the code out of close() wouldn't be a good idea, the behavior of close()
(a public method) should remain unchanged.

NOTE:
 there is still another bug, it seems some internal thread is terminated after its 
data structures (held by the POSIX reactor) are gone, says valgrind:
==8987== Thread 32:
==8987== Syscall param pread64(buf) points to unaddressable byte(s)
==8987==    at 0x5A0CB88: (within /usr/lib/debug/libc-2.5.so)
==8987==    by 0x60BFE0D: handle_fildes_io (aio_misc.c:523)
==8987==    by 0x5EA82A4: start_thread (pthread_create.c:296)
==8987==    by 0x5A1C61C: clone (in /usr/lib/debug/libc-2.5.so)
==8987==  Address 0xF471A08 is 0 bytes inside a block of size 4 free'd
==8987==    at 0x4C2000C: operator delete[](void*) (vg_replace_malloc.c:256)
==8987==    by 0x4F14D0D: ACE_Data_Block::~ACE_Data_Block() (Message_Block.cpp:741)
==8987==    by 0x4F15A77: ACE_Data_Block::release(ACE_Lock*) (Message_Block.cpp:820)
==8987==    by 0x4F15B07: ACE_Message_Block::~ACE_Message_Block() (Message_Block.cpp:951)
==8987==    by 0x4F2D763: ACE_AIOCB_Notify_Pipe_Manager::~ACE_AIOCB_Notify_Pipe_Manager() (POSIX_Proactor.cpp:727)
==8987==    by 0x4F2AB55: ACE_POSIX_AIOCB_Proactor::delete_notify_manager() (POSIX_Proactor.cpp:1058)
==8987==    by 0x4F2DA74: ACE_POSIX_AIOCB_Proactor::close() (POSIX_Proactor.cpp:845)
==8987==    by 0x4F31ECB: ACE_Proactor::close() (Proactor.cpp:630)
==8987==    by 0x4F3236B: ACE_Proactor::~ACE_Proactor() (Proactor.cpp:358)
==8987==    by 0x410118: ArbitrationPerThread_Configuration::worker(void*) (main.cpp:533)
==8987==    by 0x5EA82A4: start_thread (pthread_create.c:296)
==8987==    by 0x5A1C61C: clone (in /usr/lib/debug/libc-2.5.so)
Seems like the thread should be terminated before deleting the notify manager (whatever that is), right?
But, hmm, that thread is all in glibc, not created by ACE?
I'm getting crashes on exit on windows too, with the win32 proactor, but no valgrind there to find out what's happening...
Ideas welcome :)

-- 
David Faure, faure at kde.org, dfaure at klaralvdalens-datakonsult.se
KDE/KOffice developer, Qt consultancy projects
Klarälvdalens Datakonsult AB, Platform-independent software solutions



More information about the Ace-users mailing list