[Ace-users] [ace-bugs] POSIX Proactor destructor deadlocks when
there are pending events (fix included)
David Faure
dfaure at klaralvdalens-datakonsult.se
Thu Jun 21 06:01:33 CDT 2007
ACE VERSION: 5.5.6
HOST MACHINE and OPERATING SYSTEM: Dual amd64, Linux Kubuntu 7.04
COMPILER NAME AND VERSION (AND PATCHLEVEL): gcc-4.1.2
THE $ACE_ROOT/ace/config.h FILE: symlink to config-linux.h
THE $ACE_ROOT/include/makeinclude/platform_macros.GNU FILE: symlink to platform_linux.GNU
AREA/CLASS/EXAMPLE AFFECTED: examples/Reactor/Proactor/test_end_event_loop.cpp
DOES THE PROBLEM AFFECT: Execution
SYNOPSIS:
Closing the POSIX Proactor while it has pending events leads to a deadlock.
DESCRIPTION:
Trying to quit a program that runs proactor event loops in threads seems difficult.
If no events come in the program doesn't exit (see separate bug report), but if events
come in then the proactor fails to close properly and the program doesn't terminate either.
The output from the program when trying to exit it, is:
ACE_POSIX_AIOCB_Proactor::delete_result_aiocb_list
number pending AIO=19
Proactor.cpp:610:(7940 | 132929856):ACE_Proactor::close:implementation couldnt be closed: Invalid argument
[and then it blocks]
Clearly it is intended by the code that close fails:
Breakpoint 7, ACE_POSIX_AIOCB_Proactor::delete_result_aiocb_list (this=0x6fd4c0) at POSIX_Proactor.cpp:940
935 // If it is not possible cancel some operation (num_pending > 0 ),
936 // we can do only one thing -report about this
937 // and complain about POSIX implementation.
938 // We know that we have memory leaks, but it is better than
939 // segmentation fault!
940 ACE_DEBUG
941 ((LM_DEBUG,
942 ACE_LIB_TEXT("ACE_POSIX_AIOCB_Proactor::delete_result_aiocb_list\n")
943 ACE_LIB_TEXT(" number pending AIO=%d\n"),
944 num_pending));
[...]
952 return (num_pending == 0 ? 0 : -1);
So this returns -1 due to pending operations.
602 int
603 ACE_Proactor::close (void)
604 {
605 // Close the implementation.
606 if (this->implementation ()->close () == -1)
607 ACE_ERROR_RETURN ((LM_ERROR,
608 ACE_LIB_TEXT ("%N:%l:(%P | %t):%p\n"),
609 ACE_LIB_TEXT ("ACE_Proactor::close:implementation couldnt be closed")),
610 -1);
611
And this return prematurely from close(), without deleting the implementation nor the timer_handler.
A memleak would be acceptable, but in this case this leads to a deadlock:
when the ACE_Proactor destructor finishes, the member ACE_Thread_Manager instance
(documented to "manage the thread in the Timer_Handler") is deleted, which waits for that thread.
It waits for ever since that thread hasn't been terminated by the "delete this->timer_handler_"
statement that was skipped in close().
#0 0x00002adfeb11f796 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1 0x00002adfea14b247 in ACE_Condition_Thread_Mutex::wait (this=0x71247c, mutex=@0x0, abstime=0x1)
at /d/kdab/src/ACE+TAO-svn/ACE_wrappers/ace/OS_NS_Thread.inl:410
#2 0x00002adfea1ae48c in ACE_Thread_Manager::wait (this=0x7123f8, timeout=0x0, abandon_detached_threads=40,
use_absolute_time=<value optimized out>) at Thread_Manager.cpp:1694
#3 0x00002adfea1af419 in ACE_Thread_Manager::close (this=0x7123f8) at Thread_Manager.cpp:446
#4 0x00002adfea1af482 in ~ACE_Thread_Manager (this=0x71247c) at Thread_Manager.cpp:460
SAMPLE FIX:
--- Proactor.cpp (revision 76931)
+++ Proactor.cpp (working copy)
@@ -356,6 +356,30 @@ ACE_Proactor::ACE_Proactor (ACE_Proactor
ACE_Proactor::~ACE_Proactor (void)
{
this->close ();
+ // Even if close failed, we need to delete everything to avoid a deadlock
+
+ // Delete the implementation.
+ if (this->delete_implementation_)
+ {
+ delete this->implementation ();
+ this->implementation_ = 0;
+ }
+
+ // Delete the timer handler.
+ if (this->timer_handler_)
+ {
+ delete this->timer_handler_;
+ this->timer_handler_ = 0;
+ }
+
+ // Delete the timer queue.
+ if (this->delete_timer_queue_)
+ {
+ delete this->timer_queue_;
+ this->timer_queue_ = 0;
+ this->delete_timer_queue_ = 0;
+ }
+
}
Now the timer_handler thread exits, the proactor can be deleted and the execution continues, and the program does exit.
This fix duplicates code however, a better fix would probably be to move this code to a new private method
(didn't do that for my tests to reduce recompiles).
I suppose that moving the code out of close() wouldn't be a good idea, the behavior of close()
(a public method) should remain unchanged.
NOTE:
there is still another bug, it seems some internal thread is terminated after its
data structures (held by the POSIX reactor) are gone, says valgrind:
==8987== Thread 32:
==8987== Syscall param pread64(buf) points to unaddressable byte(s)
==8987== at 0x5A0CB88: (within /usr/lib/debug/libc-2.5.so)
==8987== by 0x60BFE0D: handle_fildes_io (aio_misc.c:523)
==8987== by 0x5EA82A4: start_thread (pthread_create.c:296)
==8987== by 0x5A1C61C: clone (in /usr/lib/debug/libc-2.5.so)
==8987== Address 0xF471A08 is 0 bytes inside a block of size 4 free'd
==8987== at 0x4C2000C: operator delete[](void*) (vg_replace_malloc.c:256)
==8987== by 0x4F14D0D: ACE_Data_Block::~ACE_Data_Block() (Message_Block.cpp:741)
==8987== by 0x4F15A77: ACE_Data_Block::release(ACE_Lock*) (Message_Block.cpp:820)
==8987== by 0x4F15B07: ACE_Message_Block::~ACE_Message_Block() (Message_Block.cpp:951)
==8987== by 0x4F2D763: ACE_AIOCB_Notify_Pipe_Manager::~ACE_AIOCB_Notify_Pipe_Manager() (POSIX_Proactor.cpp:727)
==8987== by 0x4F2AB55: ACE_POSIX_AIOCB_Proactor::delete_notify_manager() (POSIX_Proactor.cpp:1058)
==8987== by 0x4F2DA74: ACE_POSIX_AIOCB_Proactor::close() (POSIX_Proactor.cpp:845)
==8987== by 0x4F31ECB: ACE_Proactor::close() (Proactor.cpp:630)
==8987== by 0x4F3236B: ACE_Proactor::~ACE_Proactor() (Proactor.cpp:358)
==8987== by 0x410118: ArbitrationPerThread_Configuration::worker(void*) (main.cpp:533)
==8987== by 0x5EA82A4: start_thread (pthread_create.c:296)
==8987== by 0x5A1C61C: clone (in /usr/lib/debug/libc-2.5.so)
Seems like the thread should be terminated before deleting the notify manager (whatever that is), right?
But, hmm, that thread is all in glibc, not created by ACE?
I'm getting crashes on exit on windows too, with the win32 proactor, but no valgrind there to find out what's happening...
Ideas welcome :)
--
David Faure, faure at kde.org, dfaure at klaralvdalens-datakonsult.se
KDE/KOffice developer, Qt consultancy projects
Klarälvdalens Datakonsult AB, Platform-independent software solutions
More information about the Ace-users
mailing list