[Ace-users] Re: [ace-bugs] POSIX Proactor destructor deadlocks when there arepending events (fix included)

Steve Huston shuston at riverace.com
Fri Jun 22 16:23:24 CDT 2007


Hi David,

Thanks for the report and analysis.
Can you please enter this issue in Bugzilla
(http://deuce.doc.wustl.edu/bugzilla/) for further research and
inclusion in a future ACE version?

Thanks,
-Steve

--
Steve Huston, Riverace Corporation
Would you like ACE to run great on your platform?
See http://www.riverace.com/sponsor.htm


> -----Original Message-----
> From: ace-bugs-bounces at cse.wustl.edu 
> [mailto:ace-bugs-bounces at cse.wustl.edu] On Behalf Of David Faure
> Sent: Thursday, June 21, 2007 7:02 AM
> To: ace-bugs at cs.wustl.edu
> Subject: [ace-bugs] POSIX Proactor destructor deadlocks when 
> there arepending events (fix included)
> 
> 
> ACE VERSION: 5.5.6
> 
> HOST MACHINE and OPERATING SYSTEM: Dual amd64, Linux Kubuntu 7.04
> 
> COMPILER NAME AND VERSION (AND PATCHLEVEL): gcc-4.1.2
> 
> THE $ACE_ROOT/ace/config.h FILE: symlink to config-linux.h
> 
> THE $ACE_ROOT/include/makeinclude/platform_macros.GNU FILE: 
> symlink to platform_linux.GNU
> 
> AREA/CLASS/EXAMPLE AFFECTED: 
> examples/Reactor/Proactor/test_end_event_loop.cpp
> 
> DOES THE PROBLEM AFFECT: Execution
> 
> SYNOPSIS:
>   Closing the POSIX Proactor while it has pending events 
> leads to a deadlock.
> 
> DESCRIPTION:
>   Trying to quit a program that runs proactor event loops in 
> threads seems difficult.
>   If no events come in the program doesn't exit (see separate 
> bug report), but if events
>   come in then the proactor fails to close properly and the 
> program doesn't terminate either.
> 
> The output from the program when trying to exit it, is:
> ACE_POSIX_AIOCB_Proactor::delete_result_aiocb_list
>  number pending AIO=19
> Proactor.cpp:610:(7940 | 
> 132929856):ACE_Proactor::close:implementation couldnt be 
> closed: Invalid argument
> [and then it blocks]
> 
> Clearly it is intended by the code that close fails:
> Breakpoint 7, 
> ACE_POSIX_AIOCB_Proactor::delete_result_aiocb_list 
> (this=0x6fd4c0) at POSIX_Proactor.cpp:940
> 935       // If it is not possible cancel some operation 
> (num_pending > 0 ),
> 936       // we can do only one thing -report about this
> 937       // and complain about POSIX implementation.
> 938       // We know that we have memory leaks, but it is better
than
> 939       // segmentation fault!
> 940       ACE_DEBUG
> 941         ((LM_DEBUG,
> 942           
> ACE_LIB_TEXT("ACE_POSIX_AIOCB_Proactor::delete_result_aiocb_list\n")
> 943           ACE_LIB_TEXT(" number pending AIO=%d\n"),
> 944           num_pending));
>             [...]
> 952       return (num_pending == 0 ? 0 : -1);
> So this returns -1 due to pending operations.
> 
> 602     int
> 603     ACE_Proactor::close (void)
> 604     {
> 605       // Close the implementation.
> 606       if (this->implementation ()->close () == -1)
> 607         ACE_ERROR_RETURN ((LM_ERROR,
> 608                            ACE_LIB_TEXT ("%N:%l:(%P |
%t):%p\n"),
> 609                            ACE_LIB_TEXT 
> ("ACE_Proactor::close:implementation couldnt be closed")),
> 610                           -1);
> 611
> And this return prematurely from close(), without deleting 
> the implementation nor the timer_handler.
> A memleak would be acceptable, but in this case this leads to 
> a deadlock:
> when the ACE_Proactor destructor finishes, the member 
> ACE_Thread_Manager instance 
> (documented to "manage the thread in the Timer_Handler") is 
> deleted, which waits for that thread.
> It waits for ever since that thread hasn't been terminated by 
> the "delete this->timer_handler_"
> statement that was skipped in close().
> 
> #0  0x00002adfeb11f796 in pthread_cond_wait@@GLIBC_2.3.2 () 
> from /lib/libpthread.so.0
> #1  0x00002adfea14b247 in ACE_Condition_Thread_Mutex::wait 
> (this=0x71247c, mutex=@0x0, abstime=0x1)
>     at /d/kdab/src/ACE+TAO-svn/ACE_wrappers/ace/OS_NS_Thread.inl:410
> #2  0x00002adfea1ae48c in ACE_Thread_Manager::wait 
> (this=0x7123f8, timeout=0x0, abandon_detached_threads=40,
>     use_absolute_time=<value optimized out>) at 
> Thread_Manager.cpp:1694
> #3  0x00002adfea1af419 in ACE_Thread_Manager::close 
> (this=0x7123f8) at Thread_Manager.cpp:446
> #4  0x00002adfea1af482 in ~ACE_Thread_Manager (this=0x71247c) 
> at Thread_Manager.cpp:460
> 
> 
> SAMPLE FIX:
> 
> --- Proactor.cpp        (revision 76931)
> +++ Proactor.cpp        (working copy)
> @@ -356,6 +356,30 @@ ACE_Proactor::ACE_Proactor (ACE_Proactor
>  ACE_Proactor::~ACE_Proactor (void)
>  {
>    this->close ();
> +  // Even if close failed, we need to delete everything to 
> avoid a deadlock
> +
> +  // Delete the implementation.
> +  if (this->delete_implementation_)
> +    {
> +      delete this->implementation ();
> +      this->implementation_ = 0;
> +    }
> +
> +  // Delete the timer handler.
> +  if (this->timer_handler_)
> +    {
> +      delete this->timer_handler_;
> +      this->timer_handler_ = 0;
> +    }
> +
> +  // Delete the timer queue.
> +  if (this->delete_timer_queue_)
> +    {
> +      delete this->timer_queue_;
> +      this->timer_queue_ = 0;
> +      this->delete_timer_queue_ = 0;
> +    }
> +
>  }
> 
> Now the timer_handler thread exits, the proactor can be 
> deleted and the execution continues, and the program does exit.
> 
> This fix duplicates code however, a better fix would probably 
> be to move this code to a new private method
> (didn't do that for my tests to reduce recompiles).
> 
> I suppose that moving the code out of close() wouldn't be a 
> good idea, the behavior of close()
> (a public method) should remain unchanged.
> 
> NOTE:
>  there is still another bug, it seems some internal thread is 
> terminated after its 
> data structures (held by the POSIX reactor) are gone, says valgrind:
> ==8987== Thread 32:
> ==8987== Syscall param pread64(buf) points to unaddressable byte(s)
> ==8987==    at 0x5A0CB88: (within /usr/lib/debug/libc-2.5.so)
> ==8987==    by 0x60BFE0D: handle_fildes_io (aio_misc.c:523)
> ==8987==    by 0x5EA82A4: start_thread (pthread_create.c:296)
> ==8987==    by 0x5A1C61C: clone (in /usr/lib/debug/libc-2.5.so)
> ==8987==  Address 0xF471A08 is 0 bytes inside a block of size 4
free'd
> ==8987==    at 0x4C2000C: operator delete[](void*) 
> (vg_replace_malloc.c:256)
> ==8987==    by 0x4F14D0D: ACE_Data_Block::~ACE_Data_Block() 
> (Message_Block.cpp:741)
> ==8987==    by 0x4F15A77: ACE_Data_Block::release(ACE_Lock*) 
> (Message_Block.cpp:820)
> ==8987==    by 0x4F15B07: 
> ACE_Message_Block::~ACE_Message_Block() (Message_Block.cpp:951)
> ==8987==    by 0x4F2D763: 
> ACE_AIOCB_Notify_Pipe_Manager::~ACE_AIOCB_Notify_Pipe_Manager(
> ) (POSIX_Proactor.cpp:727)
> ==8987==    by 0x4F2AB55: 
> ACE_POSIX_AIOCB_Proactor::delete_notify_manager() 
> (POSIX_Proactor.cpp:1058)
> ==8987==    by 0x4F2DA74: ACE_POSIX_AIOCB_Proactor::close() 
> (POSIX_Proactor.cpp:845)
> ==8987==    by 0x4F31ECB: ACE_Proactor::close() (Proactor.cpp:630)
> ==8987==    by 0x4F3236B: ACE_Proactor::~ACE_Proactor() 
> (Proactor.cpp:358)
> ==8987==    by 0x410118: 
> ArbitrationPerThread_Configuration::worker(void*) (main.cpp:533)
> ==8987==    by 0x5EA82A4: start_thread (pthread_create.c:296)
> ==8987==    by 0x5A1C61C: clone (in /usr/lib/debug/libc-2.5.so)
> Seems like the thread should be terminated before deleting 
> the notify manager (whatever that is), right?
> But, hmm, that thread is all in glibc, not created by ACE?
> I'm getting crashes on exit on windows too, with the win32 
> proactor, but no valgrind there to find out what's happening...
> Ideas welcome :)
> 
> -- 
> David Faure, faure at kde.org, dfaure at klaralvdalens-datakonsult.se
> KDE/KOffice developer, Qt consultancy projects
> Klarälvdalens Datakonsult AB, Platform-independent software
solutions
> 
> _______________________________________________
> ace-bugs mailing list
> ace-bugs at mail.cse.wustl.edu
> http://mail.cse.wustl.edu/mailman/listinfo/ace-bugs
> 




More information about the Ace-users mailing list