[Ace-users] Re: [ace-bugs] POSIX Proactor destructor deadlocks when
there are pending events (fix included)
Steve Huston
shuston at riverace.com
Fri Jun 22 16:23:24 CDT 2007
Hi David,
Thanks for the report and analysis.
Can you please enter this issue in Bugzilla
(http://deuce.doc.wustl.edu/bugzilla/) for further research and
inclusion in a future ACE version?
Thanks,
-Steve
--
Steve Huston, Riverace Corporation
Would you like ACE to run great on your platform?
See http://www.riverace.com/sponsor.htm
> -----Original Message-----
> From: ace-bugs-bounces at cse.wustl.edu
> [mailto:ace-bugs-bounces at cse.wustl.edu] On Behalf Of David Faure
> Sent: Thursday, June 21, 2007 7:02 AM
> To: ace-bugs at cs.wustl.edu
> Subject: [ace-bugs] POSIX Proactor destructor deadlocks when
> there are pending events (fix included)
>
>
> ACE VERSION: 5.5.6
>
> HOST MACHINE and OPERATING SYSTEM: Dual amd64, Linux Kubuntu 7.04
>
> COMPILER NAME AND VERSION (AND PATCHLEVEL): gcc-4.1.2
>
> THE $ACE_ROOT/ace/config.h FILE: symlink to config-linux.h
>
> THE $ACE_ROOT/include/makeinclude/platform_macros.GNU FILE:
> symlink to platform_linux.GNU
>
> AREA/CLASS/EXAMPLE AFFECTED:
> examples/Reactor/Proactor/test_end_event_loop.cpp
>
> DOES THE PROBLEM AFFECT: Execution
>
> SYNOPSIS:
> Closing the POSIX Proactor while it has pending events
> leads to a deadlock.
>
> DESCRIPTION:
> Quitting a program that runs proactor event loops in
> threads seems difficult.
> If no events come in, the program doesn't exit (see separate
> bug report), but if events
> come in, the proactor fails to close properly and the
> program doesn't terminate either.
>
> The output from the program when trying to exit is:
> ACE_POSIX_AIOCB_Proactor::delete_result_aiocb_list
> number pending AIO=19
> Proactor.cpp:610:(7940 | 132929856):ACE_Proactor::close:implementation couldnt be closed: Invalid argument
> [and then it blocks]
>
> The code clearly intends close() to fail in this situation:
> Breakpoint 7,
> ACE_POSIX_AIOCB_Proactor::delete_result_aiocb_list
> (this=0x6fd4c0) at POSIX_Proactor.cpp:940
> 935 // If it is not possible cancel some operation (num_pending > 0 ),
> 936 // we can do only one thing -report about this
> 937 // and complain about POSIX implementation.
> 938 // We know that we have memory leaks, but it is better than
> 939 // segmentation fault!
> 940 ACE_DEBUG
> 941 ((LM_DEBUG,
> 942 ACE_LIB_TEXT("ACE_POSIX_AIOCB_Proactor::delete_result_aiocb_list\n")
> 943 ACE_LIB_TEXT(" number pending AIO=%d\n"),
> 944 num_pending));
> [...]
> 952 return (num_pending == 0 ? 0 : -1);
> So this returns -1 due to pending operations.
>
> 602 int
> 603 ACE_Proactor::close (void)
> 604 {
> 605 // Close the implementation.
> 606 if (this->implementation ()->close () == -1)
> 607 ACE_ERROR_RETURN ((LM_ERROR,
> 608 ACE_LIB_TEXT ("%N:%l:(%P | %t):%p\n"),
> 609 ACE_LIB_TEXT ("ACE_Proactor::close:implementation couldnt be closed")),
> 610 -1);
> 611
> So close() returns prematurely, deleting neither
> the implementation nor the timer_handler.
> A memory leak would be acceptable, but in this case it leads to
> a deadlock:
> when the ACE_Proactor destructor finishes, the member
> ACE_Thread_Manager instance
> (documented to "manage the thread in the Timer_Handler") is
> destroyed, and its destructor waits for that thread.
> It waits forever, since that thread was never terminated: the
> "delete this->timer_handler_"
> statement that would have stopped it was skipped in close().
>
> #0 0x00002adfeb11f796 in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> #1 0x00002adfea14b247 in ACE_Condition_Thread_Mutex::wait
> (this=0x71247c, mutex=@0x0, abstime=0x1)
> at /d/kdab/src/ACE+TAO-svn/ACE_wrappers/ace/OS_NS_Thread.inl:410
> #2 0x00002adfea1ae48c in ACE_Thread_Manager::wait
> (this=0x7123f8, timeout=0x0, abandon_detached_threads=40,
> use_absolute_time=<value optimized out>) at
> Thread_Manager.cpp:1694
> #3 0x00002adfea1af419 in ACE_Thread_Manager::close
> (this=0x7123f8) at Thread_Manager.cpp:446
> #4 0x00002adfea1af482 in ~ACE_Thread_Manager (this=0x71247c)
> at Thread_Manager.cpp:460
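
The control flow described above can be reproduced in miniature without ACE.
What follows is only a sketch under stated assumptions: it uses std::thread
rather than ACE_Thread_Manager, and TimerHandler, ModelProactor and
handler_thread_exits are hypothetical names made up for illustration, not
ACE classes or functions. The thread manager's unbounded wait is replaced by
a bounded wait so that the would-be deadlock shows up as a timeout instead
of a hang.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <future>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the timer handler: it owns a thread that only
// stops when the handler object is destroyed.
class TimerHandler {
public:
  TimerHandler()
      : exited_(done_.get_future().share()), thread_([this] { run(); }) {}

  ~TimerHandler() {
    {
      std::lock_guard<std::mutex> guard(lock_);
      stop_ = true;
    }
    wake_.notify_all();
    assert(thread_.joinable());
    thread_.join();
  }

  // Becomes ready once the handler thread has left run().
  std::shared_future<void> exited() const { return exited_; }

private:
  void run() {
    {
      std::unique_lock<std::mutex> guard(lock_);
      wake_.wait(guard, [this] { return stop_; });
    }
    done_.set_value();
  }

  std::mutex lock_;
  std::condition_variable wake_;
  bool stop_ = false;
  std::promise<void> done_;
  std::shared_future<void> exited_;
  std::thread thread_;  // declared last: starts after the other members exist
};

// Hypothetical model of the owner; only the early return in close() is
// taken from the report.
class ModelProactor {
public:
  explicit ModelProactor(bool impl_close_fails)
      : impl_close_fails_(impl_close_fails),
        handler_(new TimerHandler),
        handler_exited_(handler_->exited()) {}

  int close() {
    if (impl_close_fails_)
      return -1;  // early return: the handler is not deleted
    handler_.reset();
    return 0;
  }

  // The idea of the sample fix: release the handler even if close() failed.
  void force_cleanup() { handler_.reset(); }

  // Bounded version of "wait for the handler thread".
  bool thread_exits_within(std::chrono::milliseconds limit) const {
    return handler_exited_.wait_for(limit) == std::future_status::ready;
  }

private:
  bool impl_close_fails_;
  // Unlike the reported code, the model still frees the handler when the
  // object goes away, so the demonstration itself can terminate.
  std::unique_ptr<TimerHandler> handler_;
  std::shared_future<void> handler_exited_;
};

// Returns whether the handler thread exited after a failed close().
bool handler_thread_exits(bool cleanup_after_failed_close) {
  ModelProactor proactor(true);  // the implementation's close() always fails
  proactor.close();
  if (cleanup_after_failed_close)
    proactor.force_cleanup();
  return proactor.thread_exits_within(std::chrono::milliseconds(200));
}
```

In this model, handler_thread_exits(false) times out because nothing ever
tells the thread to stop, which corresponds to the unbounded wait in the
backtrace above; handler_thread_exits(true) returns promptly.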
>
>
> SAMPLE FIX:
>
> --- Proactor.cpp (revision 76931)
> +++ Proactor.cpp (working copy)
> @@ -356,6 +356,30 @@ ACE_Proactor::ACE_Proactor (ACE_Proactor
> ACE_Proactor::~ACE_Proactor (void)
> {
> this->close ();
> + // Even if close failed, we need to delete everything to avoid a deadlock
> +
> + // Delete the implementation.
> + if (this->delete_implementation_)
> + {
> + delete this->implementation ();
> + this->implementation_ = 0;
> + }
> +
> + // Delete the timer handler.
> + if (this->timer_handler_)
> + {
> + delete this->timer_handler_;
> + this->timer_handler_ = 0;
> + }
> +
> + // Delete the timer queue.
> + if (this->delete_timer_queue_)
> + {
> + delete this->timer_queue_;
> + this->timer_queue_ = 0;
> + this->delete_timer_queue_ = 0;
> + }
> +
> }
>
> Now the timer_handler thread exits, the proactor can be
> deleted, execution continues, and the program exits.
>
> However, this fix duplicates code; a better fix would probably
> be to move this code to a new private method
> (I didn't do that for my tests, to reduce recompiles).
>
> I suppose that moving the code out of close() wouldn't be a
> good idea: the behavior of close()
> (a public method) should remain unchanged.
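
One possible shape for the private-method variant suggested above, again as
a self-contained sketch rather than a patch against ACE: ProactorModel,
Counted, release_members and live_after_destruction are hypothetical names,
and the real members and ownership flags of ACE_Proactor are reduced to two
raw pointers. close() keeps its current behavior (it still returns -1
without releasing anything), and the destructor calls the shared helper
unconditionally.

```cpp
#include <cassert>

// Counts live objects so the example can show what has been released.
struct Counted {
  static int& live() {
    static int n = 0;
    return n;
  }
  Counted() { ++live(); }
  ~Counted() { --live(); }
};

// Hypothetical miniature of the owner class.
class ProactorModel {
public:
  explicit ProactorModel(bool impl_close_fails)
      : impl_close_fails_(impl_close_fails),
        timer_handler_(new Counted),
        timer_queue_(new Counted) {}

  ~ProactorModel() {
    this->close();            // may fail and return early
    this->release_members();  // always runs; a no-op if close() succeeded
  }

  // Public behavior is unchanged: on failure nothing is released here.
  int close() {
    if (impl_close_fails_)
      return -1;
    this->release_members();
    return 0;
  }

private:
  // Single place that owns the cleanup logic. Safe to call twice because
  // each pointer is reset after deletion.
  void release_members() {
    delete timer_handler_;
    timer_handler_ = nullptr;
    delete timer_queue_;
    timer_queue_ = nullptr;
  }

  ProactorModel(const ProactorModel&) = delete;
  ProactorModel& operator=(const ProactorModel&) = delete;

  bool impl_close_fails_;
  Counted* timer_handler_;
  Counted* timer_queue_;
};

// Returns the number of objects still alive after the model is destroyed.
int live_after_destruction(bool impl_close_fails) {
  {
    ProactorModel proactor(impl_close_fails);
    int rc = proactor.close();
    assert(rc == (impl_close_fails ? -1 : 0));
    // After a failed close() both members are still alive.
    assert(Counted::live() == (impl_close_fails ? 2 : 0));
  }
  return Counted::live();
}
```

Because the helper resets each pointer after deleting it, calling it from
both close() and the destructor does not double-delete, which is the
property the duplicated code in the patch also relies on.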
>
> NOTE:
> There is still another bug: it seems some internal thread is
> terminated only after its
> data structures (held by the POSIX proactor) are gone, according to valgrind:
> ==8987== Thread 32:
> ==8987== Syscall param pread64(buf) points to unaddressable byte(s)
> ==8987== at 0x5A0CB88: (within /usr/lib/debug/libc-2.5.so)
> ==8987== by 0x60BFE0D: handle_fildes_io (aio_misc.c:523)
> ==8987== by 0x5EA82A4: start_thread (pthread_create.c:296)
> ==8987== by 0x5A1C61C: clone (in /usr/lib/debug/libc-2.5.so)
> ==8987== Address 0xF471A08 is 0 bytes inside a block of size 4 free'd
> ==8987== at 0x4C2000C: operator delete[](void*) (vg_replace_malloc.c:256)
> ==8987== by 0x4F14D0D: ACE_Data_Block::~ACE_Data_Block() (Message_Block.cpp:741)
> ==8987== by 0x4F15A77: ACE_Data_Block::release(ACE_Lock*) (Message_Block.cpp:820)
> ==8987== by 0x4F15B07: ACE_Message_Block::~ACE_Message_Block() (Message_Block.cpp:951)
> ==8987== by 0x4F2D763: ACE_AIOCB_Notify_Pipe_Manager::~ACE_AIOCB_Notify_Pipe_Manager() (POSIX_Proactor.cpp:727)
> ==8987== by 0x4F2AB55: ACE_POSIX_AIOCB_Proactor::delete_notify_manager() (POSIX_Proactor.cpp:1058)
> ==8987== by 0x4F2DA74: ACE_POSIX_AIOCB_Proactor::close() (POSIX_Proactor.cpp:845)
> ==8987== by 0x4F31ECB: ACE_Proactor::close() (Proactor.cpp:630)
> ==8987== by 0x4F3236B: ACE_Proactor::~ACE_Proactor() (Proactor.cpp:358)
> ==8987== by 0x410118: ArbitrationPerThread_Configuration::worker(void*) (main.cpp:533)
> ==8987== by 0x5EA82A4: start_thread (pthread_create.c:296)
> ==8987== by 0x5A1C61C: clone (in /usr/lib/debug/libc-2.5.so)
> It seems the thread should be terminated before deleting
> the notify manager (whatever that is), right?
> But, hmm, that thread lives entirely in glibc and was not created by ACE?
> I'm getting crashes on exit on Windows too, with the Win32
> proactor, but there is no valgrind there to find out what's happening...
> Ideas welcome :)
>
> --
> David Faure, faure at kde.org, dfaure at klaralvdalens-datakonsult.se
> KDE/KOffice developer, Qt consultancy projects
> Klarälvdalens Datakonsult AB, Platform-independent software solutions
>