[ace-bugs] infinite loop in TAO_Connector::connect()

Erik Cumps erik.cumps at esaturnus.com
Tue Sep 15 09:19:14 CDT 2015


Hi Phil,

thanks for getting back to me on this issue.

To answer your questions:

Our application is using a round trip timeout policy of 500 ms.

I am not sure whether TAO_Connector::connect() is using the TAO-specific
connection timeout. This is the timeout value it was passed:

    (gdb) frame 25
    #25 0xb57f69ab in TAO_Connector::connect (this=0x8f9a298, r=0xbfa11de4, desc=0xbfa11d44, timeout=0xbfa11e70)
        at Transport_Connector.cpp:613
    613                  if (this->wait_for_transport (r, base_transport, timeout, false))
    (gdb) print *timeout
    $2 = {static zero = {static zero = <same as static member of an already seen type>, static max_time = {
          static zero = <same as static member of an already seen type>,
          static max_time = <same as static member of an already seen type>, tv_ = {tv_sec = 2147483647,
            tv_usec = 999999}}, tv_ = {tv_sec = 0, tv_usec = 0}},
      static max_time = <same as static member of an already seen type>, tv_ = {tv_sec = 0, tv_usec = 0}}

We are using a thread pool reactor with a single thread in the thread pool.

The main thread of the application is handling the SIGQUIT.

The SIGQUIT signal handler basically abort()s the application:
    #0  0xb52ad387 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
    #1  0xb52b0702 in *__GI_abort () at abort.c:121
    #2  0xb36e0726 in ?? () from /usr/lib/libGraphicsMagick.so.3
    #3  <signal handler called>

There are four other threads running in the process, two of them related to
lttng and the other two running ACE threads:

    (gdb) info threads
      Id   Target Id         Frame
      5    Thread 0xaf671b70 (LWP 1030) 0xb5353037 in select () at ../sysdeps/unix/syscall-template.S:82
      4    Thread 0xb0699b70 (LWP 1028) syscall () at ../sysdeps/unix/sysv/linux/i386/syscall.S:31
      3    Thread 0xaecffb70 (LWP 1031) 0xb534ab91 in read () at ../sysdeps/unix/syscall-template.S:82
      2    Thread 0xafe99b70 (LWP 1029) syscall () at ../sysdeps/unix/sysv/linux/i386/syscall.S:31
    * 1    Thread 0xb25eb790 (LWP 1027) 0xb52ad387 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64

    (gdb) thread 2
    (gdb) bt
    #0  syscall () at ../sysdeps/unix/sysv/linux/i386/syscall.S:31
    #1  0xb6db41d0 in ?? () from /usr/lib/i386-linux-gnu/liblttng-ust.so.0
    #2  0xb6db4451 in ?? () from /usr/lib/i386-linux-gnu/liblttng-ust.so.0
    #3  0xb526f954 in start_thread (arg=0xafe99b70) at pthread_create.c:304
    #4  0xb5359c1e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

    (gdb) thread 3
    (gdb) bt
    #0  0xb534ab91 in read () at ../sysdeps/unix/syscall-template.S:82
    #1  0x0808095a in read (handle=<optimized out>, len=1, buf=0xaecff0cd) at /usr/lib/ace/ace/OS_NS_unistd.inl:738
    #2  recv (n=1, buf=0xaecff0cd, this=0x8fa3768) at /usr/lib/ace/ace/Pipe.inl:143
    #3  read (this=0x8fa3764)
    #4  MyToolNode::svc (this=0x8fa3488)
    #5  0xb56aa3d3 in ACE_Task_Base::svc_run (args=0x8fa36a8) at Task.cpp:271
    #6  0xb56ac085 in ACE_Thread_Adapter::invoke_i (this=0xaed039e0) at Thread_Adapter.cpp:161
    #7  0xb56ac1d3 in ACE_Thread_Adapter::invoke (this=0xaed039e0) at Thread_Adapter.cpp:96
    #8  0xb564366f in ace_thread_adapter (args=0xaed039e0) at Base_Thread_Adapter.cpp:122
    #9  0xb526f954 in start_thread (arg=0xaecffb70) at pthread_create.c:304
    #10 0xb5359c1e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

    (gdb) thread 4
    (gdb) bt
    #0  syscall () at ../sysdeps/unix/sysv/linux/i386/syscall.S:31
    #1  0xb6db41d0 in ?? () from /usr/lib/i386-linux-gnu/liblttng-ust.so.0
    #2  0xb6db4451 in ?? () from /usr/lib/i386-linux-gnu/liblttng-ust.so.0
    #3  0xb526f954 in start_thread (arg=0xb0699b70) at pthread_create.c:304
    #4  0xb5359c1e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

    (gdb) thread 5
    (gdb) bt
    #0  0xb5353037 in select () at ../sysdeps/unix/syscall-template.S:82
    #1  0xb75ddd21 in OtherTool::doAction (this=this@entry=0xbfa127f0)
    #2  0xb75def18 in OtherTool::svc (this=0xbfa127f0)
    #3  0xb56aa3d3 in ACE_Task_Base::svc_run (args=0xbfa127f0) at Task.cpp:271
    #4  0xb56ac085 in ACE_Thread_Adapter::invoke_i (this=0x8fa1280) at Thread_Adapter.cpp:161
    #5  0xb56ac1d3 in ACE_Thread_Adapter::invoke (this=0x8fa1280) at Thread_Adapter.cpp:96
    #6  0xb564366f in ace_thread_adapter (args=0x8fa1280) at Base_Thread_Adapter.cpp:122
    #7  0xb526f954 in start_thread (arg=0xaf671b70) at pthread_create.c:304
    #8  0xb5359c1e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

This is the full (sanitized) stack of the main thread:

    #0  0xb52ad387 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
    #1  0xb52b0702 in *__GI_abort () at abort.c:121
    #2  0xb36e0726 in ?? () from /usr/lib/libGraphicsMagick.so.3
    #3  <signal handler called>
    #4  mmap () at ../sysdeps/unix/sysv/linux/i386/mmap.S:65
    #5  0xb52f59f3 in new_heap (size=147496, top_pad=<optimized out>) at arena.c:745
    #6  0xb52f68bd in sYSMALLOc (av=<optimized out>, nb=<optimized out>) at malloc.c:3126
    #7  _int_malloc (av=<optimized out>, bytes=<optimized out>) at malloc.c:4776
    #8  0xb52f82ec in *__GI___libc_malloc (bytes=16388) at malloc.c:3660
    #9  0xb5466845 in operator new (sz=16388) at ../../../../src/libstdc++-v3/libsupc++/new_opnt.cc:44
    #10 0xb5466903 in operator new[] (sz=16388, nothrow=...) at ../../../../src/libstdc++-v3/libsupc++/new_opvnt.cc:34
    #11 0xb567aa4c in ACE_Notification_Queue::allocate_more_buffers (this=this@entry=0x8f7fe70)
        at Notification_Queue.cpp:83
    #12 0xb567afa8 in ACE_Notification_Queue::push_new_notification (this=0x8f7fe70, buffer=...)
        at Notification_Queue.cpp:172
    #13 0xb569b534 in ACE_Select_Reactor_Notify::notify (this=0x8f7fe48, event_handler=0x0, mask=0, timeout=0x80ede60)
        at Select_Reactor_Base.cpp:697
    #14 0xb563b491 in ACE_Select_Reactor_T<ACE_Reactor_Token_T<ACE_Token> >::notify (this=0x8f7f718, eh=0x0, mask=0,
        timeout=0x80ede60) at /tmp/buildd/ace-6.0.3/ace/Select_Reactor_T.cpp:213
    #15 0xb563b538 in ACE_Select_Reactor_T<ACE_Reactor_Token_T<ACE_Token> >::wakeup_all_threads (this=0x8f7f718)
        at /tmp/buildd/ace-6.0.3/ace/Select_Reactor_T.inl:196
    #16 0xb563e5d6 in ACE_Select_Reactor_T<ACE_Reactor_Token_T<ACE_Token> >::deactivate (this=0x8f7f718, do_stop=1)
        at /tmp/buildd/ace-6.0.3/ace/Select_Reactor_T.inl:227
    #17 0xb579dfe2 in end_reactor_event_loop (this=<optimized out>) at /tmp/buildd/ace-6.0.3/ace/Reactor.inl:107
    #18 TAO_Leader_Follower::reset_client_thread (this=this@entry=0x8f7eec8) at Leader_Follower.cpp:183
    #19 0xb579e530 in ~TAO_LF_Client_Thread_Helper (this=<synthetic pointer>, __in_chrg=<optimized out>)
        at /tmp/buildd/ace-6.0.3/TAO/tao/Leader_Follower.inl:217
    #20 TAO_Leader_Follower::wait_for_event (this=0x8f7eec8, event=0x9323c7ec, transport=0x9321f8e0,
        max_wait_time=0xbfa11e70) at Leader_Follower.cpp:423
    #21 0xb57a04ad in TAO_LF_Connect_Strategy::wait_i (this=this@entry=0x8f9a348, ev=0x9323c7ec,
        transport=transport@entry=0x9321f8e0, max_wait_time=max_wait_time@entry=0xbfa11e70)
        at LF_Connect_Strategy.cpp:51
    #22 0xb576983d in TAO_Connect_Strategy::wait (this=0x8f9a348, t=0x9321f8e0, max_wait_time=0xbfa11e70)
        at Connect_Strategy.cpp:40
    #23 0xb57f4f12 in wait_for_transport (force_wait=false, timeout=0xbfa11e70, transport=0x9321f8e0, r=0xbfa11de4,
        this=0x8f9a298) at Transport_Connector.cpp:418
    #24 TAO_Connector::wait_for_transport (this=0x8f9a298, r=0xbfa11de4, transport=0x9321f8e0, timeout=0xbfa11e70,
        force_wait=false) at Transport_Connector.cpp:348
    #25 0xb57f69ab in TAO_Connector::connect (this=0x8f9a298, r=0xbfa11de4, desc=0xbfa11d44, timeout=0xbfa11e70)
        at Transport_Connector.cpp:613
    #26 0xb57ccb13 in TAO::Profile_Transport_Resolver::try_connect_i (this=this@entry=0xbfa11de4,
        desc=desc@entry=0xbfa11d44, timeout=timeout@entry=0xbfa11e70, parallel=parallel@entry=false)
        at Profile_Transport_Resolver.cpp:171
    #27 0xb57cccc3 in TAO::Profile_Transport_Resolver::try_connect (this=0xbfa11de4, desc=0xbfa11d44,
        timeout=0xbfa11e70) at Profile_Transport_Resolver.cpp:114
    #28 0xb5799ca8 in TAO_Default_Endpoint_Selector::select_endpoint (this=0x8f7cae8, r=0xbfa11de4,
        max_wait_time=0xbfa11e70) at Invocation_Endpoint_Selectors.cpp:66
    #29 0xb57cc89c in TAO::Profile_Transport_Resolver::resolve (this=0xbfa11de4, max_time_val=0xbfa11e70)
        at Profile_Transport_Resolver.cpp:85
    #30 0xb579821c in TAO::Invocation_Adapter::invoke_remote_i (this=0xbfa11fbc, stub=0x93209060, details=...,
        effective_target=..., max_wait_time=@0xbfa11e64: 0xbfa11e70) at Invocation_Adapter.cpp:239
    #31 0xb5798cc0 in TAO::Invocation_Adapter::invoke_i (this=0xbfa11fbc, stub=0x93209060, details=...)
        at Invocation_Adapter.cpp:92
    #32 0xb5798076 in TAO::Invocation_Adapter::invoke (this=0xbfa11fbc, ex_data=0x0, ex_count=0)
        at Invocation_Adapter.cpp:46
    #33 0xb57cf2fb in TAO::Remote_Object_Proxy_Broker::_is_a (this=0xb5844748, target=0x9320d378,
        type_id=0xb75eb30c "IDL:MyApp/Dispatcher:1.0") at Remote_Object_Proxy_Broker.cpp:39
    #34 0xb57aa845 in CORBA::Object::_is_a (this=0x9320d378, type_id=0xb75eb30c "IDL:MyApp/Dispatcher:1.0")
        at Object.cpp:220
    #35 0xb745ca2c in narrow (pbf=0xb746ad10 <MyApp__TAO_Dispatcher_Proxy_Broker_Factory_function(CORBA::Object*)>,
        repo_id=0xb75eb30c "IDL:MyApp/Dispatcher:1.0", obj=0x9320d378) at /usr/include/tao/Object_T.cpp:27
    #36 MyApp::Dispatcher::_narrow (_tao_objref=0x9320d378) at generated/DispatcherC.cpp:1361
    #37 0x08089b26 in downcast_objref<MyApp::Dispatcher> (a_object=0x9320d378)
    #38 0x08089ce3 in lookup_initref<MyApp::Dispatcher> (a_orb=0xfffffff4, a_name="Dispatcher")
    #39 0x08088f9f in _get_service (a_orb=<optimized out>, this=<optimized out>)
    #40 MyObject::doMyObjectAction_Unsafe (this=0xaed14bb8)
    #41 0x08089780 in MyObject::doMyObjectAction (this=0xaed14bb8)
    #42 0x08089930 in operator() (p=<optimized out>, this=0xaed14c20) at /usr/include/boost/bind/mem_fn_template.hpp:49
    #43 operator()<boost::_mfi::mf0<void, MyObject>, boost::_bi::list0> (f=..., this=0xaed14c28, a=...)
        at /usr/include/boost/bind/bind.hpp:253
    #44 operator() (this=0xaed14c20) at /usr/include/boost/bind/bind_template.hpp:20
    #45 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, MyObject>, boost::_bi::list1<boost::_bi::value<MyObject*> > >, void>::invoke (function_obj_ptr=...)
        at /usr/include/boost/function/function_template.hpp:153
    #46 0xb75cfca3 in operator() (this=<optimized out>) at /usr/include/boost/function/function_template.hpp:760
    #47 UpcallGuardLock::~UpcallGuardLock (this=0x9321f7b0, __in_chrg=<optimized out>)
    #48 0xb75cfe59 in checked_delete<UpcallGuardLock> (x=0x9321f7b0) at /usr/include/boost/checked_delete.hpp:34
    #49 boost::detail::sp_counted_impl_p<UpcallGuardLock>::dispose (this=0x92ca20e8)
        at /usr/include/boost/smart_ptr/detail/sp_counted_impl.hpp:78
    #50 0x080897b8 in release (this=<optimized out>)
        at /usr/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:145
    #51 ~shared_count (this=<optimized out>, __in_chrg=<optimized out>)
        at /usr/include/boost/smart_ptr/detail/shared_count.hpp:305
    #52 ~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/boost/smart_ptr/shared_ptr.hpp:164
    #53 MyObject::doMyObjectAction (this=0x92ca20e8)
    #54 0x08089930 in operator() (p=<optimized out>, this=0xaed14be8) at /usr/include/boost/bind/mem_fn_template.hpp:49
    #55 operator()<boost::_mfi::mf0<void, MyObject>, boost::_bi::list0> (f=..., this=0xaed14bf0, a=...)
        at /usr/include/boost/bind/bind.hpp:253
    #56 operator() (this=0xaed14be8) at /usr/include/boost/bind/bind_template.hpp:20
    #57 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, MyObject>, boost::_bi::list1<boost::_bi::value<MyObject*> > >, void>::invoke (function_obj_ptr=...)
        at /usr/include/boost/function/function_template.hpp:153
    #58 0xb7574b8a in operator() (this=<optimized out>) at /usr/include/boost/function/function_template.hpp:760
    #59 AsynchronousTask::handle_exception (this=0xaed14be8)
    #60 0xb569b438 in ACE_Select_Reactor_Notify::dispatch_notify (this=0x8f7fe48, buffer=...)
        at Select_Reactor_Base.cpp:837
    #61 0xb56b5f26 in ACE_TP_Reactor::handle_notify_events (this=this@entry=0x8f7f718, guard=...) at TP_Reactor.cpp:377
    #62 0xb56b6194 in ACE_TP_Reactor::dispatch_i (this=this@entry=0x8f7f718, max_wait_time=max_wait_time@entry=0x0,
        guard=...) at TP_Reactor.cpp:229
    #63 0xb56b6267 in ACE_TP_Reactor::handle_events (this=0x8f7f718, max_wait_time=0x0) at TP_Reactor.cpp:169
    #64 0xb57b63ae in handle_events (max_wait_time=0x0, this=0x8f7f2c0) at /tmp/buildd/ace-6.0.3/ace/Reactor.inl:188
    #65 TAO_ORB_Core::run (this=0x8f7b940, tv=0x0, perform_work=0) at ORB_Core.cpp:2225
    #66 0xb57b13cd in CORBA::ORB::run (this=this@entry=0x8f7f2d0, tv=tv@entry=0x0) at ORB.cpp:188
    #67 0xb57b1433 in CORBA::ORB::run (this=0x8f7f2d0) at ORB.cpp:174
    #68 0xb758ec13 in NodeRunner::run (this=0x8fa1380, limit=...)
    #69 0x0807bc7b in run (called_as="MyApp_MyTool", argc=18, argv=argv@entry=0xbfa12a14, orb=...)
    #70 0x08071b9c in main (argc=18, argv=0xbfa12a14)

Best regards,
Erik


On Mon, Sep 14, 2015 at 4:14 PM, Phil Mesnier <mesnierp at ociweb.com> wrote:

> Hi Erik,
>
> Thank you for the PRF.
>
> First, I assume you are using either the round trip timeout policy or the
> TAO-specific connection timeout. I agree that a while (true) loop seems
> risky, but I suspect you've uncovered a deeper issue. How many threads are
> in the ORB thread pool?
> Which thread handles the SIGQUIT? What does the signal handler do?
>
> Does the stack you shared go deeper? For example, is there another connect
> attempt pending?
>
> I've got some ideas but I'd like to see your responses before going
> further.
>
> Best regards,
> Phil
>
> > On Sep 14, 2015, at 6:10 AM, Erik Cumps <erik.cumps at esaturnus.com> wrote:
> >
> > Hello,
> >
> > I know this issue is hard to reproduce, but the failure sequence is worth
> > investigating: an application exit or crash is preferable to an infinite loop.
> >
> > Any insights or comments anyone?
> >
> > Thanks,
> > Erik Cumps
> >
> > On Fri, 2015-08-07 at 14:59 +0200, Erik Cumps wrote:
> >>     TAO VERSION: 2.3.0
> >>     ACE VERSION: 6.3.0
> >>
> >>     HOST MACHINE and OPERATING SYSTEM:
> >>         32-bit i686, Linux 3.2.35, debian wheezy
> >>
> >>     TARGET MACHINE and OPERATING SYSTEM, if different from HOST:
> >>         same as HOST
> >>
> >>     COMPILER NAME AND VERSION (AND PATCHLEVEL):
> >>         gcc (Debian 4.7.2-5) 4.7.2
> >>
> >>     THE $ACE_ROOT/ace/config.h FILE:
> >>         // $Id$
> >>
> >>         #ifndef ACE_CONFIG_H_INCLUDED
> >>         #define ACE_CONFIG_H_INCLUDED
> >>         #ifdef __FreeBSD_kernel__
> >>         #include "config-kfreebsd.h"
> >>         #elif defined(__GNU__)
> >>         #include "config-hurd.h"
> >>         #else // assume linux
> >>         /*
> >>          * Macros that were enabled in Debian are stored here.
> >>          *
> >>          * Rationale: those were captured in the generated libraries on
> >>          * compilation; hence the same values must be used when including
> >>          * ACE+TAO headers, to avoid unexpected results.
> >>          */
> >>
> >>         #if defined(ACE_HAS_IPV6)
> >>         #undef ACE_HAS_IPV6
> >>         #endif
> >>
> >>         #ifndef ACE_USES_IPV4_IPV6_MIGRATION
> >>         #define ACE_USES_IPV4_IPV6_MIGRATION 0
> >>         #endif
> >>
> >>         #ifndef __ACE_INLINE__
> >>         #define __ACE_INLINE__
> >>         #endif
> >>
> >>         #include "config-linux.h"
> >>         #endif // __FreeBSD_version
> >>         #endif /* ACE_CONFIG_H_INCLUDED */
> >>
> >>     THE $ACE_ROOT/include/makeinclude/platform_macros.GNU FILE:
> >>         # $Id$
> >>
> >>         debug          = 1
> >>         optimize       = 1
> >>         inline         = 1
> >>
> >>         ssl            = 1
> >>
> >>         xt             = 1
> >>         tk             = 1
> >>         fl             = 1
> >>         fox            = 1
> >>         qt4            = 1
> >>         ace_qt4reactor = 1
> >>
> >>         bzip2          = 1
> >>         lzo1           = 1
> >>         zlib           = 1
> >>
> >>         # Work-around #593225
> >>         ARMEL_TARGET := $(shell echo '__ARMEL__' | $(CC) -E - | tail -n 1)
> >>         ifeq ($(ARMEL_TARGET),1)
> >>           no_hidden_visibility = 1
> >>         endif
> >>
> >>         include $(ACE_ROOT)/include/makeinclude/platform_linux.GNU
> >>
> >>         PLATFORM_FOX_CPPFLAGS=-I/usr/include/fox-1.6
> >>         PLATFORM_FOX_LIBS=-lFOX-1.6
> >>
> >>     CONTENTS OF $ACE_ROOT/bin/MakeProjectCreator/config/default.features:
> >>         // Misc
> >>         acexml          = 1
> >>         ace_svcconf     = 1
> >>         ace_token       = 1
> >>         ssl             = 1
> >>         ipv6            = 0
> >>         exceptions      = 1
> >>
> >>         // GUI reactors
> >>         xt              = 1
> >>         ace_xtreactor   = 1
> >>         tao_xtresource  = 1
> >>
> >>         tk              = 1
> >>         ace_tkreactor   = 1
> >>         tao_tkresource  = 1
> >>
> >>         fl              = 1
> >>         ace_flreactor   = 1
> >>         tao_flresource  = 1
> >>
> >>         qt              = 1
> >>         qt4             = 1
> >>         ace_qtreactor   = 1
> >>         tao_qtresource  = 1
> >>
> >>         fox             = 1
> >>         ace_foxreactor  = 1
> >>         tao_foxresource = 1
> >>
> >>         // ZIOP
> >>         zlib          = 1
> >>         zzip          = 1
> >>         bzip2         = 1
> >>         lzo1          = 1
> >>
> >>
> >>     AREA/CLASS/EXAMPLE AFFECTED:
> >>         Transport handling. (Transport_Connector.cpp)
> >>
> >>     DOES THE PROBLEM AFFECT:
> >>         EXECUTION?
> >>
> >>     SYNOPSIS:
> >> A process fails to complete its shutdown because the
> >> TAO_Connector::connect() method is stuck in an infinite loop.
> >>
> >>     DESCRIPTION:
> >> The system is under heavy load. While the process is stopping its servants
> >> and shutting down the ORB, scheduling delays introduced by the heavy load
> >> cause it to attempt a remote object invocation, which requires setting up
> >> a new Transport connection.
> >>
> >> This is handled by the TAO_Connector::connect() method, which states:
> >>    // Stay in this loop until we find:
> >>    // a usable connection, or a timeout happens
> >>
> >> In this particular case the tcm.find_transport() call returns
> >> TAO::Transport_Cache_Manager::CACHE_FOUND_CONNECTING,
> >> which means the following code is executed:
> >>         else if (found == TAO::Transport_Cache_Manager::CACHE_FOUND_CONNECTING)
> >>           {
> >>             if (r->blocked_connect ())
> >>               {
> >>                 ...
> >>                 // If wait_for_transport returns no errors, the base_transport
> >>                 // points to the connection we wait for.
> >>                 if (this->wait_for_transport (r, base_transport, timeout, false))
> >>                   {
> >>                     // be sure this transport is registered with the reactor
> >>                     // before using it.
> >>                     if (!base_transport->register_if_necessary ())
> >>                       {
> >>                           base_transport->remove_reference ();
> >>                           return 0;
> >>                       }
> >>                   }
> >>
> >>                 ...
> >>                 // In either success or failure cases of wait_for_transport, the
> >>                 // ref counter in corresponding to the ref counter added by
> >>                 // find_transport is decremented.
> >>                 base_transport->remove_reference ();
> >>               }
> >>             else
> >>               {
> >>                 ...
> >>                 // return the transport in it's current, unconnected state
> >>                 return base_transport;
> >>               }
> >>           }
> >>
> >> The only way out of the loop in this particular state is if:
> >> * r->blocked_connect() returns false
> >> * wait_for_transport() returns true and the base transport fails to register
> >> * tcm.find_transport() returns a different result than CACHE_FOUND_CONNECTING
> >>
> >> In this particular case none of these conditions is true, so the loop is
> >> never exited. Instead the code keeps invoking wait_for_transport(), which
> >> in turn tries to send a notification event to the reactor (so that it can
> >> stop). These notification events pile up in a queue because the reactor
> >> cannot consume them: it is blocked waiting for the remote object invocation
> >> to complete, and that invocation is itself blocked waiting for a transport
> >> connection.
> >>
> >> To give some further indication of the state of the code, here is an
> >> (elided and simplified) stacktrace, obtained after terminating the process
> >> with a SIGQUIT signal.
> >>
> >> The first part of the stacktrace contains the part where the code tries to
> >> notify the reactor that it should stop. As you can see, it is pushing the
> >> notification events onto the queue. At this point, the queue already
> >> contained 157297 notifications:
> >> (gdb) print *this
> >> $3 = {<ACE_Copy_Disabled> = {<No data fields>}, alloc_queue_ = {head_ = 0x8f7feb0, cur_size_ = 157297, allocator_ =
> >>
> >> The end_reactor_event_loop() is being called because the has_shutdown()
> >> method of the orb_core_ returns true.
> >>
> >> #11 0xb567aa4c in ACE_Notification_Queue::allocate_more_buffers
> >> #12 0xb567afa8 in ACE_Notification_Queue::push_new_notification
> >> #13 0xb569b534 in ACE_Select_Reactor_Notify::notify
> >> #14 0xb563b491 in ACE_Select_Reactor_T<ACE_Reactor_Token_T<ACE_Token> >::notify
> >> #15 0xb563b538 in ACE_Select_Reactor_T<ACE_Reactor_Token_T<ACE_Token> >::wakeup_all_threads
> >> #16 0xb563e5d6 in ACE_Select_Reactor_T<ACE_Reactor_Token_T<ACE_Token> >::deactivate
> >> #17 0xb579dfe2 in end_reactor_event_loop
> >> #18 TAO_Leader_Follower::reset_client_thread
> >> #19 0xb579e530 in ~TAO_LF_Client_Thread_Helper
> >>
> >> The next part contains the TAO_Connector::connect() invocation. From the
> >> size of the notification queue we can determine that it has already spent
> >> a lot of time in the loop (at least long enough for more than 150000
> >> notifications).
> >>
> >> #20 TAO_Leader_Follower::wait_for_event
> >> #21 0xb57a04ad in TAO_LF_Connect_Strategy::wait_i
> >> #22 0xb576983d in TAO_Connect_Strategy::wait
> >> #23 0xb57f4f12 in wait_for_transport
> >> #24 TAO_Connector::wait_for_transport
> >> #25 0xb57f69ab in TAO_Connector::connect
> >>
> >> The final part shows that TAO_Connector::connect() is invoked because the
> >> process tries to perform a remote object invocation:
> >>
> >> #26 0xb57ccb13 in TAO::Profile_Transport_Resolver::try_connect_i
> >> #27 0xb57cccc3 in TAO::Profile_Transport_Resolver::try_connect
> >> #28 0xb5799ca8 in TAO_Default_Endpoint_Selector::select_endpoint
> >> #29 0xb57cc89c in TAO::Profile_Transport_Resolver::resolve
> >> #30 0xb579821c in TAO::Invocation_Adapter::invoke_remote_i
> >> #31 0xb5798cc0 in TAO::Invocation_Adapter::invoke_i
> >> #32 0xb5798076 in TAO::Invocation_Adapter::invoke
> >> #33 0xb57cf2fb in TAO::Remote_Object_Proxy_Broker::_is_a
> >> #34 0xb57aa845 in CORBA::Object::_is_a
> >> #35 0xb745ca2c in narrow
> >> #36 MyApp::Dispatcher::_narrow
> >> #37 0x08089b26 in downcast_objref<MyApp::Dispatcher>
> >> #38 0x08089ce3 in lookup_initref<MyApp::Dispatcher>
> >> #39 0x08088f9f in _get_service
> >> #40 MyObject::doMyObjectAction_Unsafe
> >> #41 0x08089780 in MyObject::doMyObjectAction
> >>
> >>     REPEAT BY:
> >> This bug is hard to induce.
> >>
> >>     SAMPLE FIX/WORKAROUND:
> >> Would it make sense for TAO_Connector::connect() to verify the time it
> >> spends waiting for the connection and exit the loop if it detects the
> >> timeout?
> >>
> >>
> >>
> >
>
> --
> Phil Mesnier
> Principal Software Engineer and Partner,   http://www.ociweb.com
> Object Computing, Inc.                     +01.314.579.0066 x225
>
>
>
>
>


-- 
Erik Cumps
Senior Software Developer

eSATURNUS
T. +32 16 40 12 82
www.esaturnus.com