[tao-bugs] Stale connections with BiDirGIOP

Johnny Willemsen jwillemsen at remedy.nl
Tue Nov 17 01:31:21 CST 2015


> How do I get this patch merged to ACE/TAO?

I made some comments on your GitHub pull request. After the needed code
reviews, the most important thing is to demonstrate that none of our existing
unit tests fail with your changes applied.

Best regards,

Johnny Willemsen
Remedy IT
> Thanks, Milan.
> Milan Cvetkovic wrote:
>> I have created Bug 4207 in bugzilla, and submitted pull request #157
>> with automated test and bug fix.
>> Hope it makes it,
>> Milan.
>> Milan Cvetkovic wrote:
>>> OK, answering my own question to some extent...
>>> I narrowed the problem to Transport_Cache_Manager_T, and its use of
>>> Cache_ExtId::index_.
>>> First, how I see it working:
>>> ============================
>>> Transport_Cache_Manager_T uses ACE_Hash_Map_Manager to keep a mapping
>>> between Cache_ExtId and Cache_IntId. In case of IIOP (in my case it
>>> really was SSLIOP, but I doubt there is a difference there), Cache_ExtId
>>> represents IP-ADDR:PORT/index triple for a connection. IP-ADDR:PORT is
>>> the address that Transport connects to, and index is used to allow
>>> multiple connections to the same ip/port address. All three values
>>> (address, port, index) are used to calculate the hash when stored in
>>> ACE_Hash_Map_Manager.
>>> When a new Transport is created, it is registered with the cache manager,
>>> which creates an entry using ip:port:index(0). When another
>>> transport is needed again, Transport_Cache_Manager_T::find_i looks up
>>> an existing connection, and uses it if it is found and idle.
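The mapping described above can be pictured with a minimal sketch. It uses std::unordered_map in place of ACE_Hash_Map_Manager, and every name in it (ExtId, IntId, bind_transport) is a hypothetical stand-in, not TAO's actual API. That a new transport takes the first unused index for its host:port is also an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for Cache_ExtId: endpoint plus a per-endpoint index.
struct ExtId {
    std::string host;
    unsigned short port;
    std::size_t index;
    bool operator==(const ExtId& o) const {
        return host == o.host && port == o.port && index == o.index;
    }
};

// All three fields feed the hash, as the post describes.
struct ExtIdHash {
    std::size_t operator()(const ExtId& k) const {
        std::size_t h = std::hash<std::string>{}(k.host);
        h = h * 31 + std::hash<unsigned>{}(k.port);
        h = h * 31 + std::hash<std::size_t>{}(k.index);
        return h;
    }
};

// Hypothetical stand-in for Cache_IntId: the cached transport and its state.
struct IntId {
    int transport_id;
    bool idle;
};

using Cache = std::unordered_map<ExtId, IntId, ExtIdHash>;

// Assumption: a new transport is bound under the first unused index
// for its host:port, starting at 0. Returns the index that was used.
std::size_t bind_transport(Cache& cache, const std::string& host,
                           unsigned short port, int transport_id) {
    ExtId key{host, port, 0};
    while (cache.count(key) != 0) {
        ++key.index;
    }
    cache.emplace(key, IntId{transport_id, true});
    return key.index;
}
```

Two transports bound to the same host and port end up under index 0 and index 1, as two distinct hash keys.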
>>> The problem:
>>> ============
>>> Transport_Cache_Manager_T::find_i assumes that indexes of existing
>>> connections are all consecutive numbers starting with 0. It will try to
>>> lookup Transport with index=1 *only* if index=0 entry for the same
>>> IP:port exists, and if it is busy. If IP:port:index=0 entry is
>>> previously purged from the cache, Transport_Cache_Manager_T::find_i will
>>> never try to use index=1 (or any other index in the cache).
>>> This scenario is exactly what happens with BiDirGIOP when a client
>>> disappears from the network, and later reconnects (and re-registers
>>> its callback with the same IP:PORT value):
>>> - server caches the first callback with IP:PORT:index=0
>>> - client reconnects/re-registers
>>> - server caches the second callback with IP:PORT:index=1
>>> - eventually, server cleans up the cache entry with IP:PORT:index=0
>>> - but it is never able to access the entry with IP:PORT:index=1
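The failure sequence in the list above can be reproduced with a small sketch. It only models the lookup behaviour as the post describes it (probe index 0, then 1, and so on, giving up at the first missing index). Cache, Entry and find_consecutive are hypothetical names, and std::map stands in for the real hash map.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical cache entry: which transport it is and whether it is idle.
struct Entry {
    int transport_id;
    bool idle;
};

// Key is (endpoint "IP:PORT", index), mirroring the triple in the post.
using Cache = std::map<std::pair<std::string, std::size_t>, Entry>;

// Models the lookup as described: walk indexes 0, 1, 2, ... and stop
// at the first index that has no entry. Busy entries are skipped.
const Entry* find_consecutive(const Cache& cache, const std::string& endpoint) {
    for (std::size_t i = 0;; ++i) {
        Cache::const_iterator it = cache.find(Cache::key_type(endpoint, i));
        if (it == cache.end()) {
            return nullptr;  // gap in the index sequence: give up
        }
        if (it->second.idle) {
            return &it->second;
        }
    }
}
```

With entries at index 0 and index 1, erasing index 0 leaves the index 1 entry in the cache, but find_consecutive returns nullptr for that endpoint because the very first probe misses.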
>>> I am not too sure about the impact on regular TAO clients, since I didn't
>>> try it, but I would assume that:
>>> - if index=0 entry is busy, second transport is created
>>> - if index=0 entry's transport is closed, index=0
>>>    entry is purged from cache, and index=1 entry is no
>>>    longer reachable, until index=0 entry for the same IP:PORT is
>>>    created.
>>> Potential solutions:
>>> ====================
>>> - I could fix Transport_Cache_Manager_T::unbind_i so it makes sure
>>>    that the assumption made in find_i is true: If cache has M elements,
>>>    when removing an entry at index=N (where N is in [0,M)), all remaining
>>>    entries for the same IP:PORT should have consecutive indexes
>>>    in range [0,M-1).
>>> - Alternatively, Transport_Cache_Manager_T can be rewritten
>>>    to actually use a multi-hashmap. The existing implementation with
>>>    hash-map and indexes seems inappropriate and sub-optimal,
>>>    unless there is a good reason not to use a multi-hash-map that I am
>>>    not aware of...
>>>    It seems that this would touch more files in TAO though.
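The second option above can be illustrated with a sketch built on std::unordered_multimap, keyed by endpoint only. This is not the patch from the pull request, and MultiCache, Entry and find_idle are hypothetical names. It only shows why a multimap lookup does not depend on indexes being consecutive.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical cache entry: which transport it is and whether it is idle.
struct Entry {
    int transport_id;
    bool idle;
};

// One key per endpoint ("IP:PORT"); several transports may share it.
using MultiCache = std::unordered_multimap<std::string, Entry>;

// Scan every entry stored for the endpoint and return the first idle one.
// No index is involved, so removing any entry cannot hide the others.
const Entry* find_idle(const MultiCache& cache, const std::string& endpoint) {
    auto range = cache.equal_range(endpoint);
    for (auto it = range.first; it != range.second; ++it) {
        if (it->second.idle) {
            return &it->second;
        }
    }
    return nullptr;
}
```

Erasing the first transport registered for an endpoint leaves the second one reachable through find_idle, which is the behaviour the BiDirGIOP reconnect scenario needs.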
>>> I would like to contribute this patch. I would appreciate it if someone
>>> could advise me which direction I should take.
>>> Thanks, Milan.
>>> Milan Cvetkovic wrote:
>>>>      TAO VERSION: 2.2.1
>>>>      ACE VERSION: 6.2.1
>>>>      HOST MACHINE and OPERATING SYSTEM: Debian wheezy on x86_64
>>>>      THE $ACE_ROOT/ace/config.h FILE: config-linux.h
>>>>      THE $ACE_ROOT/include/makeinclude/platform_macros.GNU FILE:
>>>> c++11 = 1
>>>> ssl = 1
>>>> include ${ACE_ROOT}/include/makeinclude/platform_linux.GNU
>>>>      BiDirGIOP / Transport_Cache_Manager_T / SSLIOP
>>>>          EXECUTION: YES
>>>>     SYNOPSIS: After loss of network connection from a client, server is
>>>> no longer able to invoke callback RPCs, even after client reconnected,
>>>> and resubmitted its callback IOR.
>>>> I have BiDirGIOP setup over SSLIOP. Client is behind firewall router on
>>>> 192.168.12.x network. Client incarnates callback object, listening on
>>>> and port 7771 for ss. Client contacts the server
>>>> over the internet, and it sends the IOR to callback object above. Server
>>>> later uses callback object to send various notifications. This setup
>>>> utilizes bidirectional GIOP, over SSLIOP.
>>>> Everything works as desired, until client loses connectivity to server.
>>>> When client re-registers, server adds the new Transport to Transport
>>>> cache manager, however in some scenarios it does not remove the old
>>>> transport, and keeps using it for callbacks, failing on CORBA::TIMEOUT.
>>>> My understanding is that Transport_Cache_Manager keeps the hash map
>>>> table of all connections. These connections have the same key, being
>>>> issued from the same IP:port every time (in the example above,
>>>> In some cases, the server does not replace the
>>>> existing transport entry, but adds it with an increased index, and keeps
>>>> using index:0 for making callbacks.
>>>> I am attaching the portions of TAO logs. Note that second registration
>>>> binds with index :1. The stale transport is kept with index :0.
>>>> How do I control the content of Transport_Cache_Manager_T? I removed the
>>>> references to callback objects from server, however the transport is
>>>> still cached.
>>>> Thanks, Milan.
