[tao-bugs] Stale connections with BiDirGIOP

Milan Cvetkovic milan.cvetkovic at mpathix.com
Thu Nov 19 09:22:02 CST 2015


Thanks, I needed some time to set up the autobuild/scoreboard.

I posted the results here:

http://support.mpathix.com/tao_scoreboard/index.html

There are 3 builds:
- bug_4207_base:
    on the master branch, before I made any modifications
    15 failures
- bug_4207_exposed:
    at the point where I added the Bug_4207 regression test
    15 failures (same failures as in bug_4207_base, plus Bug_4207;
    MT_Sock_Test passed)
- bug_4207_fixed:
    same 15 failures as bug_4207_base, with the new Bug_4207 regression
    test passing

Milan.

Johnny Willemsen wrote:
> Hi,
>
>> How do I get this patch merged to ACE/TAO?
>
> I made some comments on your GitHub pull request. After the needed code
> reviews, the most important thing is to demonstrate that none of our
> existing unit tests fail with your changes applied.
>
> Best regards,
>
> Johnny Willemsen
> Remedy IT
>>
>> Thanks, Milan.
>>
>> Milan Cvetkovic wrote:
>>> I have created Bug 4207 in Bugzilla, and submitted pull request #157
>>> with an automated test and a bug fix.
>>>
>>> Hope it makes it,
>>>
>>> Milan.
>>>
>>> Milan Cvetkovic wrote:
>>>> OK, answering my own question to some extent...
>>>>
>>>> I narrowed the problem to Transport_Cache_Manager_T, and its use of
>>>> Cache_ExtId::index_.
>>>>
>>>> First, how I see it working:
>>>> ============================
>>>>
>>>> Transport_Cache_Manager_T uses ACE_Hash_Map_Manager to keep a mapping
>>>> between Cache_ExtId and Cache_IntId. In the case of IIOP (in my case it
>>>> was really SSLIOP, but I doubt there is a difference), Cache_ExtId
>>>> represents an IP-ADDR:PORT/index triple for a connection. IP-ADDR:PORT
>>>> is the address that the Transport connects to, and index allows
>>>> multiple connections to the same IP/port address. All three values
>>>> (address, port, index) are used to calculate the hash when the entry is
>>>> stored in ACE_Hash_Map_Manager.
>>>>
>>>> When a new Transport is created, it is registered with the cache
>>>> manager, which creates an entry using ip:port:index(0). When a
>>>> transport is needed again, Transport_Cache_Manager_T::find_i looks
>>>> for an existing connection and uses it if it is found and idle.
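
A minimal sketch of the keying scheme described above, using standard containers rather than the ACE classes. ExtId, ExtIdHash, TransportCache and bind_transport are hypothetical names, the integer value stands in for Cache_IntId, and binding under the first unused index is an assumption for illustration, not TAO's confirmed behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical simplification of Cache_ExtId: endpoint plus index.
struct ExtId {
    std::string address;
    unsigned short port;
    std::size_t index;
    bool operator==(const ExtId& o) const {
        return address == o.address && port == o.port && index == o.index;
    }
};

// All three fields feed the hash, so entries for the same address:port
// with different indexes are distinct keys.
struct ExtIdHash {
    std::size_t operator()(const ExtId& k) const {
        std::size_t h = std::hash<std::string>()(k.address);
        h = h * 31 + k.port;
        h = h * 31 + k.index;
        return h;
    }
};

// The mapped int is a hypothetical transport id standing in for Cache_IntId.
using TransportCache = std::unordered_map<ExtId, int, ExtIdHash>;

// Registers a transport under the first unused index for address:port
// (assumed policy) and returns the index that was used.
std::size_t bind_transport(TransportCache& cache, const std::string& address,
                           unsigned short port, int transport_id) {
    ExtId key{address, port, 0};
    while (cache.count(key) != 0) ++key.index;
    assert(cache.count(key) == 0);  // the chosen slot is free
    cache[key] = transport_id;
    return key.index;
}
```

With this model, two registrations for the same address and port land at index 0 and index 1, which matches the ":0" and ":1" entries mentioned in the original report.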
>>>>
>>>> The problem:
>>>> ============
>>>> Transport_Cache_Manager_T::find_i assumes that the indexes of existing
>>>> connections are consecutive numbers starting with 0. It will try to
>>>> look up a Transport with index=1 *only* if the index=0 entry for the
>>>> same IP:port exists and is busy. If the IP:port:index=0 entry has
>>>> been purged from the cache, Transport_Cache_Manager_T::find_i will
>>>> never try index=1 (or any other index in the cache).
>>>>
>>>> This scenario is exactly what happens with BiDirGIOP when a client
>>>> disappears from the network and later reconnects (and re-registers
>>>> its callback with the same IP:PORT value):
>>>> - server caches the first callback with IP:port:index=0
>>>> - client reconnects/re-registers
>>>> - server caches the second callback with IP:port:index=1
>>>> - eventually, server cleans up the cache entry with IP:port:index=0
>>>> - but it is never able to access the entry with IP:port:index=1
>>>>
>>>> I am not too sure about the impact on regular TAO clients, since I
>>>> didn't try it, but I would assume that:
>>>> - if the index=0 entry is busy, a second transport is created
>>>> - if the index=0 entry's transport is closed, the index=0
>>>>     entry is purged from the cache, and the index=1 entry is no
>>>>     longer reachable until an index=0 entry for the same IP:PORT is
>>>>     created.
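
The failure mode can be reproduced in a toy model. This is only a sketch under the assumptions stated in the message: the lookup probes index 0, 1, 2, ... and gives up at the first missing index. ExtId, IntId, Cache, find_idle and the transport ids are hypothetical stand-ins, not the real TAO types or the real find_i code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for Cache_ExtId / Cache_IntId.
using ExtId = std::pair<std::string, int>;  // (endpoint "ip:port", index)
struct IntId {
    int transport_id;
    bool idle;
};
using Cache = std::map<ExtId, IntId>;

// Models the lookup as described: probe consecutive indexes starting at 0
// and stop at the first index that has no entry.
// Returns the transport id of an idle entry, or -1 if none is reachable.
int find_idle(const Cache& cache, const std::string& endpoint) {
    for (int index = 0;; ++index) {
        Cache::const_iterator it = cache.find(ExtId(endpoint, index));
        if (it == cache.end()) return -1;  // gap in the indexes: give up
        if (it->second.idle) return it->second.transport_id;
    }
}

// Replays the BiDirGIOP reconnect sequence from the list above.
bool reconnect_leaves_entry_unreachable() {
    Cache cache;
    const std::string ep = "192.168.12.113:7771";
    cache[ExtId(ep, 0)] = IntId{100, true};  // first callback registration
    cache[ExtId(ep, 1)] = IntId{101, true};  // client reconnects, re-registers
    cache.erase(ExtId(ep, 0));               // stale entry is eventually purged
    assert(cache.size() == 1);               // the index=1 entry is still cached
    return find_idle(cache, ep) == -1;       // ...but the probe never reaches it
}
```

In this model the index=1 entry stays in the map after index=0 is purged, yet the lookup reports nothing usable for that endpoint.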
>>>>
>>>> Potential solutions:
>>>> ====================
>>>> - I could fix Transport_Cache_Manager_T::unbind_i so that it ensures
>>>>     the assumption made in find_i holds: if the cache has M entries
>>>>     for an IP:port, then after removing the entry at index=N (where N
>>>>     is in [0,M)), all remaining entries for the same IP:port should
>>>>     have consecutive indexes in the range [0,M-1).
>>>> - Alternatively, Transport_Cache_Manager_T could be rewritten
>>>>     to actually use a multi-hash-map. The existing implementation with
>>>>     a hash map and indexes seems inappropriate and sub-optimal.
>>>>     Or perhaps there is a good reason not to use a multi-hash-map that
>>>>     I am not aware of...
>>>>     It seems that this would touch more files in TAO, though.
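
The first option could look roughly like the sketch below. It is one possible way to keep indexes consecutive (move the highest-index entry into the freed slot), not necessarily what the eventual patch does. ExtId, Cache, count_for and unbind_compacting are hypothetical names, and the int value stands in for Cache_IntId.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using ExtId = std::pair<std::string, int>;  // hypothetical (endpoint, index)
using Cache = std::map<ExtId, int>;         // value: hypothetical transport id

// Number of entries for an endpoint, assuming indexes are consecutive from 0.
int count_for(const Cache& cache, const std::string& endpoint) {
    int n = 0;
    while (cache.count(ExtId(endpoint, n)) != 0) ++n;
    return n;
}

// Removes endpoint:index and moves the highest-index entry into the freed
// slot, so the remaining M-1 entries keep the consecutive indexes [0, M-1).
// Returns false if there is no entry at that index.
bool unbind_compacting(Cache& cache, const std::string& endpoint, int index) {
    const int m = count_for(cache, endpoint);
    if (index < 0 || index >= m) return false;
    const int last = m - 1;
    if (index != last) {
        cache[ExtId(endpoint, index)] = cache[ExtId(endpoint, last)];
    }
    cache.erase(ExtId(endpoint, last));
    assert(count_for(cache, endpoint) == m - 1);  // still gap-free
    return true;
}
```

A multimap keyed only by endpoint would avoid the index bookkeeping entirely, at the cost of changing the key type everywhere it is used.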
>>>>
>>>> I would like to contribute this patch. I would appreciate it if someone
>>>> could advise me which direction to take.
>>>>
>>>> Thanks, Milan.
>>>>
>>>> Milan Cvetkovic wrote:
>>>>>       TAO VERSION: 2.2.1
>>>>>       ACE VERSION: 6.2.1
>>>>>
>>>>>       HOST MACHINE and OPERATING SYSTEM: Debian wheezy on x86_64
>>>>>
>>>>>       THE $ACE_ROOT/ace/config.h FILE: config-linux.h
>>>>>
>>>>>       THE $ACE_ROOT/include/makeinclude/platform_macros.GNU FILE:
>>>>> c++11 = 1
>>>>> ssl = 1
>>>>> include ${ACE_ROOT}/include/makeinclude/platform_linux.GNU
>>>>>
>>>>>       AREA/CLASS/EXAMPLE AFFECTED:
>>>>>       BiDirGIOP / Transport_Cache_Manager_T / SSLIOP
>>>>>       DOES THE PROBLEM AFFECT:
>>>>>           EXECUTION: YES
>>>>>
>>>>>      SYNOPSIS: After loss of the network connection from a client, the
>>>>> server is no longer able to invoke callback RPCs, even after the client
>>>>> has reconnected and resubmitted its callback IOR.
>>>>>
>>>>> DESCRIPTION:
>>>>>
>>>>> I have BiDirGIOP set up over SSLIOP. The client is behind a firewall
>>>>> router on the 192.168.12.x network. The client incarnates a callback
>>>>> object, listening on 192.168.12.113:7770 and on port 7771 for SSL.
>>>>> The client contacts the server over the internet and sends it the IOR
>>>>> of the callback object above. The server later uses the callback
>>>>> object to send various notifications. This setup utilizes
>>>>> bidirectional GIOP over SSLIOP.
>>>>>
>>>>> Everything works as desired until the client loses connectivity to the
>>>>> server. When the client re-registers, the server adds the new Transport
>>>>> to the Transport cache manager; however, in some scenarios it does not
>>>>> remove the old transport and keeps using it for callbacks, which fail
>>>>> with CORBA::TIMEOUT.
>>>>>
>>>>> My understanding is that Transport_Cache_Manager keeps a hash map
>>>>> of all connections. These connections have the same key, being
>>>>> issued from the same IP:port every time (in the example above,
>>>>> 192.168.12.113:7771). In some cases, the server does not replace the
>>>>> existing transport entry, but adds the new one with an increased
>>>>> index and keeps using index :0 for making callbacks.
>>>>>
>>>>> I am attaching portions of the TAO logs. Note that the second
>>>>> registration binds with index :1. The stale transport is kept with
>>>>> index :0.
>>>>>
>>>>> How do I control the content of Transport_Cache_Manager_T? I removed
>>>>> the references to the callback objects from the server, but the
>>>>> transport is still cached.
>>>>>
>>>>> Thanks, Milan.
>>>>
>>>
>>
