[tao-bugs] Stale connections with BiDirGIOP
Milan Cvetkovic
milan.cvetkovic at mpathix.com
Sat Oct 17 20:11:55 CDT 2015
OK, answering my own question to some extent...
I narrowed the problem to Transport_Cache_Manager_T, and its use of
Cache_ExtId::index_.
First, how I see it working:
============================
Transport_Cache_Manager_T uses ACE_Hash_Map_Manager to keep a mapping
between Cache_ExtId and Cache_IntId. In case of IIOP (in my case it
really was SSLIOP, but I doubt there is a difference there), Cache_ExtId
represents an IP-ADDR:PORT/index triple for a connection. IP-ADDR:PORT is
the address that the Transport connects to, and index is used to allow
multiple connections to the same IP:PORT address. All three values (address,
port, index) are used to calculate the hash when stored in ACE_Hash_Map_Manager.
When a new Transport is created, it is registered with the cache manager,
which creates an entry using ip:port:index(0). When another
transport is needed, Transport_Cache_Manager_T::find_i looks up
an existing connection, and uses it if it is found and idle.
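
A minimal sketch of the keying scheme described above, using
std::unordered_map as a stand-in for ACE_Hash_Map_Manager. ExtId, IntId,
ExtIdHash and bind_transport are hypothetical names for illustration, not
TAO's real declarations, and the "first free index" rule in bind_transport
is an assumption based on the behaviour seen in the logs.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for Cache_ExtId: address, port and index together
// form the key.
struct ExtId {
  std::string addr;
  std::uint16_t port;
  std::size_t index;
  bool operator==(const ExtId& o) const {
    return addr == o.addr && port == o.port && index == o.index;
  }
};

// All three fields feed the hash, so entries that differ only in index
// land in unrelated buckets and cannot be enumerated together.
struct ExtIdHash {
  std::size_t operator()(const ExtId& k) const {
    std::size_t h = std::hash<std::string>{}(k.addr);
    h = h * 31 + k.port;
    h = h * 31 + k.index;
    return h;
  }
};

// Hypothetical stand-in for Cache_IntId: only the idle/busy state.
struct IntId {
  bool idle;
};

using Cache = std::unordered_map<ExtId, IntId, ExtIdHash>;

// Assumed registration rule: use the first index not yet taken for this
// address and port. Returns the index that was assigned.
std::size_t bind_transport(Cache& cache, const std::string& addr,
                           std::uint16_t port) {
  ExtId key{addr, port, 0};
  while (cache.count(key) != 0) ++key.index;
  cache.emplace(key, IntId{false});
  return key.index;
}
```

In this model, two registrations for the same address and port end up
under index 0 and index 1.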
The problem:
============
Transport_Cache_Manager_T::find_i assumes that indexes of existing
connections are all consecutive numbers starting with 0. It will try to
lookup Transport with index=1 *only* if index=0 entry for the same
IP:port exists, and if it is busy. If IP:port:index=0 entry is
previously purged from the cache, Transport_Cache_Manager_T::find_i will
never try to use index=1 (or any other index in the cache).
This scenario is exactly what happens with BiDirGIOP when a client
disappears from the network and later reconnects (and re-registers
its callback with the same IP:PORT value):
- server caches the first callback with IP:PORT:index=0
- client reconnects/re-registers
- server caches the second callback with IP:PORT:index=1
- eventually, server cleans up the cache entry with IP:PORT:index=0
- but it is never able to access the entry with IP:PORT:index=1
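
The sequence above can be replayed in a toy model. This is only a sketch of
the lookup as I understand it, not the TAO source: Key, Entry,
find_idle_index and replay_reconnect are hypothetical names, and std::map
stands in for the real hash map.

```cpp
#include <map>
#include <string>
#include <tuple>

// Toy model, not TAO code: the key is (address, port, index).
using Key = std::tuple<std::string, int, int>;
struct Entry { bool idle; };
using Cache = std::map<Key, Entry>;

// Lookup as described above: probe index 0, 1, 2, ... and give up at the
// first index that is missing. Returns the index found, or -1 if none.
int find_idle_index(const Cache& cache, const std::string& addr, int port) {
  for (int i = 0;; ++i) {
    auto it = cache.find(Key{addr, port, i});
    if (it == cache.end()) return -1;   // gap in the indexes: stop here
    if (it->second.idle) return i;      // usable transport
  }
}

// Replays the reconnect scenario and returns what the lookup yields.
int replay_reconnect() {
  const std::string addr = "192.168.12.113";
  const int port = 7771;
  Cache cache;
  cache[Key{addr, port, 0}] = Entry{false};  // first callback, now stale
  cache[Key{addr, port, 1}] = Entry{true};   // re-registered callback
  cache.erase(Key{addr, port, 0});           // stale entry is purged
  return find_idle_index(cache, addr, port); // index=1 is never probed
}
```

In this model the lookup fails after the purge even though the index=1
entry is still present and idle.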
I am not too sure about the impact on regular TAO clients, since I didn't
try it, but I would assume that:
- if the index=0 entry is busy, a second transport is created
- if the index=0 entry's transport is closed, the index=0
entry is purged from the cache, and the index=1 entry is no
longer reachable until an index=0 entry for the same IP:PORT is created.
Potential solutions:
====================
- I could fix Transport_Cache_Manager_T::unbind_i so that it makes sure
that the assumption made in find_i is true: if the cache has M elements,
when removing an entry at index=N (where N is in [0,M)), all remaining
entries for the same IP:PORT should have consecutive indexes
in range [0,M-1).
- Alternatively, Transport_Cache_Manager_T can be rewritten
to actually use a multi-hashmap. The existing implementation with
a hash map and indexes seems inappropriate and sub-optimal.
Or perhaps there is a good reason not to use a multi-hashmap that I am
not aware of...
It seems that this would touch more files in TAO though.
I would like to contribute this patch. I would appreciate it if someone
could advise me which direction I should take.
Thanks, Milan.
Milan Cvetkovic wrote:
> TAO VERSION: 2.2.1
> ACE VERSION: 6.2.1
>
> HOST MACHINE and OPERATING SYSTEM: Debian wheezy on x86_64
>
> THE $ACE_ROOT/ace/config.h FILE: config-linux.h
>
> THE $ACE_ROOT/include/makeinclude/platform_macros.GNU FILE:
> c++11 = 1
> ssl = 1
> include ${ACE_ROOT}/include/makeinclude/platform_linux.GNU
>
> AREA/CLASS/EXAMPLE AFFECTED:
> BiDirGIOP / Transport_Cache_Manager_T / SSLIOP
> DOES THE PROBLEM AFFECT:
> EXECUTION: YES
>
> SYNOPSIS: After loss of network connection from a client, server is
> no longer able to invoke callback RPCs, even after client reconnected,
> and resubmitted its callback IOR.
>
> DESCRIPTION:
>
> I have BiDirGIOP setup over SSLIOP. Client is behind firewall router on
> 192.168.12.x network. Client incarnates callback object, listening on
> 192.168.12.113:7770 and port 7771 for SSL. Client contacts the server
> over the internet, and it sends the IOR to callback object above. Server
> later uses callback object to send various notifications. This setup
> utilizes bidirectional GIOP, over SSLIOP.
>
> Everything works as desired, until client loses connectivity to server.
> When client re-registers, server adds the new Transport to Transport
> cache manager, however in some scenarios it does not remove the old
> transport, and keeps using it for callbacks, failing with CORBA::TIMEOUT.
>
> My understanding is that Transport_Cache_Manager keeps the hash map
> table of all connections. These connections have the same key, being
> issued from the same IP:port every time (in the example above,
> 192.168.12.113:7771). In some cases, the server does not replace the
> existing transport entry, but adds it with an increased index, and keeps
> using index:0 for making callbacks.
>
> I am attaching the portions of TAO logs. Note that second registration
> binds with index :1. The stale transport is kept with index :0.
>
> How do I control the content of Transport_Cache_Manager_T? I removed the
> references to callback objects from server, however the transport is
> still cached.
>
> Thanks, Milan.