[tao-users] Notification Service sudden memory leak
Markus Gaugusch
markus at gaugusch.at
Wed May 17 08:11:33 CDT 2017
TAO VERSION: 2.1.6
ACE VERSION: 6.1.6
HOST MACHINE and OPERATING SYSTEM:
RHEL 6.3, using standard RPM
DOES THE PROBLEM AFFECT:
COMPILATION?
no
LINKING?
no
EXECUTION?
yes
SYNOPSIS:
At some point, the TAO notification service starts to leak large amounts
of memory, and eventually it crashes (probably because it runs out of
memory).
DESCRIPTION:
We use the tao-cosnotification service in an environment with mixed C++
(TAO) and Java applications. Most event channels have only one supplier
and one or more consumers; some channels have several suppliers (from
either C++ or Java apps).
After some time, we noticed that the memory usage (shown with top, %MEM)
starts to increase linearly, even though the load (number of events) is
unchanged. In our tests, the problem appeared after 12 to 36 hours.
Several channels are involved, and we have no idea yet what triggers it.
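To pin down when the linear growth starts, the service's memory can be sampled with timestamps instead of watching top. top's %MEM is derived from the resident set size (RES), which Linux also exposes as the `VmRSS:` line of `/proc/<pid>/status`. The following is a minimal, Linux-only sketch; `parse_vmrss_kb` and `read_vmrss_kb` are hypothetical helper names, not part of TAO or ACE.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <sys/types.h>

// Extract the resident set size in kB from the text of /proc/<pid>/status.
// Returns -1 if there is no parsable "VmRSS:" line.
long parse_vmrss_kb(const std::string& status_text) {
    std::istringstream in(status_text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, 6, "VmRSS:") == 0) {
            std::istringstream fields(line.substr(6));
            long kb = 0;
            if (!(fields >> kb)) return -1;  // the value is followed by "kB"
            return kb;
        }
    }
    return -1;
}

// Linux-specific: read /proc/<pid>/status and return VmRSS in kB, or -1
// if the process does not exist or the file cannot be read.
long read_vmrss_kb(pid_t pid) {
    std::ifstream f("/proc/" + std::to_string(pid) + "/status");
    if (!f) return -1;
    std::ostringstream buf;
    buf << f.rdbuf();
    return parse_vmrss_kb(buf.str());
}
```

Calling `read_vmrss_kb()` with the notification service's PID, say once a minute, and logging each value with the current time would show when the growth begins, which could then be lined up against the debug log.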
I started tao-cosnotification with -ORBDebugLevel 4 to gain some insight,
and it seems that there is an object whose reference count keeps increasing:
object:8a650e40 decr refcount = 4
object:8a650e40 incr refcount = 5
object:8a650e40 incr refcount = 6
object:8a650e40 incr refcount = 6
object:8a650e40 decr refcount = 5
object:8a650e40 decr refcount = 5
object:8a650e40 decr refcount = 4 # for many hours it is stable
[...]
object:8a650e40 decr refcount = 8125 # after some time a bit increased
[...]
object:8a650e40 decr refcount = 8538 # but now growing much faster
[...]
object:8b166010 incr refcount = 212842 # and very high
object:8b166010 incr refcount = 212843
object:8b166010 decr refcount = 212842
object:8b166010 incr refcount = 212843
object:8b166010 incr refcount = 212844
object:8b166010 decr refcount = 212843
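Rather than reading a multi-GB log by hand, the refcount lines can be summarized per object address with a small offline parser. This sketch assumes every relevant line contains `object:<addr> incr|decr refcount = <n>` exactly as in the excerpt above (any prefix before `object:` is skipped); `RefStats`, `summarize` and `report` are hypothetical names for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Per-object counters collected from the debug log.
struct RefStats {
    long incr = 0;  // number of "incr" lines seen
    long decr = 0;  // number of "decr" lines seen
    long last = 0;  // refcount reported on the most recent line
    long max = 0;   // highest refcount reported
};

// Parse "object:<addr> incr|decr refcount = <n>"; anything before "object:"
// is ignored. Returns false for unrelated lines.
bool parse_refcount_line(const std::string& line, std::string& addr,
                         bool& is_incr, long& count) {
    const std::string::size_type pos = line.find("object:");
    if (pos == std::string::npos) return false;
    std::istringstream in(line.substr(pos + 7));  // skip "object:"
    std::string op, word, eq;
    if (!(in >> addr >> op >> word >> eq >> count)) return false;
    if (word != "refcount" || eq != "=") return false;
    if (op != "incr" && op != "decr") return false;
    is_incr = (op == "incr");
    return true;
}

// Accumulate statistics for every object address found in the log.
std::map<std::string, RefStats> summarize(std::istream& log) {
    std::map<std::string, RefStats> stats;
    std::string line, addr;
    bool is_incr = false;
    long count = 0;
    while (std::getline(log, line)) {
        if (!parse_refcount_line(line, addr, is_incr, count)) continue;
        RefStats& s = stats[addr];
        if (is_incr) ++s.incr; else ++s.decr;
        s.last = count;
        s.max = std::max(s.max, count);
    }
    return stats;
}

// Print the top_n objects with the largest surplus of increments.
void report(const std::map<std::string, RefStats>& stats, std::size_t top_n,
            std::ostream& out) {
    using Entry = std::pair<std::string, RefStats>;
    std::vector<Entry> v(stats.begin(), stats.end());
    std::sort(v.begin(), v.end(), [](const Entry& a, const Entry& b) {
        return a.second.incr - a.second.decr > b.second.incr - b.second.decr;
    });
    for (std::size_t i = 0; i < v.size() && i < top_n; ++i) {
        const RefStats& s = v[i].second;
        out << v[i].first << " incr=" << s.incr << " decr=" << s.decr
            << " net=" << (s.incr - s.decr) << " last=" << s.last
            << " max=" << s.max << '\n';
    }
}
```

One caveat: heap addresses can be reused once an object is freed, so a single address in the summary may stand for several objects over the lifetime of the process.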
I suspected that consumers that do not disconnect properly from their
channels could be the cause, but while they do leak "some" memory, that
leak stabilizes and doesn't grow further. So this doesn't seem to be the
cause.
QUESTIONS:
1.) Is this a known problem that has been fixed in newer versions?
2.) In this "leaking" state, the log shows two refcount increments for
every decrement. What could cause this, and how do I find out what object
"8b166010" actually is?
3.) I wrote a test program that destroys all channels using ec->destroy(),
but the memory was not freed. What could be the cause? Is this expected?
4.) Is there a way to get timestamps into the debug log??!! :) Is
ORBDebugLevel 4 enough? The log file grew to ~30GB during this test, which
is not easy to handle ... I didn't dare to enable more logging :)
5.) What is the policy for disconnected consumers? Is there a way to
disconnect "improperly" that causes objects to be kept until the service
terminates?
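Regarding question 4, one workaround that needs no TAO support is to pipe the debug output through a small filter that prefixes each line with a timestamp and, optionally, keeps only lines containing a given substring, which would also shrink the 30GB log. This is a sketch under stated assumptions: `stamp_line`, `stamp_stream` and the `stamp_log` binary name are hypothetical, and the timestamp records when the filter read the line, not when TAO wrote it (usually close, but not guaranteed).

```cpp
#include <ctime>
#include <istream>
#include <ostream>
#include <sstream>  // std::istringstream/std::ostringstream make stamp_stream easy to test
#include <string>

// Prefix a log line with a UTC timestamp "YYYY-MM-DD HH:MM:SS ".
std::string stamp_line(const std::string& line, std::time_t t) {
    char buf[32];
    const std::tm* utc = std::gmtime(&t);
    if (utc == nullptr ||
        std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S ", utc) == 0)
        return line;  // fall back to the unstamped line
    return std::string(buf) + line;
}

// Copy `in` to `out`, stamping each line with the time it was read.
// If `filter` is non-empty, only lines containing it are kept
// (e.g. filter = "refcount").
void stamp_stream(std::istream& in, std::ostream& out, const std::string& filter) {
    std::string line;
    while (std::getline(in, line)) {
        if (!filter.empty() && line.find(filter) == std::string::npos) continue;
        out << stamp_line(line, std::time(nullptr)) << '\n';
    }
}
```

Compiled with a `main` that calls `stamp_stream(std::cin, std::cout, "refcount")`, it could be used as `tao-cosnotification -ORBDebugLevel 4 2>&1 | ./stamp_log > notify.log`, assuming the debug output goes to stderr (ACE's default). Passing an empty filter keeps the full log.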
SAMPLE FIX/WORKAROUND:
Restarting tao-cosnotification is necessary :(
kind regards,
Markus