[tao-users] [tao_cosnotification] coredump

Thu Jan 21 07:12:36 CST 2016

> On Jan 21, 2016, at 6:37 AM, Tomek <tomek.w.gran.chaco at gmail.com> wrote:
> 
> How large is the core? How long does it take to reach this end?
> 
> There are two cores for last three days:
> -rw------- 1 root root 10272768 Jan 19 11:02 core-tao_cosnotifica-11-50-51-11986-1453201332
> -rw------- 1 root root 28766208 Jan 19 11:48 core-tao_cosnotifica-11-50-51-1719-1453204100
> 
> Since then tao_cosnotification has been running stably.
> 

OK. That looks like roughly 10 MB and 28 MB, not particularly large. And it looks like they occurred within the same hour. Given that latter fact, can you think of any external events related to the host running the notify service?
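
For reference, here is a small sketch of the arithmetic behind those figures. It assumes the trailing number in each core file name is the dump time in Unix epoch seconds, as a core_pattern ending in %t would produce; that is my guess from the names, not something confirmed in this thread. The helper names to_mib and trailing_epoch are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Converts a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
double to_mib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// Returns the number after the last '-' in a core file name, or -1 if the
// name contains no '-'.
// ASSUMPTION: that trailing field is the dump time in Unix epoch seconds.
// std::stoll throws std::invalid_argument if the field is not numeric.
std::int64_t trailing_epoch(const std::string& core_name) {
    const std::size_t dash = core_name.rfind('-');
    if (dash == std::string::npos) {
        return -1;
    }
    return std::stoll(core_name.substr(dash + 1));
}

// Seconds elapsed between two core dumps, judged by their file names.
std::int64_t seconds_between(const std::string& earlier,
                             const std::string& later) {
    const std::int64_t t0 = trailing_epoch(earlier);
    const std::int64_t t1 = trailing_epoch(later);
    assert(t0 >= 0 && t1 >= 0);  // both names must carry a trailing field
    return t1 - t0;
}
```

Applied to the two files listed above, to_mib gives about 9.8 and 27.4 MiB, and seconds_between gives 2768 seconds, or roughly 46 minutes, which agrees with the 11:02 and 11:48 modification times.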

Are these the only crashes you've seen, or is there a history of them?

The point of failure in your stack is deep in the ORB core, in code that is not specific to the notify service and that ordinarily just works. The crash is likely a side effect of some other problem. What that might be is unknown at this time.

>  
> 
> 
>> warning: core file may not match specified executable file.
>> [New LWP 2038]
>> [New LWP 1719]
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> Core was generated by `/SA/tao/bin/tao_cosnotification -NoNameSvc -IORoutput /SA/data/fp1/ca/tmp/ntfy.'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  reset_event_loop_thread (this=0x84f3188) at ../tao/Leader_Follower.inl:128
>> 128     ../tao/Leader_Follower.inl: No such file or directory.
>> Missing separate debuginfos, use: debuginfo-install ace+tao-6.3.1-1.i686
>> (gdb) bt full
>> #0  reset_event_loop_thread (this=0x84f3188) at ../tao/Leader_Follower.inl:128
>>         tss = 0x0
> 
> 
> This means the ORB Core was not able to retrieve thread-specific storage. I don't have an explanation as to why not at this time; this is a very unusual situation.
> 
> A common pitfall with the Notify service is the use of defaulted proxies/admins for short-lived suppliers and consumers without destroying the server-side objects when done. In those cases the abandoned objects are effectively leaked, and at some point the notify server hits a resource limit and crashes. Is this something you might be doing?
> 
> Well, in fact there may be a sequence of several restarts of the consumer in a short period of time, but not all such sequences cause a tao_cosnotification crash.
> I will check this again, but I am pretty sure that all resources are released correctly. What in particular should I pay attention to?
> 

OK. The signature of the pitfall I mentioned is using new_for_consumers() to get an admin and ignoring the id, and likewise with the proxy. However, for this to be the problem here, I would expect your core files to be much larger, assuming the core file size reflects the total memory footprint of the process.
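
To make that pattern concrete, here is a self-contained sketch. It does not use the real TAO or CosNotifyChannelAdmin types: MockEventChannel, MockConsumerAdmin, and both session functions are hypothetical stand-ins that only mimic the shape of new_for_consumers(), get_consumeradmin(), and destroy(), so that the example compiles without an ORB.

```cpp
#include <cstddef>
#include <map>
#include <memory>

// HYPOTHETICAL stand-ins for illustration only; these are not TAO classes.
using AdminID = int;
class MockEventChannel;

class MockConsumerAdmin {
public:
    MockConsumerAdmin(MockEventChannel& channel, AdminID id)
        : channel_(channel), id_(id) {}
    // Removes this admin from its channel. The object is deleted by this
    // call, so the caller must not use the pointer afterwards.
    void destroy();
private:
    MockEventChannel& channel_;
    AdminID id_;
};

class MockEventChannel {
public:
    // Creates a server-side admin and reports its id via the out parameter.
    MockConsumerAdmin* new_for_consumers(AdminID& id) {
        id = next_id_++;
        std::unique_ptr<MockConsumerAdmin> admin(
            new MockConsumerAdmin(*this, id));
        MockConsumerAdmin* raw = admin.get();
        admins_[id] = std::move(admin);
        return raw;
    }
    // Looks an admin up by id; returns nullptr if it no longer exists.
    MockConsumerAdmin* get_consumeradmin(AdminID id) {
        auto it = admins_.find(id);
        return it == admins_.end() ? nullptr : it->second.get();
    }
    std::size_t live_admins() const { return admins_.size(); }
private:
    friend class MockConsumerAdmin;
    std::map<AdminID, std::unique_ptr<MockConsumerAdmin>> admins_;
    AdminID next_id_ = 1;
};

inline void MockConsumerAdmin::destroy() {
    // Copy what we need first: erase() deletes *this.
    MockEventChannel& channel = channel_;
    const AdminID id = id_;
    channel.admins_.erase(id);
}

// The pitfall: a short-lived consumer creates an admin, ignores the id,
// and exits without destroying the server-side object.
void leaky_consumer_session(MockEventChannel& channel) {
    AdminID ignored = 0;
    channel.new_for_consumers(ignored);
    // ... consume events, then simply go away ...
}

// The tidy variant: destroy the admin before the consumer goes away.
void tidy_consumer_session(MockEventChannel& channel) {
    AdminID id = 0;
    MockConsumerAdmin* admin = channel.new_for_consumers(id);
    // ... consume events ...
    admin->destroy();
}
```

In this mock, every call to leaky_consumer_session() leaves one more admin behind in the channel, while tidy_consumer_session() leaves the count unchanged. With a real notify channel the equivalent check is whether each restart of your consumer leaves an admin and proxy behind on the server.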

Are you using any QoS options?

Are you able to use the -ORBDebugLevel setting to get some output? As I said above, this is a very unusual situation, so I really don't know what to look for at this point.

-Phil

--
Phil Mesnier
Principal Engineer & Partner

OCI | WE ARE SOFTWARE ENGINEERS.
tel  +1.314.579.0066 x225
ociweb.com <http://ociweb.com/>
