[tao-bugs] Service after some time working idle (1 night) can't process requests
Phil Mesnier
mesnierp at ociweb.com
Fri Jan 13 05:58:07 CST 2017
Hi Daniel,
In addition to what Johnny said, do you have log output from the server for the period from 11:38:25 to 15:49:37? I ask because that last client connection is assigned handle #1542, which is quite a large number for a supposedly idle server. Is other I/O happening on a regular basis, such as writing to a file or connecting to a database?
I'm guessing you have a resource leak somewhere, not closing a file or socket.
By default, TAO's reactor uses select(), which typically has a limit of 1024 handles (FD_SETSIZE). Since the handle number is used as an index into a bit array, a high number such as #1542 will be out of bounds even if you only have a handful of sockets you are interested in. So the server is probably just in its normal run state, but unable to select on the client's connection.
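To make the bit-array point concrete, here is a minimal sketch of the bounds check that select()-based code needs. The function name fits_in_fd_set is hypothetical and not part of ACE or TAO; it only illustrates why a handle value at or above FD_SETSIZE cannot be placed in an fd_set, no matter how few handles are actually being watched.

```cpp
#include <sys/select.h>

// Hypothetical helper (not an ACE/TAO API): reports whether a handle value
// can be stored in an fd_set. The handle is used as a bit index, so any
// value >= FD_SETSIZE falls outside the array and must not be passed to
// FD_SET(), regardless of how many handles are currently open.
bool fits_in_fd_set(int handle)
{
  return handle >= 0 && handle < FD_SETSIZE;
}
```

With the usual FD_SETSIZE of 1024, fits_in_fd_set(28) is true and fits_in_fd_set(1542) is false.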
You can try lsof -p <server pid>, which, if I'm guessing right, will show you 1500 or so open files or sockets.
In fact, I noticed that the first client connection you noted uses handle #28, while the server had previously opened a connection to the naming service using handle #9. So 19 handles were consumed in 2.25 minutes. Later there is a 4 hour 11 minute gap during which 1514 handles were consumed. That works out to roughly 8 handles leaked per minute in the first case and roughly 6 per minute in the second.
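The rate estimate is just the difference in handle numbers divided by the elapsed time. A small sketch of that arithmetic follows; handles_per_minute is a hypothetical helper, and it assumes handle numbers are handed out in increasing order with none released in between.

```cpp
// Hypothetical helper: estimates the leak rate from two observed handle
// numbers and the minutes elapsed between the observations.
double handles_per_minute(int first_handle, int last_handle, double elapsed_minutes)
{
  return (last_handle - first_handle) / elapsed_minutes;
}
```

For the figures in the log, handles_per_minute(9, 28, 2.25) is about 8.4, and handles_per_minute(28, 1542, 251.0) is about 6.0.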
Now if you need all these open resources, perhaps you can switch to the dev_poll reactor rather than the default reactor. See docs/Options.html for information on setting reactor type via the advanced resource factory.
Best regards,
Phil
> On Jan 13, 2017, at 4:32 AM, Johnny Willemsen <jwillemsen at remedy.nl> wrote:
>
> Hi,
>
> Thanks for using the PRF form. Can you attach a debugger and see where the server is looping exactly?
>
> Best regards,
>
> Johnny Willemsen
> Remedy IT
> Postbus 81 | 6930 AB Westervoort | The Netherlands
> http://www.remedy.nl
> On 01/12/2017 02:32 PM, Daniel Suchodolski wrote:
>> Hi TAO,
>>
>>
>> TAO VERSION: 2.4.1
>> ACE VERSION: 6.4.1
>>
>> HOST MACHINE and OPERATING SYSTEM:
>> Debian 8 (Jessie)
>>
>> COMPILER: g++ (Debian 4.9.2-10) 4.9.2
>>
>>
>> DOES THE PROBLEM AFFECT:
>> EXECUTION? YES
>>
>>
>> SYNOPSIS:
>> After sitting idle for some time (one night), the CORBA service can't process requests. (Seen after migration.)
>>
>> DESCRIPTION:
>> First, I want to highlight that we use a build of ACE/TAO compiled with the option "threads 0". The problem started to happen with many services after migrating to the newest versions of ACE/TAO and the operating system. Until now the system was stable on versions 1.2.1/5.2.1 (Debian Lenny).
>>
>> How it works:
>> The server application registers a CORBA service in the NamingService.
>> If a client connects shortly after the server starts,
>> the server works fine: it processes requests properly.
>>
>> After some time (for example one night) a client connects to the server, but
>> when the client tries to use the service, the server hangs and uses 100% of a processor.
>> The client stays blocked until the server is killed. While debugging, we found that the problem is somewhere inside the CORBA invocation. The debug output seen while loading adatp3-services.svc is also unclear to us, and we are not able to fully interpret it.
>>
>> The service is run with the following ORB parameters:
>> -ORBDottedDecimalAddresses 1
>> -ORBDebug -ORBDebugLevel 10 -ORBVerboseLogging 2 -ORBInitRef NameService=corbaloc::server:30033/NameService
>> -ORBSvcConf adatp3-services.svc
>>
>> and adatp3-services.svc:
>> static Advanced_Resource_Factory "-ORBReactorMaskSignals 0 -ORBInputCDRAllocator null -ORBReactorType select_st -ORBConnectionCacheLock null"
>> static Server_Strategy_Factory "-ORBAllowReactivationOfSystemids 0"
>> static Client_Strategy_Factory "-ORBTransportMuxStrategy EXCLUSIVE -ORBClientConnectionHandler RW"
>>
>>
>> REPEAT BY:
>> Every Time
>>
>> TAO LOG:
>>
>> The server log is shown below. It is divided into parts:
>>
>> [Start Server]
>> [Client connect to Server after short time]
>> [Client is connecting after some time]
>>
>
--
Phil Mesnier
Principal Engineer & Partner
OCI | WE ARE SOFTWARE ENGINEERS.
tel +1.314.579.0066 x225
ociweb.com