[ciao-users] DAnCE: Plan Launcher throws CORBA MARSHAL exception
Haidinger, Holger
haidinger at redlogix.de
Fri Mar 23 12:02:26 CDT 2018
Hi Will,
first of all, thank you very much for your quick response!
We turned up the logging by setting the following environment variables:
set DANCE_LOG_LEVEL=10
set DANCE_TRACE_ENABLE=1
With these settings, the MARSHAL exception always occurred (on faster machines).
Did you mean these settings or did we miss something?
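For reference, the same two variables could also be set for a single child process rather than for the whole shell. The following is only a minimal Python sketch: the variable names and values are the ones listed above, while the helper names and the command passed in are hypothetical placeholders (the actual DAnCE command lines are not shown in this thread).

```python
import os
import subprocess

def dance_debug_env(base=None):
    """Return a copy of the environment with the two logging variables set."""
    env = dict(os.environ if base is None else base)
    env["DANCE_LOG_LEVEL"] = "10"    # value taken from the settings above
    env["DANCE_TRACE_ENABLE"] = "1"  # value taken from the settings above
    return env

def run_with_debug_logging(command):
    """Run `command` (a list of arguments, hypothetical here) with the
    logging variables applied only to that child process."""
    return subprocess.run(command, env=dance_debug_env()).returncode
```

Copying the parent environment first keeps PATH and the other variables the child process needs, and only the two logging entries are overridden.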
Best regards
Holger
-----Original Message-----
From: ciao-users [mailto:ciao-users-bounces at list.isis.vanderbilt.edu] On Behalf Of William R. Otte
Sent: Wednesday, 21 March 2018 22:59
To: CIAO Users Mailing List
Subject: Re: [ciao-users] DAnCE: Plan Launcher throws CORBA MARSHAL exception
Hi Holger -
Thanks for using the PRF.
Bearing in mind that I’m about six years removed from the last time I worked on DAnCE:
It looks to me like the error could be occurring on the node level (e.g. Node/Locality manager), not the domain level (execution manager). Have you looked in detail at those logs? Have you turned up the logging to debug level for both the node and domain infrastructure?
It would be helpful to attach those logs.
/-Will
On 21 Mar 2018, at 8:09, Haidinger, Holger wrote:
> DAnCE VERSION: 1.2.3
> TAO VERSION : 2.2.3
> ACE VERSION : 6.2.3
>
> HOST MACHINE and OPERATING SYSTEM:
> Intel(R) Core(TM) i7-6700HQ
> Microsoft Windows 10 Professional Version 1709
> Windows Socket 2
>
> TARGET MACHINE and OPERATING SYSTEM, if different from HOST:
> Intel Mobile Core 2 Duo T5600
> Microsoft Windows XP Professional Service Pack 3
>
> COMPILER NAME AND VERSION (AND PATCHLEVEL):
> Microsoft Visual Studio 2010 Version 10.0.40219.1 SP1Rel
>
> THE $ACE_ROOT/ace/config.h FILE: #include "ace/config-win32.h"
>
> THE $ACE_ROOT/include/makeinclude/platform_macros.GNU FILE:
> Not used due to Microsoft Visual C++
>
> CONTENTS OF
> $ACE_ROOT/bin/MakeProjectCreator/config/default.features
> (used by MPC when you generate your own makefiles): Not used
>
> AREA/CLASS/EXAMPLE AFFECTED:
> No module failed to compile.
>
> DOES THE PROBLEM AFFECT: EXECUTION
> COMPILATION? No
> LINKING? No
> EXECUTION? Yes
> OTHER (please specify)?
>
> SYNOPSIS:
> Plan Launcher throws CORBA MARSHAL exception for large deployments.
>
> DESCRIPTION:
> We have developed a framework of CCM components. Based on the
> various needs of our applications, we create deployment plans that
> make use of these components. Since establishing the framework, we
> have created many deployment plans with varying numbers of
> component instances and connections.
>
> Recently, however, we have encountered CORBA MARSHAL exceptions with
> our largest deployment plans. The exception occurs when the deployment
> plan is started.
>
> We define a large deployment plan as:
>
> - approximately 150 component instances
> - approximately 350 connection instances
>
> Here I'm providing an error log snippet:
>
> (9300|6780) [LM_ERROR] - 13:43:25.043884 -
> Plan_Launcher::launch_plan - Deployment failed, exception: Caught
> StartError exception while invoking
> finishLaunch: PLANXXX, 1 errors from node applications:
> TestNode -
> finishLaunch raised CORBA exception : system exception, ID
> 'IDL:omg.org/CORBA/MARSHAL:1.0'
> Unknown vendor minor code id (0), minor code = 0, completed = NO
>
> Here is a summary of the results we collected so far:
>
> - Most important: The CORBA MARSHAL exception can easily be reproduced
>   on weak PC hardware (Core 2 Duo target machine). On the host machine
>   (Core i7) the exception does not occur at all.
> - By reducing the number of CCM connections (via the deployment plan)
>   we could also reduce the number of occurrences of the CORBA MARSHAL
>   exception.
> - For release builds we can almost always reproduce the exception; for
>   debug builds the exception occurs sporadically.
> - For debug builds we sometimes get the debug assertion
>   "Invalid allocation size: 4294967295 bytes." in
>   dance_node_manager.exe. In this scenario the CORBA MARSHAL exception
>   always follows.
> - In a debug session we were able to locate the area where the
>   exception was thrown: The exception occurred when the plan_launcher
>   called finishLaunch() on the execution_manager. The
>   execution_manager replies with "SYSTEM_EXCEPTION:UNKNOWN_OBJECT"
>   to this call.
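A note on the assertion value in the list above: 4294967295 is 2^32 - 1, which is what -1 becomes when it is stored in a 32-bit unsigned integer. One possible reading (an assumption, not something the logs in this thread confirm) is that a size of -1, or an otherwise corrupted 32-bit length, reached the allocator in dance_node_manager.exe. The arithmetic itself can be checked with a short Python sketch:

```python
import ctypes

REPORTED_SIZE = 4294967295  # value quoted in the debug assertion message

# -1 reinterpreted as a 32-bit unsigned integer
minus_one_as_u32 = ctypes.c_uint32(-1).value

print(REPORTED_SIZE == 2**32 - 1)         # True
print(REPORTED_SIZE == minus_one_as_u32)  # True
print(hex(REPORTED_SIZE))                 # 0xffffffff
```

If that reading holds, the place where the size is computed, rather than the allocation call itself, may be the more useful spot for a breakpoint.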
>
> We would appreciate any help to solve the CORBA MARSHAL exception.
>
> At first glance this looks like a race condition, and we cannot
> completely rule out that it is caused by our own code. So we would
> also be grateful for any hints on how to narrow down the problem
> further.
>
> If needed, we could also supply log files, call-stacks etc. But we
> would have to check that because log files may contain
> company-relevant data.
>
> Thank you.
>
> REPEAT BY:
> Start a large deployment plan on weak PC hardware; see the
> "DESCRIPTION" section of the PRF for an explanation of "large
> deployment".
>
> SAMPLE FIX/WORKAROUND:
> We have a small executable which is a wrapper for spawning the DAnCE
> runtime processes. We have implemented a retry strategy in the
> wrapper: if launching a plan fails, we try to re-launch the
> deployment plan, up to a configurable number of attempts. Currently
> this works around the problem on our production systems.
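The retry strategy described in the workaround might look roughly like the Python sketch below. This is not the wrapper from the report: the function name, the parameters, and the assumption that a failed launch is signalled by a non-zero exit code are all hypothetical, and the command to run is left to the caller.

```python
import subprocess
import time

def launch_with_retry(command, max_attempts=3, delay_seconds=1.0,
                      run=subprocess.run):
    """Run `command` until it exits with 0 or `max_attempts` is reached.

    Returns the 1-based number of the attempt that succeeded, or 0 if
    every attempt failed. Assumes (hypothetically) that a failed plan
    launch shows up as a non-zero exit code.
    """
    for attempt in range(1, max_attempts + 1):
        result = run(command)
        if result.returncode == 0:
            return attempt
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # pause before the next attempt
    return 0
```

Returning the attempt number rather than a plain boolean makes it easy to log how often a retry was actually needed, which could also help to quantify how frequently the exception occurs on a given machine.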
>
>
_______________________________________________
ciao-users mailing list
ciao-users at list.isis.vanderbilt.edu
http://list.isis.vanderbilt.edu/cgi-bin/mailman/listinfo/ciao-users