[cadynce] How Many Configurations Is Enough?

Cross, Joseph Joseph.Cross at darpa.mil
Wed May 23 14:42:44 CDT 2007


Gerry -

This is great information. Thanks for posting it.
 
> There was a recent review of the Resource Manager Ensemble Software 
> Design Description for release 4 that covered how resources are
> allocated upon service deployment. 

This looks like true dynamic resource management. I.e., it will allocate
SCIs to nodes in configurations that have never been tested. I was under
the impression that the cert folks would not tolerate such a thing. Has
this changed?

> Step 1: Get the list of potential hardware on which to deploy the
> software...

> Step 2: Filter out the nodes that do not have enough resources (RAM,
> CPU utilization and bandwidth) to support the software.

Here "enough" means "enough resources remaining after already allocated
higher priority processes and dataflows have taken their slices," right?
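
To make that reading of Step 2 concrete, here is a minimal sketch. It is not taken from the Resource Manager design: the names (Resources, Allocation, Node, remaining_for, filter_nodes), the units, and the rule that allocations at equal priority also count against the node are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Resources:
    # Hypothetical units; the original post only names RAM, CPU and bandwidth.
    ram_mb: float
    cpu_pct: float
    bandwidth_mbps: float

    def minus(self, other: "Resources") -> "Resources":
        return Resources(self.ram_mb - other.ram_mb,
                         self.cpu_pct - other.cpu_pct,
                         self.bandwidth_mbps - other.bandwidth_mbps)

    def covers(self, need: "Resources") -> bool:
        return (self.ram_mb >= need.ram_mb
                and self.cpu_pct >= need.cpu_pct
                and self.bandwidth_mbps >= need.bandwidth_mbps)


@dataclass
class Allocation:
    priority: int      # assumption: larger number means higher priority
    usage: Resources


@dataclass
class Node:
    name: str
    capacity: Resources
    allocations: List[Allocation] = field(default_factory=list)


def remaining_for(node: Node, priority: int) -> Resources:
    """Capacity left once allocations at this priority or higher take their slices."""
    left = node.capacity
    for alloc in node.allocations:
        if alloc.priority >= priority:   # assumption: equal priority also counts
            left = left.minus(alloc.usage)
    return left


def filter_nodes(nodes: List[Node], need: Resources, priority: int) -> List[Node]:
    """Step 2 under this reading: keep nodes whose remaining capacity covers the need."""
    return [n for n in nodes if remaining_for(n, priority).covers(need)]
```

Under this reading a node that is nearly full of lower priority work still passes the filter, which is what makes the preemption in Step 4 necessary.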

> Step 3: Determine where to deploy the software.  This is based on:
>          Co-location dependency - must be deployed on same node as
>          other software

>          Replication scheme - Master/slave - deploy master replicas on
>          same nodes as dependent software masters

No comprendo. Please elucidate. E.g., what's a dependent software master?

>          Survivability with respect to fire zones...

What's a fire zone? How does it relate to the data centers, EMEs, etc.
that we've heard about? 

> ... if a replica already exists, attempt to deploy a new replica in a
>          different firezone.
>          Survivability with respect to nodes - if a replica exists on
>          one node, attempt to deploy a new replica on a different node.

So if there's no other choice, you will deploy a replica on the same
node as its master?
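
The two possible answers can be sketched as follows. This is only an illustration of the question, not the actual placement logic: the function name rank_for_replica, the (node, zone) pairs, and the allow_same_node switch are all hypothetical.

```python
from typing import List, Tuple

Placement = Tuple[str, str]  # (node name, fire zone name); hypothetical labels


def rank_for_replica(candidates: List[Placement],
                     existing: List[Placement],
                     allow_same_node: bool = True) -> List[Placement]:
    """Order candidate nodes for a new replica.

    Preference: a fire zone with no replica yet, then a different node in an
    already used zone, then (only if allowed) a node that already hosts one.
    """
    used_nodes = {node for node, _ in existing}
    used_zones = {zone for _, zone in existing}

    def tier(candidate: Placement) -> int:
        node, zone = candidate
        if zone not in used_zones:
            return 0
        if node not in used_nodes:
            return 1
        return 2

    ranked = sorted(candidates, key=tier)  # stable sort keeps input order within a tier
    if not allow_same_node:
        ranked = [c for c in ranked if tier(c) < 2]
    return ranked
```

With allow_same_node=True, "attempt" means "prefer", and a lone remaining node still gets the replica; with allow_same_node=False the deployment fails instead.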

> The resource manager will attempt to evenly distribute replicas across
> different fire zones.
>          Bin Packing - if the above does not result in a list of zero
>          candidate nodes, then use bin packing to select the node.  The
>          resource manager will attempt to deploy the software in the node
>          with the most RAM first, then CPU utilization then network
>          bandwidth that meets the software needs.  The attempt is to
>          deploy 1 copy of all software needed for an operational string
>          before deploying additional replicas.
> 
> Step 4:  If there is not enough room on any node to deploy the software,
> then the resource manager will examine the existing software on candidate
> nodes for an operational string(s) that can be preempted.  This is based on
> criticality, resource utilization, effect of previous preemptions and
> replicas.

Very interesting. Do you distinguish between strings that would have to
be killed until more hardware is available and those that could be
immediately restarted?
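
Here is a rough sketch of the distinction I have in mind, combined with one reading of the bin packing rule in Step 3. Everything in it is an assumption made for illustration: the names (Free, Running, pick_node, pick_victim), the tie-breaking order, and the use of criticality alone to choose a victim. The effect of previous preemptions and of replicas, which the description mentions, is not modeled.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Free(NamedTuple):
    # Field order matters: tuples compare RAM first, then CPU, then bandwidth.
    ram_mb: float
    cpu_pct: float
    bandwidth_mbps: float


class Running(NamedTuple):
    name: str
    node: str
    criticality: int   # assumption: larger number means more critical
    usage: Free


def fits(free: Free, need: Free) -> bool:
    return all(f >= n for f, n in zip(free, need))


def pick_node(free_by_node: Dict[str, Free], need: Free) -> Optional[str]:
    """One reading of Step 3: among nodes that fit, most free RAM, then CPU, then bandwidth."""
    fitting = {name: f for name, f in free_by_node.items() if fits(f, need)}
    if not fitting:
        return None
    return max(fitting, key=lambda name: fitting[name])


def pick_victim(running: List[Running], free_by_node: Dict[str, Free],
                need: Free, criticality: int) -> Optional[Tuple[Running, Optional[str]]]:
    """One reading of Step 4: preempt the least critical string whose removal makes room.

    Returns (victim, restart_node). restart_node is None when the victim
    cannot be restarted anywhere else right now.
    """
    candidates = []
    for s in running:
        if s.criticality >= criticality:
            continue  # never preempt equal or more critical work
        freed = Free(*(f + u for f, u in zip(free_by_node[s.node], s.usage)))
        if fits(freed, need):
            candidates.append(s)
    if not candidates:
        return None
    victim = min(candidates, key=lambda s: s.criticality)
    others = {n: f for n, f in free_by_node.items() if n != victim.node}
    return victim, pick_node(others, victim.usage)
```

A victim returned with a restart node is merely relocated; one returned with None stays down until more hardware is available.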

Thanks again for posting this, and thanks in advance for your time and
effort in educating us.

- Joe

