[cadynce] Meditations on Learning and a Big Back End

Patrick Lardieri plardier at atl.lmco.com
Fri Mar 9 08:29:43 CST 2007


Hi Joe,

Cross, Joseph wrote:
> Gentledudes -
>
> In today's telecon, Patrick said that the DARPA-hard part of the problem
> is machine learning of indicators that with high confidence predict
> certifiability. [Forgive me if I mangle your position, Patrick, and
> please correct this discussion.] This idea scares me and I think it
> scared Carol.
>
> We may be splitting hairs here, but I think we want to discover _and
> certify_ many configurations even without testing them all. E.g., we
> old-school-certify one configuration and then we say that the
> certification-equivalent configuration that is obtained by swapping
> components between two identical computers is also thereby certified.
> Perhaps this is what Patrick calls a high confidence predictor of
> certifiability? If not, then could we get an example of such a
> predictor?
>   
I believe we agree but use different terminology.

I expect that standard statistical analysis and machine learning 
techniques will be required to extract characteristics that positively 
correlate with certified configurations. Those characteristics become 
constraints that an on-line algorithm has to satisfy to generate a 
novel configuration that we accept as equivalent to a certified one. In 
my mind, your notion of equivalence classes is defined by these 
characteristics. I do see the untested equivalent configurations as 
highly likely to work, but not mathematically proven to.
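
As a sketch of what I mean, with the learned characteristics encoded 
as predicates that any machine-generated configuration must pass (the 
Python, the configuration encoding, and the example constraint are all 
illustrative - the real characteristics come out of the analysis):

    # A configuration maps each component to the computer it runs on,
    # e.g. {"A": "X", "B": "Y", "C": "Z"}.

    def max_load(config, limit=20):
        """Illustrative learned characteristic: bounded per-computer
        load (the bound echoes Joe's 20-components example below)."""
        counts = {}
        for component, computer in config.items():
            counts[computer] = counts.get(computer, 0) + 1
        return all(n <= limit for n in counts.values())

    # Each learned characteristic becomes one constraint in this list.
    LEARNED_CONSTRAINTS = [max_load]

    def certification_equivalent(config):
        """Accept a novel configuration only if every constraint holds."""
        return all(check(config) for check in LEARNED_CONSTRAINTS)
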
> If the machine learning is used to generate, at system test time, a
> probably certifiable configuration (whence might grow a big honkin'
> certification-equivalence class of derived configurations), and we take
> the one generated configuration and skoll-test the tail off it, then
> great! That's what we should do.
I see the tool chain as generating a set of certified configurations 
that go through the complete process, including testing by Skoll. I see 
those configurations, the ones that failed to be certified, and the 
full evidence database for them as the data that statistical or machine 
learning methods will use to learn the characteristics I describe 
above.
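
As a sketch of that learning step (the features, the labels, and the 
use of scikit-learn are my assumptions, not our actual tool chain), an 
interpretable model fit on the evidence database would surface 
candidate rules for human review:

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Hypothetical featurization: one row per configuration that went
    # through the full process; label 1 = certified, 0 = failed.
    X = [[3, 2, 1, 0],   # e.g. per-computer component counts
         [6, 0, 0, 1]]
    y = [1, 0]

    model = DecisionTreeClassifier(max_depth=3).fit(X, y)
    # The tree's decision rules are human-readable; ones that hold up
    # can be promoted to constraints like the sketch above.
    print(export_text(model))
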
>  
>
> On the other hand, the idea of (taking the machine-learned indicator and
> deploying it on the ship and in order to generate a new untested
> configuration following battle damage and install that) is a very sporty
> idea. I doubt that we could sell it to the Navy. With any luck, we won't
> need to - the more conservative approach will provide all the benefit we
> can measure (I hope).
>
>   
This is what I am suggesting. I don't understand how defining 
equivalence classes gets around this problem. We are talking about how 
to deploy a configuration that has never been tested in the lab. I 
conjecture that there is no way to do that with anything other than a 
statistical assurance. The goal is first to give them 1000 tested 
configurations and to use the novel generation method as a backup when 
none of them fits.
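
In pseudocode, the deployment policy I have in mind (a sketch only; 
all names are illustrative):

    def select_configuration(pretested, available_computers, generate_novel):
        # Prefer one of the ~1000 lab-tested configurations.
        for config in pretested:
            # config maps component -> computer (illustrative encoding)
            if set(config.values()) <= set(available_computers):
                return config, "tested"
        # Backup: a novel configuration with only statistical assurance,
        # generated subject to the learned constraints.
        return generate_novel(available_computers), "novel"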

Patrick


> =============
>
> Also in today's telecon, Gautam opined that our tool chain has a big
> back end, which I interpret to mean that there will be a lot of
> sophisticated processing happening on the ship at the moment a new
> configuration is needed. I'm not so sure.
>
> The tool chain front end passes to the back end what is semantically a
> big boolean expression that defines a set of configurations that are
> certified. Let's call this datum the 'Set Of Certified Configurations'
> (SOCC). The actual representation of the SOCC is TBD.
>
> Consider a system that has components A, B, & C, and computers X, Y, &
> Z. Let "XAC" denote "Components A and C run on computer X".
>
> If the SOCC looks like
>
> (XA & YB & ZC) |
> (XB & YA & ZC) |
> (XAB & ZC) |
> (YAB & ZC)
>
> then the back end doesn't have much to do - it just picks one of the
> parenthetical terms that fits on the available hardware (and maybe is
> optimal according to other criteria like 'minimizes shutdowns and
> restarts'.) Note that this SOCC amounts to just a list of acceptable
> configurations. Note also that not every one of the listed
> configurations needs to have been tested.
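
A toy sketch of this small back end (the list-of-dicts SOCC encoding 
is mine for illustration; the actual representation is TBD, as noted):

    # Each dict is one DNF term, mapping a computer to the components
    # it hosts.
    SOCC = [
        {"X": {"A"}, "Y": {"B"}, "Z": {"C"}},   # XA & YB & ZC
        {"X": {"B"}, "Y": {"A"}, "Z": {"C"}},   # XB & YA & ZC
        {"X": {"A", "B"}, "Z": {"C"}},          # XAB & ZC
        {"Y": {"A", "B"}, "Z": {"C"}},          # YAB & ZC
    ]

    def pick(socc, available_computers):
        """Return the first listed configuration that fits the hardware."""
        for term in socc:
            if set(term) <= available_computers:
                return term
        return None

    print(pick(SOCC, {"Y", "Z"}))   # -> YAB & ZC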
>
> On the other hand, a more complex SOCC, like 
>
> (!XC & !YC & (XA | YA) & (XB | YB))
>
> requires more work at run time. In fact, in full generality, we have the
> Boolean satisfiability problem, which is NP-hard.
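
For this toy size a brute-force search suffices, as in the sketch 
below (a toy solver, not a proposed implementation); the same approach 
is hopeless at fleet scale, which is exactly the NP-hard point:

    from itertools import product

    def satisfies(a):
        # (!XC & !YC & (XA | YA) & (XB | YB)); a maps component -> computer
        return (a["C"] not in ("X", "Y")
                and a["A"] in ("X", "Y")
                and a["B"] in ("X", "Y"))

    def solve(components=("A", "B", "C"), computers=("X", "Y", "Z")):
        for placement in product(computers, repeat=len(components)):
            a = dict(zip(components, placement))
            if satisfies(a):
                return a
        return None

    print(solve())   # -> {'A': 'X', 'B': 'X', 'C': 'Z'}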
>
> So the bigness of our back end depends on how the SOCC is expressed.
>
> Now every boolean expression can be put into disjunctive normal form, as
> in the first example above. So the only reason to pass a more complex
> SOCC would be if it were found to be beneficial to pass what would be an
> unmanageably long list. And this could well be the case: consider 10
> computers, 100 components, and the SOCC is trying to say "Any
> configuration is fine as long as there are no more than 20 components on
> any computer." That would be an unmanageably long list.
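
A back-of-the-envelope count (my arithmetic, not from the telecon) 
bears that out: the number of ways to place 100 distinct components on 
10 computers with at most 20 per computer is 100! times the x^100 
coefficient of (sum_{k=0..20} x^k/k!)^10, which is on the order of 
10^100:

    from fractions import Fraction
    from math import factorial

    cap, computers, components = 20, 10, 100
    egf = [Fraction(1, factorial(k)) for k in range(cap + 1)]

    poly = [Fraction(1)]              # product of the per-computer EGFs
    for _ in range(computers):
        new = [Fraction(0)] * (len(poly) + cap)
        for i, a in enumerate(poly):
            for j, b in enumerate(egf):
                new[i + j] += a * b
        poly = new

    count = poly[components] * factorial(components)
    print(f"{float(count):.3e}")      # roughly 1e100 configurations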
>
> Bottom line: research is needed to find a representation for practically
> useful SOCCs that requires minimum work at run time.
>
>
> - Joe
> ____________________________________
> Dr. Joseph K. Cross, Program Manager
> DARPA/IXO, room 612
> 3701 North Fairfax Drive
> Arlington, Virginia 22203-1714
> joseph.cross at darpa.mil
> (571) 218-4691
>  
> _______________________________________________
> Cadynce mailing list
> Cadynce at list.isis.vanderbilt.edu
> http://list.isis.vanderbilt.edu/mailman/listinfo/cadynce
>   


