[cadynce] Meditations on Learning and a Big Back End
Cross, Joseph
Joseph.Cross at darpa.mil
Thu Mar 8 16:32:59 CST 2007
Gentledudes -
In today's telecon, Patrick said that the DARPA-hard part of the problem
is machine learning of indicators that with high confidence predict
certifiability. [Forgive me if I mangle your position, Patrick, and
please correct this discussion.] This idea scares me and I think it
scared Carol.
We may be splitting hairs here, but I think we want to discover _and
certify_ many configurations even without testing them all. E.g., we
old-school-certify one configuration and then we say that the
certification-equivalent configuration that is obtained by swapping
components between two identical computers is also thereby certified.
Perhaps this is what Patrick calls a high confidence predictor of
certifiability? If not, then could we get an example of such a
predictor?
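To make the swap idea concrete, here is a minimal sketch (names and data
shapes are my own invention, not anything from the tool chain): given one
old-school-certified configuration and a group of identical computers,
generate every configuration reachable by permuting the component loads
among those computers; each is certification-equivalent to the original.

```python
from itertools import permutations

def equivalence_class(config, identical_computers):
    """Hypothetical sketch: config maps computer -> set of components.
    Return every configuration obtained by swapping the loads among
    the listed identical computers."""
    configs = set()
    for perm in permutations(identical_computers):
        new = dict(config)
        for src, dst in zip(identical_computers, perm):
            new[dst] = config[src]  # dst takes over src's component load
        configs.add(frozenset((c, frozenset(comps)) for c, comps in new.items()))
    return configs

# One certified configuration on two identical computers X and Y:
certified = {"X": {"A", "C"}, "Y": {"B"}}
cls = equivalence_class(certified, ["X", "Y"])
# Swapping X and Y yields a second, certification-equivalent configuration.
len(cls)  # -> 2
```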
If the machine learning is used to generate, at system test time, a
probably certifiable configuration (whence might grow a big honkin'
certification-equivalence class of derived configurations), and we take
the one generated configuration and skoll-test the tail off it, then
great! That's what we should do.
On the other hand, the idea of deploying the machine-learned indicator
on the ship, using it there to generate a new, untested configuration
following battle damage, and then installing that configuration, is a
very sporty idea. I doubt that we could sell it to the Navy. With any
luck, we won't need to - the more conservative approach will provide
all the benefit we can measure (I hope).
=============
Also in today's telecon, Gautam opined that our tool chain has a big
back end, which I interpret to mean that there will be a lot of
sophisticated processing happening on the ship at the moment a new
configuration is needed. I'm not so sure.
The tool chain front end passes to the back end what is semantically a
big boolean expression that defines a set of configurations that are
certified. Let's call this datum the 'Set Of Certified Configurations'
(SOCC). The actual representation of the SOCC is TBD.
Consider a system that has components A, B, & C, and computers X, Y, &
Z. Let "XAC" denote "components A and C run on computer X"; likewise
"XA" means component A runs on computer X.
If the SOCC looks like
(XA & YB & ZC) |
(XB & YA & ZC) |
(XAB & ZC) |
(YAB & ZC)
then the back end doesn't have much to do - it just picks one of the
parenthetical terms that fits on the available hardware (and maybe is
optimal according to other criteria, like 'minimizes shutdowns and
restarts'). Note that this SOCC amounts to just a list of acceptable
configurations. Note also that not every one of the listed
configurations needs to have been tested.
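A small-back-end sketch of the above, assuming (my assumption, since the
SOCC representation is TBD) that the SOCC arrives as an explicit list of
configurations, one per disjunct:

```python
# Hypothetical representation: each DNF term maps a computer to the
# components it hosts.
SOCC = [
    {"X": {"A"}, "Y": {"B"}, "Z": {"C"}},  # (XA & YB & ZC)
    {"X": {"B"}, "Y": {"A"}, "Z": {"C"}},  # (XB & YA & ZC)
    {"X": {"A", "B"}, "Z": {"C"}},         # (XAB & ZC)
    {"Y": {"A", "B"}, "Z": {"C"}},         # (YAB & ZC)
]

def pick_configuration(socc, available_computers):
    """Return the first certified configuration that fits on the
    computers still available (a trivial stand-in for 'optimal
    according to other criteria')."""
    for term in socc:
        if set(term) <= available_computers:
            return term
    return None  # no certified configuration fits the surviving hardware

# If computer X is lost, the back end falls through to (YAB & ZC):
pick_configuration(SOCC, {"Y", "Z"})  # -> {"Y": {"A", "B"}, "Z": {"C"}}
```

This is linear in the length of the list - the back end stays small
exactly as long as the list does.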
On the other hand, a more complex SOCC, like
(!XC & !YC & (XA | YA) & (XB | YB))
requires more work at run time. In fact, in full generality, we have
the Boolean satisfiability problem (SAT), which is NP-hard.
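To see what "more work" means, here is the constraint-style SOCC above
solved by brute force over truth assignments - fine for six variables,
hopeless at fleet scale, which is the NP-hardness showing through. (The
encoding is my own illustration, not a proposed representation.)

```python
from itertools import product

# "XA" means "component A runs on computer X", etc.
VARS = ["XA", "XB", "XC", "YA", "YB", "YC"]

def socc(v):
    """The constraint-style SOCC from the text:
       (!XC & !YC & (XA | YA) & (XB | YB))"""
    return (not v["XC"] and not v["YC"]
            and (v["XA"] or v["YA"])
            and (v["XB"] or v["YB"]))

# Brute force: enumerate all 2^6 assignments, keep the satisfying ones.
satisfying = [dict(zip(VARS, bits))
              for bits in product([False, True], repeat=len(VARS))
              if socc(dict(zip(VARS, bits)))]
len(satisfying)  # -> 9 certified configurations hide in this expression
```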
So the bigness of our back end depends on how the SOCC is expressed.
Now every boolean expression can be put into disjunctive normal form, as
in the first example above. So the only reason to pass a more complex
SOCC would be to avoid a DNF that amounts to an unmanageably long
list. And this could well be the case: consider 10
computers, 100 components, and the SOCC is trying to say "Any
configuration is fine as long as there are no more than 20 components on
any computer." That would be an unmanageably long list.
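The same constraint stated as a predicate, by contrast, is tiny and
cheap to check - a sketch (names are illustrative) of why a non-list
representation can be worth the extra run-time machinery:

```python
from collections import Counter

def socc_predicate(assignment, limit=20):
    """Compact SOCC for 'any configuration is fine as long as no
    computer hosts more than `limit` components'.  Checking one
    candidate is O(#components); the equivalent DNF list for 100
    components on 10 computers would have on the order of 10**100
    terms."""
    loads = Counter(assignment.values())  # components per computer
    return all(n <= limit for n in loads.values())

# 100 components spread round-robin over 10 computers -> 10 each: fine.
assignment = {f"comp{i}": f"cpu{i % 10}" for i in range(100)}
socc_predicate(assignment)  # -> True

# Pile 25 of them onto cpu0 and the predicate rejects the configuration.
bad = dict(assignment, **{f"comp{i}": "cpu0" for i in range(25)})
socc_predicate(bad)  # -> False
```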
Bottom line: research is needed to find a representation for practically
useful SOCCs that requires minimum work at run time.
- Joe
____________________________________
Dr. Joseph K. Cross, Program Manager
DARPA/IXO, room 612
3701 North Fairfax Drive
Arlington, Virginia 22203-1714
joseph.cross at darpa.mil
(571) 218-4691