Ersatzmodels

There is the story of a colleague of mine who works in a large funded research project. Such projects bring together researchers and industrial partners, the latter providing use cases for research prototypes. Indeed, my colleague had no problem getting access to a number of models that were used at the industrial partners’ sites. He used them successfully to evaluate his prototype, and the cooperation was documented as an official part of the overall project. However, all models remained confidential.

This is where my colleague’s problem began.

A published evaluation or experiment should be comprehensible and repeatable. However, this is not possible when the models that were used cannot be shown or described. So how was he to publish his approach and prototype when the evaluation could not be published as well? Was he going to produce yet another one of those papers that include the vague sentence: “We evaluated our prototype on the models of a big <<domain>> company from <<region>> and found that it can successfully be applied.”?

Most of us know this situation. After investing a lot of effort in industry cooperation to gain access to real problems, real use cases, and real models, researchers often end up unable to publish evidence for their results – a bitter experience.

A class of pragmatic solutions: Ersatzmodels

During the discussions at the FMI’14 workshop in Vienna, it became clear that several researchers are experimenting with pragmatic solutions to this problem. Although quite different in methodology and result, these solutions share a common goal: the creation of Ersatzmodels, i.e. realistic models that can be used as substitutes for real industrial models. During the workshop, we initially identified four types of Ersatzmodels, which are in use by researchers and which differ in their cost and in their distance from real models.

    • Generated Models: Models are generated automatically in order to obtain a variety of models for the evaluation of, e.g., modeling tools. To make these models realistic, model generators might be built using individual real models.
    • Pseudo Reference Models: When they have exclusive access to a reasonable number of real but confidential models from one domain, researchers may manually create reference models that reflect typical, recurring properties of the original models.
    • Obfuscated Models: Individual original models are systematically manipulated to achieve confidentiality while preserving the properties of interest. A typical goal is to prevent the company that provided the original models from being identified (a minimal sketch of one possible obfuscation step follows this list).
    • Real Ersatzmodels: These models are created by researchers in close cooperation with a company, e.g. by simulating a project.
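
To make the obfuscation idea more concrete, here is a minimal sketch in Python. The model format and helper names are illustrative assumptions of ours, not a technique agreed on at the workshop: element names are replaced by salted pseudonyms, so domain-revealing vocabulary disappears while the reference structure, and thus structural properties of interest, stays intact.

    # Minimal, hypothetical sketch of one obfuscation step: consistently renaming
    # model elements while preserving the model's structure (here a simple
    # graph-like model of named elements and references between them).
    import hashlib

    def obfuscate(model: dict, salt: str) -> dict:
        """Replace element names with salted pseudonyms; keep references intact."""
        def pseudonym(name: str) -> str:
            digest = hashlib.sha256((salt + name).encode()).hexdigest()[:8]
            return f"elem_{digest}"

        mapping = {name: pseudonym(name) for name in model["elements"]}
        return {
            "elements": [mapping[n] for n in model["elements"]],
            "references": [(mapping[a], mapping[b]) for a, b in model["references"]],
        }

    # Example: element count and reference topology survive, while names such as
    # "BrakeController" no longer reveal the domain or the providing company.
    original = {
        "elements": ["BrakeController", "WheelSensor"],
        "references": [("BrakeController", "WheelSensor")],
    }
    print(obfuscate(original, salt="project-secret"))

Real obfuscation would of course have to handle far richer metamodels and to argue which properties are actually preserved; the sketch only illustrates the principle.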

Hopes, Concerns, & To-Dos

However, most of these approaches are still in an experimental state and are not widely accepted as standard research methods. Before the creation and use of Ersatzmodels can become accepted by research communities and reviewers, several aspects that are so far open need to be addressed.

Methods for creating the different forms of Ersatzmodels must move from “ideas with ad-hoc realization” towards well-documented standards. This is not only necessary to ensure comprehensibility, but also a precondition for discussing, designing, and ensuring the quality of the resulting models and their applicability in further studies and experiments. Furthermore, as for established research methods, it should be documented in which cases the different kinds of Ersatzmodels can be used and in which they cannot, i.e. what their typical threats to validity are.

Thus, Ersatzmodels are a promising but still immature approach to enabling research under the pressure of both quality needs, such as comprehensibility, repeatability, and reliability, on the one side and companies’ confidentiality needs on the other.

Do Model Repositories Need Validity Disclaimers?

Model repositories, such as those we collect in the FMI Model Index, provide an infrastructure for researchers to obtain models for investigations and for the evaluation of new model analysis or transformation techniques.

However, using model repositories replaces the classical task of data collection and thus changes the research approach. This poses a major challenge to the research’s validity, especially its external and construct validity (as described by Wohlin et al., Experimentation in Software Engineering, Springer, 2012), which can be affected on two levels:

  1. There are the correctness and the context of the captured and published models. Are these models correct and sufficiently complete to provide an appropriate picture of reality? For example, is the published model from an early system design that was dismissed later on? Or was the model used to generate production code? Answers to these questions can only be provided by the persons who add the models to a repository.
  2. Further, there is the question of whether, and to what extent, the mix of models within a repository is representative and whether insights gained on the basis of these models can be generalized. Selection bias within repositories seems to be the rule rather than the exception, e.g. due to a repository’s specialization in certain domains, or because most models are provided by only a small group of researchers.

Of course, the actual validity discussion always depends on the concrete research, study, or evaluation. Nonetheless, the information required for this discussion can only be provided by the publishers of the models or by the operators of the model repositories. Moreover, for each repository it is clear that certain kinds of research cannot be done with its data. All this information might be provided by a repository in the form of a “validity disclaimer”.
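
As a rough illustration of what such a disclaimer could capture, the following Python sketch outlines a metadata record that a repository might request from contributors and aggregate per repository. All field names are our own assumptions for illustration; no such schema exists in the FMI Model Index or, to our knowledge, in any other repository.

    # Hypothetical sketch of contributor-supplied metadata from which a
    # repository-level "validity disclaimer" could be derived.
    from dataclasses import dataclass, field

    @dataclass
    class ModelProvenance:
        source: str                      # e.g. "automotive supplier", "student project"
        development_stage: str           # e.g. "early design sketch", "code generation"
        collection_method: str           # e.g. "provided by engineers", "web crawler"
        known_modifications: list[str] = field(default_factory=list)  # e.g. obfuscation steps

    @dataclass
    class RepositoryDisclaimer:
        covered_domains: list[str]       # domains the repository specializes in
        contributor_diversity: str       # e.g. "3 research groups, 1 company"
        unsuitable_for: list[str]        # kinds of studies the data cannot support

    entry = ModelProvenance(
        source="anonymous automotive supplier",
        development_stage="used for production code generation",
        collection_method="provided by project partner",
        known_modifications=["identifiers obfuscated"],
    )

Whether such records are best filled in by contributors, curated by repository operators, or partly derived automatically is exactly the kind of open question raised below.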

However, model repositories often do not include much metadata about the models. The better ones provide information on the data source, e.g. the company the models stem from.

How can this shortcoming be resolved? Can operators of model repositories provide validity disclaimers? If so, how? Should they demand a basic set of information on validity threats from people who add models to these repositories? What happens when model repositories are filled by web crawlers instead of humans? To what extent can research results be reliable when they are based on a repository that does not provide some kind of validity disclaimer?

Model Review Criteria

Reviewing models, be it in academic exercises, for full-blown industrial designs and architectures, or as part of scientific research, is quite hard, and it is not obvious what good criteria for such an evaluation could be. Here is the story of a “model review”, told by Richard P. Feynman, in which he considers the blueprints of a chemical plant (“Surely You’re Joking, Mr. Feynman!”, Vintage, 1992, pp. 123sqq.):

How do you look at a plant that isn’t built yet? I don’t know. Lieutenant Zumwalt […] takes me into this room where there are these two engineers and a loooooong table covered with a stack of blueprints representing the various floors of the proposed plant.

I took mechanical drawing when I was in school, but I am not good at reading blueprints. So they unroll the stack of blueprints and start to explain it to me, thinking I am a genius. Now, one of the things they had to avoid in the plant was accumulation. They had problems like when there’s an evaporator working, which is trying to accumulate the stuff, if the valve gets stuck or something like that and too much stuff accumulates, it’ll explode. So they explained to me that this plant is designed so that if any one valve gets stuck nothing will happen. It needs at least two valves everywhere.

Then they explain how it works. The carbon tetrachloride comes in here, the uranium nitrate from here comes in here, it goes up and down, it goes up through the floor, comes up through the pipes, coming up from the second floor, bluuuuurp—going through the stack of blueprints, down-up-down-up, talking very fast, explaining the very very complicated chemical plant.

I’m completely dazed. Worse, I don’t know what the symbols on the blueprint mean! There is some kind of a thing that at first I think is a window. It’s a square with a little cross in the middle, all over the damn place. I think it’s a window, but no, it can’t be a window, because it isn’t always at the edge. I want to ask them what it is. […] I get an idea. Maybe it’s a valve. I take my finger and I put it down on one of the mysterious little crosses in the middle of one of the blueprints on page three, and I say “What happens if this valve gets stuck?” —figuring they’re going to say, “That’s not a valve, sir, that’s a window.”

So one looks at the other and says, “Well, if that valve gets stuck—” and he goes up and down on the blueprint, up and down, the other guy goes up and down, back and forth, back and forth, and they both look at each other. They turn around to me and they open their mouths like astonished fish and say “You’re absolutely right, sir.”

Looking at the story from a more general model review perspective, the fundamental model properties of mapping, reduction, and pragmatism coined by Herbert Stachowiak (“Allgemeine Modelltheorie”, Springer, 1973) may be a first starting point. The blueprints seem to have worked rather well in this respect: the mapping in the blueprints made it possible to review a plant that had not been built yet, although, in fact, the mapping was not clear to all reviewers (and we are not told whether the plant could have been constructed successfully from these blueprints). The blueprints were pragmatically successful, i.e., the model was apt for the purpose of analysing whether a single valve getting stuck could lead to accumulation. However, heavy explanation was necessary, and some mistakes could still have gone undetected. It may be speculated that the reduction chosen for the ultimate goal of fault analysis could have been more elaborate.

Are there other criteria that should be considered when reviewing a model? Maybe originality and novelty would be of some importance, in particular when judging whether the right reduction has been chosen for some intended pragmatic purpose. But would, for example, aesthetic criteria also count? Or the (mis-)use of some arcane modelling language feature?