Type Inference in Flexible Model-Driven Engineering Using Classification Algorithms

About

This website includes all the information needed to reproduce the experiments presented in the paper "Type Inference in Flexible Model-Driven Engineering Using Classification Algorithms", submitted for the SoSyM STAF '15 Special Issue. Step-by-step instructions on how to run the experiments are given in the Instructions section. All the source code needed can be downloaded from the Downloads section. The raw data can be downloaded from the Data section, while the Charts section includes all the charts generated from these data.

Abstract: Flexible or bottom-up Model-Driven Engineering (MDE) is an emerging approach to domain and systems modelling. Domain experts, who have detailed domain knowledge, typically lack the technical expertise to transfer this knowledge using traditional MDE tools. Flexible MDE approaches tackle this challenge by promoting the use of simple drawing tools to increase the involvement of domain experts in the language definition process. In such approaches, no metamodel is created upfront; instead, the process starts with the definition of example models that will be used to infer the metamodel. Pre-defined metamodels created by MDE experts may miss important concepts of the domain and thus restrict the expressiveness of the models. However, the lack of a metamodel that encodes the semantics of conforming models has some drawbacks, among them that of models containing elements that are unintentionally left untyped. In this paper we propose the use of classification algorithms to help infer the types of such untyped elements. We evaluate the proposed approach on a number of randomly generated example models from various domains. The correct type prediction rate varies from 23% to 100%, depending on the domain, the proportion of elements that were left untyped, and the prediction algorithm used.

Instructions

The following image presents an overview of the experimentation approach, as discussed in the paper. Detailed instructions are provided for each step of the process. We suggest that readers start from Step 3, the generation of the feature signatures from the generated muddles. Steps 1 and 2 concern the generation of the random models and their transformation to muddles, which are outside the scope of the paper. Step 0 is required in any case.

Fig. 1: An overview of the experimentation process.
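
As a quick orientation before running the full toolchain, the following is a minimal sketch of the core idea in Python: train a classifier on the elements whose types are known and predict the types of the untyped ones. The arrays are toy stand-ins for real feature signatures, and scikit-learn is used only for illustration; it is not necessarily the library used by the provided scripts.

```python
# Minimal sketch of the core idea: train a classifier on the elements
# whose types are known and predict the types of the untyped ones.
# The arrays below are toy stand-ins for real feature signatures.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
features = rng.integers(0, 4, size=(20, 5))   # one row per model element
types = rng.integers(0, 3, size=20)           # known types (3 classes)

untyped = rng.random(20) < 0.3                # ~30% of elements left untyped
clf = DecisionTreeClassifier(random_state=42)
clf.fit(features[~untyped], types[~untyped])  # learn from the typed elements

predicted = clf.predict(features[untyped])    # infer the missing types
print("Predicted types for untyped elements:", predicted)
```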

Data & Results

The metamodels, the muddles (.graphml) used in the experiments, the corresponding feature signatures (.txt), and all the raw results (.xlsx) can be downloaded from this section.
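
The muddles are plain GraphML files, so they can be inspected with any graph library before generating feature signatures. Below is a hedged sketch using networkx; the file name and the "type" attribute key are assumptions, so check the downloaded files for the actual keys.

```python
# Sketch: inspect a muddle (.graphml) with networkx before generating
# feature signatures. The file name and the "type" attribute key are
# assumptions; check the downloaded files for the actual keys.
import networkx as nx

g = nx.read_graphml("zoo-model-01.graphml")     # hypothetical file name

for node_id, attrs in g.nodes(data=True):
    node_type = attrs.get("type", "<untyped>")  # elements may lack a type
    print(node_id, node_type)

# Structural information of the kind encoded in feature signatures:
degrees = {n: g.degree(n) for n in g.nodes}
print("Node degrees:", degrees)
```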

Tables & Charts

CART Experiment


The following are the accuracy tables for the CART experiment for the Normal (left) and Sparse (right) sets.
Accuracy for N-CART.
Accuracy for S-CART.
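
To get a feel for how such accuracy figures arise, here is a minimal sketch of a CART run over several sampling rates, with scikit-learn's DecisionTreeClassifier standing in for a CART implementation and toy data in place of a metamodel's feature signatures.

```python
# Sketch: average CART accuracy for a range of sampling rates.
# scikit-learn's DecisionTreeClassifier stands in for CART; the toy
# X, y below stand in for one metamodel's feature signatures.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def cart_accuracy(X, y, rate, repeats=10):
    """Average accuracy when a fraction `rate` of the elements is typed."""
    scores = []
    for seed in range(repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=rate, random_state=seed)
        clf = DecisionTreeClassifier(random_state=seed).fit(X_tr, y_tr)
        scores.append(clf.score(X_te, y_te))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(200, 5))
y = (X[:, 0] + X[:, 1]) % 3                     # toy, learnable labels

for rate in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"sampling rate {rate:.0%}: accuracy {cart_accuracy(X, y, rate):.2%}")
```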


The following are the line charts of accuracy for each metamodel in the CART experiment using the Normal set.
Average accuracy for different sampling rates in Ant Metamodel
Average accuracy for different sampling rates in Bibtex Metamodel
Average accuracy for different sampling rates in Bugzilla Metamodel
Average accuracy for different sampling rates in Chess Metamodel
Average accuracy for different sampling rates in Cobol Metamodel
Average accuracy for different sampling rates in Conference Metamodel
Average accuracy for different sampling rates in Profesor Metamodel
Average accuracy for different sampling rates in Usecase Metamodel
Average accuracy for different sampling rates in Wordpress Metamodel
Average accuracy for different sampling rates in Zoo Metamodel


The following are the line charts of accuracy for each metamodel in the CART experiment using the Sparse set; a plotting sketch follows the list.
Average accuracy for different sampling rates in Ant Metamodel
Average accuracy for different sampling rates in Bibtex Metamodel
Average accuracy for different sampling rates in Bugzilla Metamodel
Average accuracy for different sampling rates in Chess Metamodel
Average accuracy for different sampling rates in Cobol Metamodel
Average accuracy for different sampling rates in Conference Metamodel
Average accuracy for different sampling rates in Profesor Metamodel
Average accuracy for different sampling rates in Usecase Metamodel
Average accuracy for different sampling rates in Wordpress Metamodel
Average accuracy for different sampling rates in Zoo Metamodel
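
Charts of this kind can be regenerated from the raw results in the Data section. A minimal matplotlib sketch, with dummy accuracy values standing in for a metamodel's averages:

```python
# Sketch: regenerate a line chart of average accuracy vs. sampling
# rate with matplotlib. The values below are dummies; the real
# averages come from the .xlsx results in the Data section.
import matplotlib.pyplot as plt

results = {0.1: 0.55, 0.3: 0.68, 0.5: 0.74, 0.7: 0.79, 0.9: 0.82}

rates, accs = zip(*sorted(results.items()))
plt.plot(rates, accs, marker="o")
plt.xlabel("Sampling rate")
plt.ylabel("Average accuracy")
plt.ylim(0, 1)
plt.title("Average accuracy for different sampling rates (dummy data)")
plt.show()
```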


The following are the variable importance tables for the CART experiment, for both the Normal (left) and Sparse (right) sets, for each metamodel separately.
Variable importance for the N-CART experiment.
Variable importance for the S-CART experiment.


The following are the variable importance pie charts for the CART experiment, for both the Normal (left) and Sparse (right) sets.
Variable importance for the N-CART experiment.
Variable importance for the S-CART experiment.
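
Variable importance can be read directly off a fitted CART model. A sketch with scikit-learn, where the feature names and toy data are placeholders for the real signature attributes:

```python
# Sketch: variable importance from a fitted CART model, shown as a
# pie chart. Feature names and data are placeholders for the real
# signature attributes.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(200, 5))
y = (X[:, 0] + X[:, 1]) % 3                     # depends on the first two features

feature_names = ["f1", "f2", "f3", "f4", "f5"]  # hypothetical attribute names

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
plt.pie(clf.feature_importances_, labels=feature_names, autopct="%1.1f%%")
plt.title("Variable importance (toy example)")
plt.show()
```

A fitted RandomForestClassifier exposes the same feature_importances_ attribute, so the corresponding Random Forests charts below can be produced in the same way.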

Random Forests


The following are the full accuracy tables for all tree counts (1, 5, 10, 50, 250, 500, 1000) used in the Random Forests experiment, for both the Normal (left) and Sparse (right) sets.
Accuracy for all tree counts in N-RF.
Accuracy for all tree counts in S-RF.
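
A sketch of sweeping the listed tree counts with scikit-learn's RandomForestClassifier, using toy data in place of the real feature signatures and a single 50% split in place of the full sampling-rate grid:

```python
# Sketch: sweep the tree counts used in the experiment with
# scikit-learn's RandomForestClassifier; toy data stands in for the
# real feature signatures, and a single 50% split for the sampling.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.integers(0, 4, size=(300, 5))
y = (X[:, 0] + X[:, 1]) % 3

X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.5, random_state=1)

for n_trees in (1, 5, 10, 50, 250, 500, 1000):
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=1)
    acc = clf.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{n_trees:4d} trees: accuracy {acc:.2%}")
```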



The following are the line charts of accuracy for each metamodel in the Random Forests experiment using the Normal set.
Average accuracy for different sampling rates and trees in Ant Metamodel
Average accuracy for different sampling rates and trees in Bibtex Metamodel
Average accuracy for different sampling rates and trees in Bugzilla Metamodel
Average accuracy for different sampling rates and trees in Chess Metamodel
Average accuracy for different sampling rates and trees in Cobol Metamodel
Average accuracy for different sampling rates and trees in Conference Metamodel
Average accuracy for different sampling rates and trees in Profesor Metamodel
Average accuracy for different sampling rates and trees in Usecase Metamodel
Average accuracy for different sampling rates and trees in Wordpress Metamodel
Average accuracy for different sampling rates and trees in Zoo Metamodel


The following are the line charts of accuracy for each metamodel in the Random Forests experiment using the Sparse set.
Average accuracy for different sampling rates and trees in Ant Metamodel
Average accuracy for different sampling rates and trees in Bibtex Metamodel
Average accuracy for different sampling rates and trees in Bugzilla Metamodel
Average accuracy for different sampling rates and trees in Chess Metamodel
Average accuracy for different sampling rates and trees in Cobol Metamodel
Average accuracy for different sampling rates and trees in Conference Metamodel
Average accuracy for different sampling rates and trees in Profesor Metamodel
Average accuracy for different sampling rates and trees in Usecase Metamodel
Average accuracy for different sampling rates and trees in Wordpress Metamodel
Average accuracy for different sampling rates and trees in Zoo Metamodel


The following are the variable importance tables for the Random Forests experiment, for both the Normal (left) and Sparse (right) sets, for each metamodel separately.
Variable importance for the N-RF experiment.
Variable importance for the S-RF experiment.

The following are the variable importance pie charts for the Random Forests experiment, for both the Normal (left) and Sparse (right) sets.
Variable importance for the N-RF experiment.
Variable importance for the S-RF experiment.

The following are the line charts of accuracy at the 50% sampling rate for 10 different models of each metamodel; a sketch for reproducing this comparison follows the list.
Average accuracy for different models in Ant Metamodel
Average accuracy for different models in Bibtex Metamodel
Average accuracy for different models in Bugzilla Metamodel
Average accuracy for different models in Chess Metamodel
Average accuracy for different models in Cobol Metamodel
Average accuracy for different models in Conference Metamodel
Average accuracy for different models in Profesor Metamodel
Average accuracy for different models in Usecase Metamodel
Average accuracy for different models in Wordpress Metamodel
Average accuracy for different models in Zoo Metamodel
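
A sketch of the per-model comparison: fix the sampling rate at 50%, repeat the experiment for 10 generated models, and report each model's accuracy. The make_model function below is a hypothetical stand-in for the random model generation of Steps 1 and 2.

```python
# Sketch: per-model variation at a fixed 50% sampling rate. The
# make_model function is a hypothetical stand-in for the random model
# generation of Steps 1 and 2.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def make_model(seed, n_elements=200, n_features=5, n_types=3):
    """Generate one toy model's feature matrix and type labels."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 4, size=(n_elements, n_features))
    y = (X[:, 0] + rng.integers(0, 2, size=n_elements)) % n_types
    return X, y

for model_id in range(10):                      # 10 models per metamodel
    X, y = make_model(model_id)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=0.5, random_state=model_id)
    acc = RandomForestClassifier(random_state=0).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"model {model_id}: accuracy {acc:.2%}")
```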