Research Interests

My research interests concentrate on the foundations and applications of data mining, machine learning, and statistical model estimation. I use the term model estimation to refer to approaches from any of these three disciplines (e.g., linear regression, inductive logic programming, neural networks). My work falls into the following four interrelated categories.

Relational Learning

Data mining and predictive modeling (both in machine learning and in statistics) commonly assume that the data have a feature-vector representation with a fixed number of independent variables. However, most business and application domains require a much more complex representation of their data: a multi-relational data model that includes multiple tables for different entities linked through keys. Figure 1 shows an example of such a data model for a complex retail domain. The first field to address the task of learning classification models from such complex representations was Inductive Logic Programming (ILP), which combines the expressive power of first-order logic with induction methods for model estimation. More recently, the machine learning community has proposed upgrades of existing propositional (feature-vector-based) models, drawing on both ILP and statistical methods. The most common upgrade approach is to transform the multi-relational domain into a feature-vector representation. This process involves two steps: the identification of related entities (based on identifiers and keys) and the aggregation of sets of related entities into atomic feature values. The focus of my work has been on the aggregation of entities with high-dimensional categorical attributes from 1-n relationships. My theoretical work (see, for instance, KDD 2003) includes a taxonomy of relational concepts that can be expressed under a given set of assumptions about aggregation. For my dissertation I have developed a set of aggregation operators, based on class-conditional density estimates and vector distances, that minimize the degree of information loss during aggregation and preserve task-specific predictive information. This methodology has outperformed alternative approaches (including ILP and existing upgrades) on a number of domains.
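To make the aggregation step more concrete, here is a minimal sketch under simplifying assumptions of how a 1-n relationship with a high-dimensional categorical attribute can be collapsed into a few numeric features. Each object's bag of related values becomes a count vector, a class-conditional reference vector is built per class from the training objects, and the features are the cosine distances to those references. The function names, the toy customer/product data, and the choice of cosine distance are illustrative assumptions, not the exact operators developed in the dissertation.

```python
from collections import Counter
from math import sqrt


def class_reference_vectors(bags, labels):
    """Sum the value counts of all training objects per class (class-conditional reference)."""
    refs = {}
    for bag, label in zip(bags, labels):
        refs.setdefault(label, Counter()).update(bag)
    return refs


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse count vectors; 1.0 if either vector is empty."""
    dot = sum(count * v.get(key, 0) for key, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def aggregate_features(bag, refs):
    """Turn one object's bag of related categorical values into one distance feature per class."""
    counts = Counter(bag)
    return {label: cosine_distance(counts, ref) for label, ref in refs.items()}


# Hypothetical 1-n data: products bought by each customer, plus a binary class label.
train_bags = [["p1", "p2", "p2"], ["p1", "p3"], ["p4", "p5"], ["p5", "p6"]]
train_labels = [1, 1, 0, 0]
refs = class_reference_vectors(train_bags, train_labels)

# The new customer's purchases overlap only with class-1 purchases,
# so the distance to the class-1 reference is the smaller of the two.
features = aggregate_features(["p2", "p3"], refs)
```

In practice the reference vectors would be estimated on training data only (and, for a training object, ideally without its own bag) so that the label does not leak into the constructed feature.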

Hybrid Models

Every model class has its particular strengths and limitations. A promising approach to overcoming some of these limitations while retaining the particular strengths of each model is to combine different modeling approaches (e.g., structural models and machine learning techniques) in hybrid architectures. For my Diploma Thesis I investigated different architectures that combine physical models of dynamic vehicle behaviour with neural networks.
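One common hybrid architecture is serial: the physical model makes the primary prediction and a data-driven component learns only the residual error the physics does not capture. The sketch below illustrates that pattern under loudly stated assumptions: the "physical model" is an idealized braking-distance formula with an assumed friction coefficient, the data are synthetic, and a one-variable least-squares fit stands in for the neural network to keep the example dependency-free. It is not the architecture from the thesis.

```python
G = 9.81  # gravitational acceleration in m/s^2


def physical_model(speed, mu=0.8):
    """Idealized braking distance v^2 / (2 * mu * g); mu = 0.8 is an assumed friction coefficient."""
    return speed ** 2 / (2 * mu * G)


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x; stands in for the neural network."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fit_hybrid(speeds, observed):
    """Serial hybrid: keep the physical model and learn only its residual error from data."""
    residuals = [y - physical_model(v) for v, y in zip(speeds, observed)]
    a, b = fit_linear(speeds, residuals)
    return lambda v: physical_model(v) + a + b * v


# Hypothetical observations: braking distance plus a reaction-time term (0.75 s * v)
# that the purely physical braking model does not account for.
speeds = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
observed = [physical_model(v) + 0.75 * v for v in speeds]
hybrid = fit_hybrid(speeds, observed)
```

Because the learned part only has to model the residual, it can stay small, while the physical part keeps the prediction plausible outside the range of the training data.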

Fundamentals of Induction

Similarity:

The justification of predictive modeling rests on the assumption that induction is possible, and in particular that similar objects will exhibit similar target values. The essence of learning is the identification of the properties that make objects similar with respect to a specific target. One important observation is that the notion of similarity depends on the task. In my dissertation, I have focused on the similarity of complex objects in relational databases. The main challenge in a relational domain is to assess similarity with respect to links between objects and to sets of related objects. The first step of my work was to develop a relational notion of similarity that is general enough to support a general-purpose system architecture, able to learn a variety of similarities for different relational prediction tasks, and at the same time precise enough to capture the particular semantics of each task (for maximal predictive performance). Beyond the scope of my dissertation, I have been interested in similarity measures for graphs and time series.

Performance Measures:

The judgment of data mining research always depends on the rigor of the evaluation. Besides a rigorous methodology, the choice of an appropriate performance measure is crucial. The question of performance measures for probability estimation is of particular interest because of the mismatch between the target observations (e.g., binary) and the predictions (continuous in [0,1]). Performance measures are relevant not only in evaluation but also during model estimation, where they guide parameter optimization or structural search. Appropriate cost functions can be derived by maximum likelihood from assumptions about the noise. However, little guidance exists when those assumptions are consistently violated, which emphasizes the need for a better understanding and better assessment of the true empirical noise. From an application standpoint, the performance measure should be as close as possible to the true criterion of model use. It is unclear, however, whether this should also be the measure used for model estimation. Methodological evaluations in a research setting pose additional requirements. In particular, invariance to specific domain properties such as the class prior and scaling allows a higher degree of abstraction and generalization of findings across domains. I have recently developed a framework that categorizes a variety of performance measures with respect to their statistical properties and assumptions. Most measures focus on the model's ability to decrease variance, whereas others focus on the relative precision of the conditional expected value of the target. I am currently interested in which models are preferentially selected under different performance measures, controlling for different types of noise.
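As a small illustration of why the choice of measure matters for probability estimates, the sketch below computes three standard measures (Brier score, log loss, and AUC) on hypothetical binary labels and predicted probabilities. These measures are common choices picked for illustration, not the contents of the framework described above. Dividing every prediction by two preserves the ranking, so AUC is unchanged, while Brier score and log loss both get worse because the probabilities are now poorly calibrated.

```python
from math import log


def brier_score(y_true, p_pred):
    """Mean squared difference between binary outcomes and predicted probabilities."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of the outcomes under a Bernoulli model."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * log(p) + (1 - y) * log(1 - p))
    return total / len(y_true)


def auc(y_true, p_pred):
    """Probability that a random positive is scored above a random negative (ties count 1/2)."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities.
y = [1, 1, 0, 1, 0, 0]
p = [0.9, 0.7, 0.6, 0.4, 0.2, 0.1]
p_squashed = [q / 2 for q in p]  # monotone rescaling: same ranking, worse calibration

ranking_view = (auc(y, p), auc(y, p_squashed))
calibration_view = (brier_score(y, p), brier_score(y, p_squashed))
likelihood_view = (log_loss(y, p), log_loss(y, p_squashed))
```

AUC depends only on the ordering of the predictions, so it is unchanged by any strictly increasing transformation, whereas the Brier score and log loss also penalize miscalibration.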

Noise and Uncertainty:

The evaluation of model estimation approaches is typically relative to some lower-bound baseline. However, our ability to model a phenomenon also has an upper bound: the level of inherent uncertainty. From a machine learning perspective, the point of interest is how well a perfect model could possibly perform on a given problem. There are a number of theoretical concepts for this (e.g., the Bayes rate), but it is very difficult to distinguish empirically whether a prediction error is due to inherent noise or to a shortcoming of the model. The reason for this difficulty is that the identification of noise requires assumptions either about the statistical characteristics of the noise or about the true underlying relationship. I am currently investigating, theoretically and empirically, how to measure the "true noise" (or degree of inherent uncertainty) for different types of noise. I have reason to believe that the particular characteristics of the "noise model" interact strongly with model quality across different modeling methodologies. Those interactions are very valuable for the design of automated machine learning methods.
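The Bayes rate can be made concrete when the noise model is known. The sketch below assumes a hypothetical discrete input x with known P(x) and known P(y=1|x); the Bayes error is then the sum over x of P(x) * min(P(y=1|x), 1 - P(y=1|x)). Simulating labels from the same noise model shows that even the perfect classifier, which always predicts the more likely class, cannot do better than that rate. In real data P(y=1|x) is unknown, which is exactly the difficulty described above; the sketch only shows what the upper bound means when the noise model is given.

```python
import random


def bayes_error(px, p_pos):
    """Bayes error for discrete x: sum over x of P(x) * min(P(y=1|x), P(y=0|x))."""
    return sum(px[x] * min(p_pos[x], 1 - p_pos[x]) for x in px)


def simulate_perfect_model_error(px, p_pos, n, seed=0):
    """Error rate of the Bayes-optimal classifier on n samples drawn from the noise model."""
    rng = random.Random(seed)
    xs = rng.choices(list(px), weights=list(px.values()), k=n)
    errors = 0
    for x in xs:
        y = 1 if rng.random() < p_pos[x] else 0
        y_hat = 1 if p_pos[x] >= 0.5 else 0  # always predict the more likely class
        errors += y != y_hat
    return errors / n


px = {"a": 0.5, "b": 0.3, "c": 0.2}     # hypothetical P(x)
p_pos = {"a": 0.9, "b": 0.6, "c": 0.2}  # hypothetical P(y=1 | x): the noise model

exact = bayes_error(px, p_pos)
empirical = simulate_perfect_model_error(px, p_pos, 100_000)
```

The simulated error of the perfect model fluctuates around the exact Bayes rate; any real model estimated from finite data can only add to this error, never remove it.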

Meta-Analysis

Predicting under what circumstances one method outperforms another on a real domain is a challenging task, and it has become even more pressing as new methods are continuously proposed. Not only can this knowledge increase our understanding by highlighting strengths and weaknesses and suggesting further improvements, but it also helps practitioners choose among the many different model estimation approaches. One important requirement for meta-learning is the ability to characterize a task and domain with respect to the factors that drive relative model class performance. Two topics in the meta-analysis of learning that I have been particularly interested in are the interactions between data representation and model biases, and the interactions among task difficulty (inherent uncertainty), training set size, and the model-specific bias-variance tradeoff. In prior work, we found that linear models outperform decision trees when there is a high level of inherent uncertainty, and that small training sets further enhance this effect. The interaction between model bias and data representation is of significant importance for feature construction; a large part of my dissertation addresses this issue for feature construction from relational data. Note that those interactions are often driven by the implicit definition of similarity in a particular model class.