Research Interests

Empirical Software Engineering
Mining Software Repositories
Software Product Lines
Code Recommender Systems
Variability Management
Build Systems
Reverse Engineering
(Secure) API Usage

Research Projects

API-misuse Detection (MSR '16)

When developers use Application Programming Interfaces (APIs), they often make mistakes that can lead to bugs, system crashes, or security vulnerabilities. We refer to such mistakes as misuses. One example of a misuse is forgetting to call close() after opening a FileInputStream and reading from it. There are various categories of API misuses, and most current misuse detectors find only some of them. Our goal is to systematically design a misuse detector that covers most of these categories. As a first step, we created MUBench, a benchmark of existing API misuses against which we can evaluate several misuse detectors. To find out more about our collected data set and our automated pipeline for running detectors and reviewing their results, please check out
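
The close() example mentioned above can be illustrated with a minimal Java sketch. The class and method names are hypothetical and chosen only for illustration; they are not taken from MUBench. The first method shows the misuse, the second one way to avoid it.

```java
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical helper class, used only to illustrate the misuse.
final class FirstByteReader {

    // Misuse: the stream is opened and read, but close() is never called,
    // so the underlying file handle is leaked.
    static int readFirstByteLeaky(String path) throws IOException {
        FileInputStream in = new FileInputStream(path);
        return in.read();
    }

    // Correct usage: try-with-resources calls close() on every exit path,
    // including the case where read() throws an exception.
    static int readFirstByteSafely(String path) throws IOException {
        try (FileInputStream in = new FileInputStream(path)) {
            return in.read();
        }
    }
}
```

Both methods return the same value for a readable file, which is part of why this kind of misuse is easy to overlook: the difference only shows up as leaked resources.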

Understanding Why Application Developers Struggle With Cryptography APIs (ICSE '16)

Previous research has shown that many security vulnerabilities exist due to developers' misuse of cryptography APIs. In other words, developers make mistakes while using the APIs, and these mistakes can lead to serious security threats. In this project, we investigated the reasons for such mistakes and suggested ways to improve the situation. By analyzing Stack Overflow posts and GitHub repositories and by conducting two surveys with a total of 48 application developers, we collected the problems developers face with current cryptography APIs and their suggestions for improvement. To find out more about this study, please check out our ICSE '16 paper and our artifact page.

Software Product Line Migration

In this project, our goal is to collect and compare the experiences of companies that have successfully migrated to an SPL or that are currently in the migration process. This will be done through interviews with architects and engineers from various companies. Our focus is on technical details of the migration, such as the identification of variability in existing products, including details on the diffing strategies for source code; the modeling of variability and identification of features; and the kind of refactoring needed to migrate products to an integrated platform. Other details we aim to analyze include version-control strategies and product-generation techniques.

Variability Modeling of Cryptographic Components (ONWARD! '15, VaMoS '16)

There is a wide variety of cryptographic components and algorithms (e.g., ciphers, digests, and signatures). Each of these components comes with its own variability. For example, a cipher can be symmetric or asymmetric. If it is symmetric, it can operate on blocks or streams. Additionally, there are different modes of operation (e.g., ECB vs. CBC) as well as different padding schemes. To deal with this huge variability space, we model cryptographic components using concepts from feature modeling. However, since such components have many attributes and solutions using cryptography may use multiple components at once, we need modeling notations beyond those offered by basic feature modeling.
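
The cipher example above can be sketched as a tiny configuration check in Java. This is only an illustration of the idea of constraints between features, not the modeling notation used in the project. All names are hypothetical, padding schemes are left out, and the rule that a mode of operation requires a block cipher is an assumption added here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified model of one cipher configuration.
final class CipherConfig {
    enum Kind { SYMMETRIC, ASYMMETRIC }
    enum Operation { BLOCK, STREAM }
    enum Mode { ECB, CBC }

    final Kind kind;
    final Operation operation; // null when the feature is not selected
    final Mode mode;           // null when the feature is not selected

    CipherConfig(Kind kind, Operation operation, Mode mode) {
        this.kind = kind;
        this.operation = operation;
        this.mode = mode;
    }

    // Returns a description of every violated constraint (empty if valid).
    List<String> violations() {
        List<String> problems = new ArrayList<>();
        if (kind == Kind.SYMMETRIC && operation == null) {
            problems.add("a symmetric cipher must operate on blocks or streams");
        }
        if (kind == Kind.ASYMMETRIC && operation != null) {
            problems.add("block/stream operation applies only to symmetric ciphers");
        }
        // Assumption for this sketch: modes such as ECB and CBC need a block cipher.
        if (mode != null && operation != Operation.BLOCK) {
            problems.add("a mode of operation requires a block cipher");
        }
        return problems;
    }
}
```

For instance, a symmetric block cipher in CBC mode yields no violations, while a symmetric stream cipher combined with CBC is reported as invalid.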

CPP Usage in Practice (ECOOP '15)

The C preprocessor has often been criticized for poor separation of concerns, error-proneness, and code obfuscation, but it is still widely used in practice. Many (mostly academic) alternatives to the preprocessor exist, but they have not been adopted in practice. Since developers continue to use the preprocessor despite all criticism and research, we ask how practitioners perceive the C preprocessor through a series of interviews and surveys.

Mining Configuration Constraints (ICSE '14, TSE '15)

One of the challenges in developing a software product line is creating a variability model. This is especially true if the model is being created from existing code, which has to be analyzed in order to identify configuration constraints. In this project, we develop a framework that analyzes such code to identify configuration constraints that can be used to create a variability model.

Identifying Causes and Fixes of Linux Variability Anomalies (MSR '13)

To prevent variability anomalies from occurring in the first place, we need to understand what causes them. To provide automated solutions for such anomalies, we need to understand how developers usually fix them. This project mines commit information from Linux's Git repository to identify causes and fixes of variability anomalies. Our results show that variability anomalies are often introduced through incomplete patches that change Kconfig definitions without properly propagating these changes to the rest of the system. Anomalies are then commonly fixed through changes to the code rather than to Kconfig files.

Analyzing Linux Kbuild to Detect Variability Anomalies (WCRE '11, CSMR '12, JSEP '14)

Although build systems control what code gets compiled into the final built product, they are often overlooked when studying software variability. The Linux kernel is one of the largest open-source software systems supporting variability and contains over 10,000 configurable features described in its Kconfig files. To understand the role of the build system in variability implementation, we use Linux as a case study. We study its build system, Kbuild, and extract the variability constraints in its Makefiles. We show that almost 50% of the configurable features in Linux control the compilation of code files in the build system. We use the extracted constraints to detect variability anomalies in the form of dead and undead code files and code blocks.
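
To give a rough idea of what extracting constraints from Kbuild Makefiles involves, here is a minimal Java sketch. It only recognizes the simplest Kbuild pattern, `obj-$(CONFIG_X) += file.o`, and is not the extraction technique used in this project, which has to handle far more of the Makefile language. The class name, method name, and the symbol and file names in the usage note are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: maps object files to the CONFIG symbol guarding them.
final class KbuildSketch {

    // Matches lines such as: obj-$(CONFIG_FOO) += foo.o bar.o
    private static final Pattern CONDITIONAL_OBJ =
            Pattern.compile("^\\s*obj-\\$\\((CONFIG_\\w+)\\)\\s*\\+=\\s*(.*)$");

    static Map<String, String> fileConstraints(List<String> makefileLines) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : makefileLines) {
            Matcher m = CONDITIONAL_OBJ.matcher(line);
            if (!m.matches()) {
                continue; // unconditional entries (obj-y) and other lines are ignored
            }
            for (String target : m.group(2).trim().split("\\s+")) {
                if (target.endsWith(".o")) {
                    // The file is compiled only if the CONFIG symbol is enabled.
                    result.put(target, m.group(1));
                }
            }
        }
        return result;
    }
}
```

For the line `obj-$(CONFIG_FOO) += foo.o`, the returned map associates foo.o with CONFIG_FOO. Comparing such file-level constraints against the symbols defined in Kconfig is one conceivable starting point for spotting files that can never be compiled.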

Root Cause Analysis & Change Impact Analysis using CMDBs (CASCON '09, CSMR '10)

Many IT systems use Configuration Management Databases (CMDBs) to keep track of which hardware and software are installed as well as any problems that occur. Over time, CMDBs thus collect large amounts of valuable data that can be used for decision support. This project proposes mining historical data from a CMDB to detect common co-changes that can be used to support root cause analysis and change impact analysis.
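
As a rough illustration of co-change detection, the following Java sketch counts how often two configuration items appear in the same change. It assumes that the CMDB history has already been exported as a list of change sets, each naming the items touched by one change. This input format and all names are assumptions made for the example, not the project's actual approach.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: counts pairs of items that changed together.
final class CoChangeMiner {

    // Returns, for every pair of items, the number of change sets containing both.
    // Pair keys have the form "a & b" with a before b in alphabetical order.
    static Map<String, Integer> pairCounts(List<Set<String>> changeSets) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> changeSet : changeSets) {
            // Sort the items so that each pair always produces the same key.
            List<String> items = new ArrayList<>(new TreeSet<>(changeSet));
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    counts.merge(items.get(i) + " & " + items.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

Pairs with a high count are candidates for common co-changes: if one item of such a pair is about to be changed, or is found to be faulty, the other is a natural place to look next.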