
OpenScience Initiative at IfI

Collection of OpenScience projects of the Institute for Computer Science Frankfurt in the categories:
OpenData, OpenAccess, OpenMethodology, OpenEducationalResources, CitizenScience, OpenSource.

Automated Subgroup Fairness Analysis


This web application provides a service for an automated subgroup fairness analysis of a binary classifier. Our system detects the subgroups in the data automatically by using either subgroups obtained from clustering the dataset or entropy-based patterns derived from the found clusters.
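The idea of comparing a binary classifier across automatically detected subgroups can be sketched as follows. This is only a minimal illustration and not the web application's actual code: it assumes the cluster labels are already available, and the function names `subgroup_rates` and `max_gap` as well as the toy data are hypothetical.

```python
from collections import defaultdict

def subgroup_rates(groups, y_true, y_pred):
    """Per-subgroup accuracy and positive-prediction rate of a binary classifier."""
    counts = defaultdict(lambda: {"n": 0, "correct": 0, "positive": 0})
    for g, t, p in zip(groups, y_true, y_pred):
        c = counts[g]
        c["n"] += 1
        c["correct"] += int(t == p)
        c["positive"] += int(p == 1)
    return {
        g: {"accuracy": c["correct"] / c["n"],
            "positive_rate": c["positive"] / c["n"]}
        for g, c in counts.items()
    }

def max_gap(rates, metric):
    """Largest difference of a metric between any two subgroups."""
    values = [r[metric] for r in rates.values()]
    return max(values) - min(values)

# Assumed toy data: cluster ids stand in for the detected subgroups.
groups = [0, 0, 0, 0, 1, 1, 1, 1]
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]

rates = subgroup_rates(groups, y_true, y_pred)
print(rates)
print("accuracy gap:", max_gap(rates, "accuracy"))
```

A large gap in a metric between subgroups is one possible signal that the classifier treats them differently; which metric is appropriate depends on the application.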

Contact: Prof. Dr. Lena Wiese

A Word Sense Disambiguator for Big Data

Contact: Prof. Dr. Alexander Mehler

Implementation of Cryptography-Based Methods for Secure Data Management in Cloud Databases

OpenAccess, OpenSource

Column family stores are a special case of NoSQL databases. To achieve data protection while at the same time supporting advanced data management in these stores, novel cryptographic algorithms like order-preserving and searchable encryption schemes are needed. In this project, several such schemes are implemented and tested with the stores Apache HBase and Apache Cassandra.
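To illustrate why order-preserving encryption supports data management on encrypted columns, the following toy sketch maps plaintexts to strictly increasing ciphertexts so that range comparisons still work on the encrypted values. It is not one of the schemes implemented in the project and is not secure: the seeded `random.Random` generator, the function `build_ope_table`, and the sample data are assumptions made purely for illustration.

```python
import random
from itertools import accumulate

def build_ope_table(key, domain_size, max_gap=1000):
    """Map each plaintext 0..domain_size-1 to a strictly increasing ciphertext."""
    rng = random.Random(key)  # toy keyed generator, NOT cryptographically secure
    gaps = [rng.randint(1, max_gap) for _ in range(domain_size)]
    return list(accumulate(gaps))  # positive gaps => strictly increasing values

table = build_ope_table(key=2024, domain_size=120)

def encrypt(value):
    return table[value]

# Hypothetical column of ages, stored only in encrypted form on the server.
ages = [34, 19, 71, 52, 28]
stored = [encrypt(a) for a in ages]

# Range query 25 <= age <= 60: the client encrypts only the two bounds,
# and the server compares ciphertexts directly.
low, high = encrypt(25), encrypt(60)
matching_rows = [i for i, c in enumerate(stored) if low <= c <= high]
print(matching_rows)
```

Because the mapping preserves order, the server can answer the range query without decrypting; the price is that ciphertexts reveal the ordering of the underlying values.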

Contact: Prof. Dr. Lena Wiese

FAC'14 Benchmark Suite for Formal Verification of Analog Circuits


This benchmark suite, presented at the FAC'14 conference, is a collection of analog circuits with testbenches and device models that are of interest for formal circuit verification.

Contact: Prof. Dr. Lars Hedrich

German Parliamentary Corpus

OpenData, OpenMethodology, OpenSource

GerParCor is a genre-specific corpus of (predominantly historical) German-language parliamentary protocols from three centuries and four countries, including state and federal level data. In addition, GerParCor contains conversions of scanned protocols and, in particular, of protocols in Fraktur converted via an OCR process based on Tesseract. All protocols were preprocessed by means of the NLP pipeline of spaCy3 and automatically annotated with metadata regarding their session date. GerParCor is made available in the XMI format of the UIMA project. In this way, GerParCor can be used as a large corpus of historical texts in the field of political communication for various tasks in NLP.

Contact: Prof. Dr. Alexander Mehler

Gnucap-UF Extension of the Analog Circuit Simulator Gnucap


The Gnucap-UF extensions enable advanced methods for circuit analysis and verification, e.g., equivalence checking, state space analysis, and ageing simulation.

Contact: Prof. Dr. Lars Hedrich

A web application and a graph database for visualizing and analyzing medical databases

OpenAccess, OpenSource

Medical databases normally contain large amounts of data in a variety of forms. Graph4Med provides a straightforward visualization and analysis of a selected patient cohort.

Contact: Prof. Dr. Lena Wiese

Extension of HeidelTime

OpenMethodology, OpenSource

HeidelTime is one of the most widespread and successful tools for detecting temporal expressions in texts. Since HeidelTime's pattern matching system is based on regular expressions, it can be extended conveniently. We present such an extension for the German resources of HeidelTime: HeidelTimeext. The extension was developed by observing false negatives in real-world texts and various time banks. The gain in coverage is 2.7 % or 8.5 %, depending on the admitted degree of potential overgeneralization. We describe the development of HeidelTimeext, its evaluation on text samples from various genres, and share some linguistic observations.
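How a regular-expression-based tagger can be extended by adding patterns for observed false negatives can be sketched as follows. This is a simplified illustration using Python's `re` module, not HeidelTime's rule syntax or resource format; the pattern list, the function `find_temporal_expressions`, and the example sentences are assumptions.

```python
import re

MONTHS = ("Januar|Februar|März|April|Mai|Juni|Juli|August|"
          "September|Oktober|November|Dezember")

# Base patterns: "3. Oktober 1990" and "12.05.2021".
base_patterns = [
    re.compile(rf"\b\d{{1,2}}\.\s?(?:{MONTHS})\s\d{{4}}\b"),
    re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b"),
]

def find_temporal_expressions(text, patterns):
    """Return all matches of the given patterns in order of appearance."""
    hits = []
    for pattern in patterns:
        hits.extend((m.start(), m.group()) for m in pattern.finditer(text))
    return [expr for _, expr in sorted(hits)]

text = "Die Sitzung vom 3. Oktober 1990 wurde am 12.05.2021 digitalisiert."
found = find_temporal_expressions(text, base_patterns)
print(found)

# A false negative for the base patterns: a vague expression such as "Ende März 2019".
missed_text = "Der Bericht erschien Ende März 2019."
before = find_temporal_expressions(missed_text, base_patterns)

# Extending the pattern set covers the missed expression.
extended_patterns = base_patterns + [
    re.compile(rf"\b(?:Anfang|Mitte|Ende)\s(?:{MONTHS})(?:\s\d{{4}})?\b"),
]
after = find_temporal_expressions(missed_text, extended_patterns)
print(before, after)
```

Each added pattern raises coverage but may also match non-temporal text, which mirrors the trade-off between coverage gain and overgeneralization mentioned above.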

Contact: Prof. Dr. Alexander Mehler



SemioGraph aims to encode as much information as possible in one and the same graph representation. This is interesting for word networks, for example, where one needs to visualize units of information such as POS, node weight, node saliency, node centrality, etc. To present SemioGraph, we use word embedding networks.

Contact: Prof. Dr. Alexander Mehler



On this page, the Text Technology Lab provides a list of ready-made embeddings that it has created. The list contains a variety of downloadable embeddings produced with different methods and parameters. Metadata is available for all files, with information about the corpus, the method, the tool hyperparameters, and much more. This allows detailed search, easy retrieval, and reuse of the embedding files. All data is also available through a RESTful API.

Contact: Prof. Dr. Alexander Mehler


OpenData, OpenMethodology, OpenSource

TextImager is a UIMA-based framework that provides a set of NLP and visualization tools through a user-friendly GUI.

Contact: Prof. Dr. Alexander Mehler

UIMA Database Interface


The UIMA Database Interface enables the generic use of UIMA documents for any database.

Contact: Prof. Dr. Alexander Mehler


OpenData, OpenMethodology, OpenSource

The collection of all UIMA TypeSystemDescriptors for the UIMA pipelines of the Text Technology Lab.

Contact: Prof. Dr. Alexander Mehler

TTLab Utilities


A collection of useful tools and classes for everyday use in the context of text technology.

Contact: Prof. Dr. Alexander Mehler

Learning and Examination Portal for Introductory Theoretical Computer Science Lectures

OpenEducationalResources, OpenSource

algo-learn is a learning portal that is currently in an early development and testing phase. A prototype is already available for viewing. The project is developed under an open source license, so we welcome pull requests, feature requests, and bug reports on GitHub.

Contact: Prof. Dr. Holger Dell

An Efficient Word Sense Disambiguation Classifier

Contact: Prof. Dr. Alexander Mehler