Goethe-Universität — OpenScience@IfI

GerParCor

German Parliamentary Corpus

OpenData, OpenMethodology, OpenSource

GerParCor is a genre-specific corpus of (predominantly historical) German-language parliamentary protocols from three centuries and four countries, including state- and federal-level data. In addition, GerParCor contains conversions of scanned protocols, in particular protocols printed in Fraktur that were converted via an OCR process based on Tesseract. All protocols were preprocessed with the NLP pipeline of spaCy 3 and automatically annotated with metadata regarding their session date. GerParCor is made available in the XMI format of the UIMA project. In this way, GerParCor can be used as a large corpus of historical texts in the field of political communication for various NLP tasks.
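Because the corpus is distributed as UIMA XMI, a reader may want to see roughly how such a file can be inspected. The following is a minimal sketch using only the Python standard library. It assumes the usual UIMA CAS serialization, in which the document text is stored in the `sofaString` attribute of a `cas:Sofa` element and annotations carry `begin` and `end` offsets. The inline sample document, the `demo` namespace, and the `Token` type are hypothetical and are not taken from GerParCor; the actual type system of the corpus may differ.

```python
import xml.etree.ElementTree as ET

# Namespace used by UIMA for the CAS elements in XMI serializations.
CAS_NS = "http:///uima/cas.ecore"

# Hypothetical sample document; the "demo" namespace and the Token type
# are placeholders for illustration, not GerParCor's real type system.
SAMPLE_XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
         xmlns:cas="http:///uima/cas.ecore"
         xmlns:demo="http:///example/demo.ecore"
         xmi:version="2.0">
  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView"
            mimeType="text/plain"
            sofaString="Die Sitzung ist eröffnet."/>
  <demo:Token xmi:id="2" sofa="1" begin="0" end="3"/>
  <demo:Token xmi:id="3" sofa="1" begin="4" end="11"/>
</xmi:XMI>
"""


def read_annotations(xmi_text):
    """Return the document text and (type name, covered text) pairs."""
    root = ET.fromstring(xmi_text)
    sofa = root.find(f"{{{CAS_NS}}}Sofa")
    text = sofa.get("sofaString")
    spans = []
    for elem in root:
        # Only elements with begin/end offsets are treated as annotations.
        if elem.get("begin") is None or elem.get("end") is None:
            continue
        begin, end = int(elem.get("begin")), int(elem.get("end"))
        type_name = elem.tag.split("}")[-1]  # strip the "{namespace}" prefix
        # UIMA offsets count UTF-16 code units; for text without
        # supplementary characters they match Python string indices.
        spans.append((type_name, text[begin:end]))
    return text, spans


if __name__ == "__main__":
    text, spans = read_annotations(SAMPLE_XMI)
    print(text)
    for type_name, covered in spans:
        print(type_name, covered)
```

For real corpus files, a dedicated UIMA library that loads the matching type system is likely the more robust choice; the sketch only illustrates the general layout of the format.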

Contact: Prof. Dr. Alexander Mehler

Gnucap-UF

Gnucap-UF Extension of the Analog Circuit Simulator Gnucap

Graph4Med

A web application and a graph database for visualizing and analyzing medical databases

HeidelTime_ext

Extension of HeidelTime

OpenMethodology, OpenSource

HeidelTime is one of the most widespread and successful tools for detecting temporal expressions in texts. Since HeidelTime's pattern matching system is based on regular expressions, it can be extended conveniently. We present such an extension for the German resources of HeidelTime: HeidelTime_ext. The extension was developed by observing false negatives in real-world texts and various time banks. The gain in coverage is 2.7% or 8.5%, depending on the admitted degree of potential overgeneralization. We describe the development of HeidelTime_ext, its evaluation on text samples from various genres, and share some linguistic observations.
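The idea of extending a regex-based resource after observing false negatives can be illustrated with a small sketch. The code below does not use HeidelTime's actual rule or resource file syntax; it is a simplified Python analogue in which a date pattern is built from a list of month names, and the list is then extended with regional variants ("Jänner", "Feber") that the base list misses. The function names, the pattern, and the example sentence are all assumptions made for illustration.

```python
import re

# Base resource: standard German month names.
BASE_MONTHS = [
    "Januar", "Februar", "März", "April", "Mai", "Juni",
    "Juli", "August", "September", "Oktober", "November", "Dezember",
]

# Hypothetical extension: regional variants seen as false negatives.
EXTRA_MONTHS = ["Jänner", "Feber"]


def build_date_pattern(month_names):
    """Compile a pattern for dates such as '12. Februar 1920'."""
    # Longest names first, so that a longer name is preferred over a
    # shorter one sharing the same prefix.
    ordered = sorted(month_names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    return re.compile(rf"\b(\d{{1,2}})\.\s?({alternation})\s(\d{{4}})\b")


def find_dates(pattern, text):
    """Return all matched date expressions in order of appearance."""
    return [match.group(0) for match in pattern.finditer(text)]


if __name__ == "__main__":
    sentence = ("Die Sitzung vom 3. Jänner 1920 wurde auf den "
                "12. Februar 1920 vertagt.")
    base = build_date_pattern(BASE_MONTHS)
    extended = build_date_pattern(BASE_MONTHS + EXTRA_MONTHS)
    print(find_dates(base, sentence))      # misses the "Jänner" date
    print(find_dates(extended, sentence))  # finds both dates
```

Each added variant widens coverage but can also introduce false positives, which mirrors the trade-off between coverage gain and potential overgeneralization mentioned above.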

Contact: Prof. Dr. Alexander Mehler

SemioGraph

TTLab-Embeddings

OpenData

On this page, the Text Technology Lab provides a list of ready-made embeddings created by the Lab. The list contains a variety of downloadable embeddings produced with different methods and parameters. Metadata is available for all files, with information about the corpus, the method, the tool hyperparameters, and much more. This allows detailed search, easy retrieval, and reuse of the embedding files. All data is also available through a RESTful API.

Contact: Prof. Dr. Alexander Mehler

OpenScience@IfI

ASDF-Dashboard

Automated subgroup

BigSense

A Word Sense Disambiguator for Big Data

CloudDBGuard

Implementation of cryptography-based methods for secure data management in cloud databases

FAC’14 Benchmark Suite

FAC'14 Benchmark Suite for Formal Verification of Analog Circuits

GerParCor

German Parliamentary Corpus

Gnucap-UF

Gnucap-UF Extension of the Analog Circuit Simulator Gnucap

Graph4Med

A web application and a graph database for visualizing and analyzing medical databases

HeidelTime_ext

Extension of HeidelTime

SemioGraph

SemioGraph

TTLab-Embeddings

TTLab-Embeddings

TextImager

TextImager

UIMADatabaseInterface

The UIMA Database Interface enables the generic use of UIMA documents for any database.

UIMATypeSystem

UIMATypeSystem

Utilities

TTLab Utilities

algo-learn

Learning and examination portal for introductory theoretical computer science courses

fastSense

An Efficient Word Sense Disambiguation Classifier
