Natural Language Processing

Our Natural Language Processing (NLP) research addresses basic research challenges and applied problems, partially in cooperation with industry partners.

Our main goal is to enable computers to understand and use natural language to support humans with numerous tasks, such as text classification, machine translation, question answering, sentiment analysis, text summarization, text generation, language modeling, dialog systems, and word sense disambiguation.

OriginStamp – Trusted Timestamping via Bitcoin

OriginStamp is a web-based, trusted timestamping service that uses the decentralized Bitcoin blockchain to store anonymous, tamper-proof timestamps for any digital content. OriginStamp allows users to hash files, emails, or plain text, store the resulting hashes in the Bitcoin blockchain, and later retrieve and verify timestamps that have been committed to the blockchain. OriginStamp is free of charge and easy to use, and thus allows anyone, e.g., students, researchers, authors, journalists, or artists, to prove that they were the originator of certain information at a given point in time.
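The hashing step described above can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function (the text does not name one), and the function names `content_fingerprint` and `matches_fingerprint` are hypothetical; submitting the hash to the blockchain is not shown.

```python
import hashlib


def content_fingerprint(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the given content.

    SHA-256 is an assumption made for this sketch.
    """
    return hashlib.sha256(data).hexdigest()


def matches_fingerprint(data: bytes, recorded_hash: str) -> bool:
    """Check whether content still matches a previously recorded hash."""
    return content_fingerprint(data) == recorded_hash.lower()


# Hash some content once, then verify it later.
original = b"Draft of my manuscript, version 1"
recorded = content_fingerprint(original)

print(matches_fingerprint(original, recorded))         # unchanged content
print(matches_fingerprint(original + b"!", recorded))  # modified content
```

Because only the hash is stored, the content itself stays private, and any later change to the content yields a different hash that no longer matches the timestamped one.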



HyPlag: Hybrid Plagiarism Detection

HyPlag is a system that implements hybrid plagiarism detection (hybridPD), a novel approach capable of detecting even heavily disguised plagiarism in academic texts. The hybridPD approach combines the analysis of non-textual content in academic documents, such as citations, images, and mathematical expressions, with traditional text similarity analysis. Existing plagiarism detection software examines only text similarity and thus typically fails to detect disguised forms of plagiarism, including paraphrases, translations, and idea plagiarism. hybridPD addresses this shortcoming by additionally analyzing non-textual content to form a language-independent semantic “fingerprint” of document similarity.
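To illustrate why combining textual and non-textual signals helps, here is a toy sketch. It is not HyPlag's actual algorithm: the Jaccard measure, the equal weighting, and the names `jaccard` and `hybrid_similarity` are all assumptions chosen for brevity.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; 0.0 if both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def hybrid_similarity(text_a: str, text_b: str,
                      cites_a: set, cites_b: set,
                      text_weight: float = 0.5) -> float:
    """Weighted mix of word overlap and shared-citation overlap.

    The weighting scheme is hypothetical and only for illustration.
    """
    text_sim = jaccard(set(text_a.lower().split()),
                       set(text_b.lower().split()))
    cite_sim = jaccard(cites_a, cites_b)
    return text_weight * text_sim + (1 - text_weight) * cite_sim


# A paraphrase shares no words with its source but cites the same works.
source = "the method improves retrieval quality"
paraphrase = "our approach enhances search results"
refs = {"ref1", "ref2", "ref3"}

print(hybrid_similarity(source, paraphrase, refs, refs))
```

In this example the text-only similarity is zero, so a purely text-based check would miss the overlap, while the citation signal still flags the pair.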

The hybridPD approach implemented in HyPlag integrates and continues several of our previous research projects, particularly on Citation-based Plagiarism Detection (CbPD) and Mathematics-based Plagiarism Detection (MathPD).

Media Bias Analysis: Slanted News Coverage Identification

This group of projects seeks to (semi-)automatically identify slanted news coverage, i.e., media bias, in news articles. Current projects include news-please (an integrated web crawler and information extractor for news articles), NewsBird (a news aggregator that reveals different perspectives in international news topics), and Giveme5W1H (a system that extracts phrases answering the journalistic 5W1H questions).

MathIR: Mathematical Information Retrieval

As part of the DFG-funded research project GI 1259/1-1: Methods and Tools to Advance the Retrieval of Mathematical Knowledge from Digital Libraries for Search-, Recommendation- and Assistance-Systems, we investigate fundamental methods and tools for making mathematical knowledge accessible to information retrieval tools.

Docear: Academic Literature Management via Mind Maps

Docear combines a mind-mapping tool with a recommender system for academic literature and a reference manager. The mind maps allow users to organize their ideas and to import the annotations they made while reading PDFs, e.g., comments, highlights, or bookmarks. The software works with standard PDF annotations and can thus be used with different PDF viewers.

Mr. DLib: Machine-readable Digital Library

Mr. DLib's "Recommendations as a Service" allows operators of academic products to easily integrate a scientific recommender system into their products. The basic idea of Mr. DLib's scientific recommender system is to calculate recommendations for research articles, calls for papers, grants, etc. on Mr. DLib's server. Operators of academic products may then request recommendations from Mr. DLib and display the recommendations to their users.

Co-Citation Proximity Analysis: Recommendation and Clustering Algorithms for Academic Literature

Co-Citation Proximity Analysis (CPA) is a method to compute both local and global instances of semantic similarity in academic documents by examining citation proximity in the full texts of documents. CPA was developed with two applications in mind: recommender systems and clustering. Regarding the first application, an improved measure of document semantic similarity, which computes similarity at a more fine-grained resolution, has the potential to significantly improve the relevance of academic literature recommendations. 
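The idea of weighting co-citations by their proximity in the full text can be sketched as follows. The weights in `WEIGHTS`, the position format `(section, paragraph, sentence)`, and the rule of keeping the closest occurrence per pair are assumptions made for this illustration, not the published definition of CPA.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights: the closer two citations appear, the higher the score.
WEIGHTS = {"sentence": 1.0, "paragraph": 0.5, "section": 0.25, "document": 0.125}


def closest_level(pos_a: tuple, pos_b: tuple) -> str:
    """Return the smallest text unit that contains both citation positions.

    Positions are (section, paragraph, sentence) index triples, where each
    index is relative to its enclosing unit.
    """
    if pos_a[0] != pos_b[0]:
        return "document"
    if pos_a[1] != pos_b[1]:
        return "section"
    if pos_a[2] != pos_b[2]:
        return "paragraph"
    return "sentence"


def proximity_scores(citations: list) -> dict:
    """Score each pair of cited works in one citing document.

    `citations` is a list of (cited_id, position) tuples. If a pair is
    co-cited several times, the closest occurrence wins (an assumption).
    """
    best = defaultdict(float)
    for (doc_a, pos_a), (doc_b, pos_b) in combinations(citations, 2):
        if doc_a == doc_b:
            continue
        pair = tuple(sorted((doc_a, doc_b)))
        best[pair] = max(best[pair], WEIGHTS[closest_level(pos_a, pos_b)])
    return dict(best)


citations = [
    ("A", (0, 0, 0)),  # A and B are cited in the same sentence
    ("B", (0, 0, 0)),
    ("C", (0, 1, 2)),  # C is in the same section, different paragraph
    ("D", (3, 0, 0)),  # D is cited in a different section
]
scores = proximity_scores(citations)
for pair, score in sorted(scores.items()):
    print(pair, score)
```

Summing such scores over many citing documents would give a pairwise similarity that could feed either a recommender or a clustering step.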

CITREC: Open Evaluation Framework for Citation-based Similarity Measures

CITREC is an open evaluation framework for citation-based and text-based similarity measures. CITREC prepares the data of two formerly separate collections for a citation-based analysis and provides the tools necessary for performing evaluations of similarity measures.

Last edited: 14.01.2022