Sensitive documents make collecting real training data hard, so we generate realistic synthetic data instead: template-based documents populated with content from Faker and LLMs. A BERT classifier trained purely on this synthetic data reaches 88% page-wise precision on a real-world test set it never saw during training.
To automate the labor-intensive expansion of nutritional databases, we built a two-stage information retrieval system that matches food items by text and nutrient similarity. Adding SVM-based food category prediction (99% accuracy) boosted retrieval precision to 80%, and we outline an LLM-plus-optimization pipeline for simulating unmatched recipes.
Welcome to the official blog of the Applied Machine Learning Lab at the University of Bonn: our space for research highlights, course announcements, lab news, and tutorials.