PhD Student, Tabular Data
Lennart Purucker
Postal address
Institut für Informatik
Albert-Ludwigs-Universität Freiburg
Sekretariat Hutter/Maschinelles Lernen
Georges-Köhler-Allee 074
79110 Freiburg, Germany
Office
Building 074, Room 00-012
About
I am a Ph.D. student at the University of Freiburg, Germany, supervised by Frank Hutter. My Ph.D. position is part of the Small Data Initiative (CRC 1597, Project C05). My research interest is in the field of artificial intelligence, with a focus on automated machine learning, ensemble learning, deep learning, and meta-learning (for small data). My primary focus is on tabular data (e.g., Excel sheets), but I also work on vision, text, and time series data.
I completed my bachelor’s degree in applied computer science at DHBW Stuttgart in 2019 and my master’s degree in computer science at RWTH Aachen in 2021. From November 2021 to August 2023, I worked as a research assistant at the University of Siegen on ensemble learning for automated machine learning (AutoML) and recommender systems. From August 2023 to November 2023, I was an applied scientist intern at AWS as part of the AutoGluon team.
In 2024, I mainly worked on TabPFN, a foundation model for (small) tabular data. I also led the “AutoML Grandmasters” team in Kaggle's AutoML Grand Prix, where we took a very close second place (a 1-point margin) with AutoGluon and TabPFN and won $20,000.
Other
Community Involvement:
- Reproducibility Chair at the AutoML Conference 2023, 2024, and 2025.
- Co-organizer of the AutoML Seminar
- Developer of AutoGluon (Tabular)
- Member of the OpenML Team (Python API)
Reviewing:
- 2024: AutoML, NeurIPS DBT, CVPR Workshop on Foundation Models for Medical Vision, ICML ICL Workshop
- 2023: Reproducibility Reviewer at the AutoML Conference
Teaching:
Publications
2024
Ensembling Finetuned Language Models for Text Classification. In: NeurIPS 2024 Workshop on Fine-Tuning in Modern Machine Learning: Principles and Scalability, 2024.
Large Language Models Engineer Too Many Simple Features for Tabular Data. In: NeurIPS 2024 Third Table Representation Learning Workshop, 2024 (Oral Presentation).
HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models. In: 38th Conference on Neural Information Processing Systems (NeurIPS), DBT Track, 2024.
Transfer Learning for Finetuning Large Language Models. In: NeurIPS 2024 Workshop on Adaptive Foundation Models, 2024.
Dynamic Post-Hoc Neural Ensemblers. In: Preprint, 2024.
Quick-Tune-Tool: A Practical Tool and its User Guide for Automatically Finetuning Pretrained Models. In: Proceedings of the Third International Conference on Automated Machine Learning (AutoML 2024), Workshop Track, 2024.
Hardware Aware Ensemble Selection for Balancing Predictive Accuracy and Cost. In: Proceedings of the Third International Conference on Automated Machine Learning (AutoML 2024), Workshop Track, 2024.
Don’t Waste Your Time: Early Stopping Cross-Validation. In: Proceedings of the Third International Conference on Automated Machine Learning (AutoML 2024), Methods Track, 2024.
Revealing the Hidden Impact of Top-N Metrics on Optimization in Recommender Systems. In: European Conference on Information Retrieval, pp. 140–156, Springer, 2024.
AMLTK: A Modular AutoML Toolkit in Python. In: Journal of Open Source Software, vol. 9, no. 100, pp. 6367, 2024.
2023
The Effect of Random Seeds for Data Splitting on Recommendation Accuracy. Perspectives on the Evaluation of Recommender Systems Workshop (PERSPECTIVES 2023), co-located with the 17th ACM Conference on Recommender Systems, 2023.
CMA-ES for Post Hoc Ensembling in AutoML: A Great Success and Salvageable Failure. AutoML Conference 2023, 2023.
AutoML Conference 2023, 2023.
2022
Estimating the Pruned Search Space Size of Subgroup Discovery. In: 2022 IEEE International Conference on Data Mining (ICDM), 2022.
Assembled-OpenML: Creating Efficient Benchmarks for Ensembles in AutoML with OpenML. First Conference on Automated Machine Learning (Late-Breaking Workshop), 2022.