
Tabular Data

Research Topics & Interests

Foundation Models

We develop foundation models tailored to tabular data, building on our work introducing the original TabPFN family of models. Our research aims to create systems that learn strong inductive biases for structured data, enabling accurate predictions across diverse tasks with minimal task-specific tuning. While tabular datasets are often treated as purely static tables, real-world problems frequently involve richer context such as temporal structure or additional modalities. We therefore investigate how tabular foundation models can flexibly incorporate such information when present, while remaining efficient and reliable on standard tabular tasks.

Causal Understanding

We investigate how learning systems can account for the data-generating processes underlying tabular datasets, with the goal of improving reliability when conditions change. In many applications, models are deployed in environments that differ from the data on which they were trained, making purely predictive approaches brittle. Our work is motivated by the need for methods that support robust decision-making, meaningful interpretation, and responsible use in settings where interventions, feedback loops, or distribution shifts are expected.

Agentic Data Science

We explore autonomous systems that can perform end-to-end data science on tabular problems. This includes agents that iteratively analyze datasets, formulate hypotheses, select models, engineer features, run experiments, and refine solutions with minimal human intervention. Our goal is to build trustworthy agentic workflows that accelerate scientific discovery and practical data analysis while maintaining transparency and reproducibility.

Ongoing Projects

Large Language Models Engineer Too Many Simple Features for Tabular Data

Tabular machine learning problems often require time-consuming and labor-intensive feature engineering. Recent efforts have focused on using large language models (LLMs) to capitalize on their potential domain knowledge. At the same time, researchers have observed ethically concerning negative biases in other LLM-related use cases, such as text generation. These developments motivated us to investigate whether LLMs exhibit a bias that negatively impacts the performance of feature engineering. While not ethically concerning, such a bias could hinder practitioners from fully utilizing LLMs for automated data science. Therefore, we propose a method to detect potential biases by identifying anomalies in the frequency of operators (e.g., adding two features) suggested by LLMs when engineering new features. Our experiments evaluate the bias of four LLMs, two large frontier models and two small open-source models, across 27 tabular datasets. Our results indicate that LLMs are biased toward simple operators, such as addition, and can fail to utilize more complex operators, such as grouping followed by aggregation. Furthermore, the bias can reduce predictive performance when using LLM-generated features. Our results call for mitigating bias when using LLMs for feature engineering.

Student: Jaris Küken
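To make the detection idea concrete, here is a minimal Python sketch, not the method from the paper: it counts how often each operator appears among LLM-suggested features and flags operators whose relative frequency deviates strongly from a reference distribution. The operator names, the uniform reference distribution, the factor-of-two threshold, and the helper functions `operator_frequencies` and `flag_anomalies` are all hypothetical choices for illustration.

```python
from collections import Counter

# Hypothetical operator vocabulary for illustration; not the set used in the paper.
OPERATORS = ["add", "subtract", "multiply", "divide", "groupby_mean", "groupby_count"]


def operator_frequencies(suggested_ops):
    """Return the relative frequency of each known operator among the suggestions."""
    counts = Counter(op for op in suggested_ops if op in OPERATORS)
    total = sum(counts.values())
    return {op: (counts[op] / total if total else 0.0) for op in OPERATORS}


def flag_anomalies(observed, expected, ratio=2.0):
    """Split operators into over- and under-used ones relative to `expected`.

    An operator is over-used if its observed frequency exceeds `ratio` times
    the expected frequency, and under-used if it falls below the expected
    frequency divided by `ratio` (a hypothetical threshold rule).
    """
    over = [op for op in OPERATORS if observed[op] > ratio * expected[op]]
    under = [op for op in OPERATORS if observed[op] < expected[op] / ratio]
    return over, under


# Toy example: operators extracted from ten hypothetical LLM-suggested features.
suggestions = ["add"] * 6 + ["multiply"] * 2 + ["divide", "groupby_mean"]
uniform = {op: 1 / len(OPERATORS) for op in OPERATORS}

over, under = flag_anomalies(operator_frequencies(suggestions), uniform)
print("over-used:", over)    # over-used: ['add']
print("under-used:", under)  # under-used: ['subtract', 'groupby_count']
```

The choice of reference distribution matters more than the threshold here: a uniform reference is only a placeholder, and a more informative baseline could be, for example, the operator frequencies produced by a non-LLM feature-engineering method on the same datasets.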

Open Projects & Thesis Topics

Please refer to https://ml.informatik.uni-freiburg.de/student/ for open topics.

Members

PhD Students

Students

Breenda Das

Jaris Küken

Charlotte Lange

Mustafa Tajjar

Salih Bora Öztürk

Alumni

Lyubomir Ivanov

Martin Mráz

Publications

2025

Pfefferle, Alexander; Hog, Johannes; Purucker, Lennart; Hutter, Frank

nanoTabPFN: A Lightweight and Educational Reimplementation of TabPFN

Workshop paper, 2025.

Grinsztajn, Leo; Flöge, Klemens; Key, Oscar; Hayler, Adrian; Manium, Mihir; Garg, Anurag; Robertson, Jake; Hoo, Shi Bin; Birkel, Felix; Jund, Philipp; Jäger, Benjamin; Yu, Rosen Ting-Ying; Schölkopf, Bernhard; Hollmann, Noah; Hutter, Frank

TabPFN-2.5: a Preview

In: EurIPS 2025 Workshop: AI for Tabular Data, Forthcoming.

Bühler, Magnus; Purucker, Lennart; Hutter, Frank

Causal Data Augmentation for Robust Fine-Tuning of Tabular Foundation Models

In: EurIPS 2025 Workshop: AI for Tabular Data, 2025.

Bühler, Magnus; Purucker, Lennart; Hutter, Frank

Towards Synthetic Data for Fine-tuning Tabular Foundation Models

In: Foundation Models for Structured Data workshop at ICML, 2025.

Arango, Sebastian Pineda; Janowski, Maciej; Purucker, Lennart; Zela, Arber; Hutter, Frank; Grabocka, Josif

Regularized Neural Ensemblers

In: AutoML Conference 2025, 2025.

2024

Küken, Jaris; Purucker, Lennart; Hutter, Frank

Large Language Models Engineer Too Many Simple Features for Tabular Data

In: NeurIPS 2024 Third Table Representation Learning Workshop, 2024, (Workshop Oral).

Hoo, Shi Bin; Müller, Samuel; Salinas, David; Hutter, Frank

The Tabular Foundation Model TabPFN Outperforms Specialized Time Series Forecasting Models Based on Simple Features

In: NeurIPS 2024 TRL Workshop, 2024.

Helli, Kai; Schnurr, David; Hollmann, Noah; Müller, Samuel; Hutter, Frank

Drift-Resilient TabPFN: In-Context Learning Distribution Shifts on Tabular Data

In: Proceedings of the Third International Conference on Automated Machine Learning (AutoML 2024), Workshop Track, 2024.

Robertson, Jake; Hollmann, Noah; Awad, Noor; Hutter, Frank

FairPFN: Transformers Can do Counterfactual Fairness

In: Proceedings of the Third International Conference on Automated Machine Learning (AutoML 2024), Workshop Track, 2024.

Maier, Jannis; Möller, Felix; Purucker, Lennart

Hardware Aware Ensemble Selection for Balancing Predictive Accuracy and Cost

In: Proceedings of the Third International Conference on Automated Machine Learning (AutoML 2024), Workshop Track, 2024.

Salinas, David; Erickson, Nick

TabRepo: A Large Scale Repository of Tabular Model Evaluations and its AutoML Applications

In: Proceedings of the Third International Conference on Automated Machine Learning (AutoML 2024), ABCD Track, 2024.

Bergman, Eddie; Purucker, Lennart; Hutter, Frank

Don’t Waste Your Time: Early Stopping Cross-Validation

In: Proceedings of the Third International Conference on Automated Machine Learning (AutoML 2024), Methods Track, 2024.

Bergman, Edward; Feurer, Matthias; Bahram, Aron; Balef, Amir Rezaei; Purucker, Lennart; Segel, Sarah; Lindauer, Marius; Hutter, Frank; Eggensperger, Katharina

AMLTK: A Modular AutoML Toolkit in Python

In: Journal of Open Source Software, vol. 9, no. 100, pp. 6367, 2024.

2023

Hollmann, Noah; Müller, Samuel; Hutter, Frank

Large Language Models for Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering

In: Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023.

Purucker, Lennart; Beel, Joeran

CMA-ES for Post Hoc Ensembling in AutoML: A Great Success and Salvageable Failure

In: AutoML Conference 2023, 2023.

Purucker, Lennart; Schneider, Lennart; Anastacio, Marie; Beel, Joeran; Bischl, Bernd; Hoos, Holger

Q(D)O-ES: Population-based Quality (Diversity) Optimisation for Post Hoc Ensemble Selection in AutoML

In: AutoML Conference 2023, 2023.

Hollmann, Noah; Müller, Samuel; Eggensperger, Katharina; Hutter, Frank

TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second

In: The Eleventh International Conference on Learning Representations (ICLR), 2023, (top-25% of accepted papers).

2022

Feurer, Matthias; Eggensperger, Katharina; Falkner, Stefan; Lindauer, Marius; Hutter, Frank

Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning

In: Journal of Machine Learning Research, vol. 23, no. 261, pp. 1-61, 2022.

Purucker, Lennart; Beel, Joeran

Assembled-OpenML: Creating Efficient Benchmarks for Ensembles in AutoML with OpenML

In: First Conference on Automated Machine Learning (Late-Breaking Workshop), 2022.

2021

Bischl, Bernd; Casalicchio, Giuseppe; Feurer, Matthias; Gijsbers, Pieter; Hutter, Frank; Lang, Michel; Mantovani, Rafael G; van Rijn, Jan N; Vanschoren, Joaquin

OpenML Benchmarking Suites

In: Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, 2021.

Kadra, Arlind; Lindauer, Marius; Hutter, Frank; Grabocka, Josif

Well-tuned Simple Nets Excel on Tabular Datasets

In: Thirty-Fifth Conference on Neural Information Processing Systems, 2021.

Zimmer, Lucas; Lindauer, Marius; Hutter, Frank

Auto-PyTorch: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL

In: IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1-1, 2021.

Feurer, Matthias; van Rijn, Jan N; Kadra, Arlind; Gijsbers, Pieter; Mallik, Neeratyoy; Ravi, Sahithya; Müller, Andreas; Vanschoren, Joaquin; Hutter, Frank

OpenML-Python: an extensible Python API for OpenML

In: Journal of Machine Learning Research, vol. 22, no. 100, pp. 1-5, 2021.