
Neural Architecture Search

Research Topics & Interests

Ongoing Projects

Analyzing the acceleration step of GDAS with Gumbel-Softmax and ReinMax

Summary: Modern neural architecture search methods combine weight sharing with a continuous relaxation of the search space to reduce the high computational demands of black-box methods that train each architecture from scratch. However, the memory required to train a supernet with shared weights grows linearly with the number of candidate operations. GDAS addresses this with an acceleration trick that searches directly in the discrete architecture space, sampling a single architecture per step so that only the sampled operations need to be evaluated. To preserve differentiability, GDAS applies the Straight-Through Estimator with Gumbel-Softmax sampling. ReinMax extends this estimator to second-order accuracy, yielding improved performance on a number of tasks.

This thesis investigates the impact of GDAS’s acceleration trick on hypergradients and model performance, and further explores ReinMax as an alternative to the Straight-Through Gumbel-Softmax method. Additionally, it examines how the acceleration technique influences ReinMax's behavior and efficacy.
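
For context, the sketch below shows Straight-Through Gumbel-Softmax sampling over a set of candidate operations, the mechanism GDAS builds on; the temperature and the five-operation example are illustrative assumptions rather than the thesis setup.

```python
# Minimal sketch of Straight-Through Gumbel-Softmax sampling over candidate
# operations, as used in GDAS-style one-shot NAS (illustrative, not the exact
# thesis code). Only the sampled operation would be executed in the forward
# pass, which is the source of the savings discussed above.
import torch
import torch.nn.functional as F

def st_gumbel_softmax(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Return a one-hot sample; gradients flow through the soft probabilities."""
    gumbels = -torch.empty_like(logits).exponential_().log()  # Gumbel(0, 1) noise
    soft = F.softmax((logits + gumbels) / tau, dim=-1)        # relaxed sample
    index = soft.argmax(dim=-1, keepdim=True)
    hard = torch.zeros_like(soft).scatter_(-1, index, 1.0)    # discrete one-hot
    # Straight-through: forward uses `hard`, backward uses d(soft)/d(logits).
    return hard + soft - soft.detach()

# Example: architecture parameters over 5 candidate operations on one edge.
alpha = torch.randn(5, requires_grad=True)
one_hot = st_gumbel_softmax(alpha, tau=0.5)
print(one_hot)  # e.g. tensor([0., 0., 1., 0., 0.], grad_fn=...)
```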

Student: Quoc Huy Dang

Gompertz Linear Units: Advancing Activation Function Performance in Deep Learning

Activation functions are fundamental elements of deep learning architectures, as they significantly influence training dynamics. ReLU, the most widely used activation function, suffers from the dying ReLU problem. Variants such as LeakyReLU, PReLU, and ELU address this issue by keeping negative inputs active rather than zeroing them out. Furthermore, advanced gated activations such as GELU and Swish integrate effectively with normalization methods, and their smoothness ensures a more stable gradient flow, preventing neuron inactivity. However, these functions fail to reduce latent space variance effectively. Based on insights from automated activation search, this thesis proposes the Gompertz Linear Unit (GoLU), a novel self-gated activation function that leverages the asymmetry of the Gompertz function to effectively reduce variance in the latent space while enhancing gradient flow. It further compares GoLU with modern state-of-the-art activation functions through extensive experiments across a wide variety of computer vision and language modelling tasks.
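
As a rough illustration of the idea, the sketch below implements a self-gated activation whose gate is the Gompertz function, assuming the standard form exp(-exp(-x)) with unit parameters; the exact GoLU parameterization studied in the thesis may differ.

```python
# Illustrative sketch of a self-gated activation with a Gompertz-shaped gate,
# assuming the standard Gompertz function g(x) = exp(-exp(-x)) with unit
# parameters. The exact GoLU definition used in the thesis may differ.
import torch
import torch.nn as nn

class GompertzGatedUnit(nn.Module):
    """x * g(x), where g is the (asymmetric) Gompertz sigmoid."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.exp(-torch.exp(-x))  # gate in (0, 1), asymmetric around 0
        return x * gate

# Quick comparison with another self-gated activation on the same inputs.
x = torch.linspace(-4.0, 4.0, steps=9)
print(GompertzGatedUnit()(x))
print(torch.nn.functional.silu(x))  # Swish / SiLU for reference
```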

Student: Indrashis Das

Tuning LoRA: The Impact of Architectural Decisions

Large language models (LLMs) often require task-specific finetuning to achieve optimal performance, yet full finetuning (FFT) is computationally and memory intensive. Parameter-efficient finetuning (PEFT) methods, such as LoRA, reduce these costs by introducing low-rank updates to selected parameters. However, LoRA introduces new hyperparameters, such as rank and layer placement, that are typically handcrafted and applied uniformly across all transformer layers, potentially limiting model adaptability. To address this, we propose an efficient framework to systematically explore and evaluate diverse rank configurations, aiming to discover optimal, task-specific ranks and placements for LoRA layers. We design an elastic low-rank supernetwork within the language model and employ a two-stage neural architecture search (NAS): first, training the supernetwork with the pre-trained weights frozen; then, performing a black-box search by sampling architectures to assess task-specific configurations. Our approach uncovers configurations that outperform traditional handcrafted settings, highlighting the benefits of adaptive, task-aware LoRA finetuning for LLMs.
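
A minimal sketch of the kind of per-layer LoRA adapter whose rank becomes a searchable choice is shown below; the class name, rank candidates, and the random sampling loop are illustrative assumptions, not the project's actual framework.

```python
# Minimal sketch of a LoRA-style adapter whose rank is a searchable choice,
# illustrating the kind of per-layer configuration the two-stage NAS explores.
# Class and variable names are illustrative, not the project's actual code.
import random
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # frozen pre-trained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

# Sampling different rank configurations per layer, as a black-box search might.
candidate_ranks = [0, 4, 8, 16]               # 0 could mean "no adapter on this layer"
config = [random.choice(candidate_ranks) for _ in range(12)]  # one choice per layer
print(config)
```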

Student: Abhash Kumar Jha

Knowledge Distillation in Pretrained Supernetworks

This project focuses on extracting efficient subnetworks (student networks) from a pretrained supernetwork (whittle network) to reduce model size and to initialize a student from a pretrained teacher model. By applying importance sorting (following Minitron) to filter and select performant students, we aim to improve the students through different knowledge distillation schemes. We will evaluate various knowledge distillation techniques, including in-place and layer-wise KD losses, across different scales of models such as Pythia and LLaMA-3.1, integrating LoRA into the larger models for improved parameter efficiency. This approach will be benchmarked against randomly initialized student networks to validate its effectiveness.
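
For reference, a minimal sketch of a logit-level knowledge-distillation loss, the KL divergence between temperature-softened teacher and student distributions combined with the usual label loss, is given below; the temperature and weighting are illustrative defaults rather than the project's settings.

```python
# Minimal sketch of a logit-level knowledge-distillation loss: KL divergence
# between temperature-softened teacher and student distributions, combined
# with cross-entropy on the labels. Temperature and weighting are illustrative.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T: float = 2.0, alpha: float = 0.5):
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    distill = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * hard

# Example with random logits for a batch of 4 and a vocabulary of 10.
s = torch.randn(4, 10, requires_grad=True)
t = torch.randn(4, 10)
y = torch.randint(0, 10, (4,))
print(kd_loss(s, t, y))
```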

Student: Zainab Sultan

Whittle

Whittle is a library for Neural Architecture Search (NAS) aimed at compressing pretrained large language models (LLMs). The core idea is to treat the network as a super-network and select sub-networks that optimally balance performance and efficiency. Whittle provides the following functionalities:

  1. Support for a variety of LLMs to define super-networks
  2. Checkpoint generation for sub-networks, enabling deployment
  3. Downstream evaluation of sub-networks using LM-Eval-Harness
  4. A simple interface for various super-network training strategies and multi-objective search methods
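
The sketch below is not the Whittle API; it only illustrates the underlying super-network idea under simplified assumptions: a sub-network reuses slices of the super-network's weights, so a standalone checkpoint can be produced by copying the selected slices.

```python
# Generic illustration of the super-network idea (this is NOT the Whittle API):
# a sub-network reuses slices of the super-network's weights, so a standalone
# sub-network checkpoint can be produced by copying the selected slices.
import torch
import torch.nn as nn

super_hidden = 1024
super_layer = nn.Linear(super_hidden, super_hidden)

def extract_subnetwork(layer: nn.Linear, sub_dim: int) -> nn.Linear:
    """Build a smaller Linear layer from the first `sub_dim` units of `layer`."""
    sub = nn.Linear(sub_dim, sub_dim)
    with torch.no_grad():
        sub.weight.copy_(layer.weight[:sub_dim, :sub_dim])
        sub.bias.copy_(layer.bias[:sub_dim])
    return sub

sub_layer = extract_subnetwork(super_layer, sub_dim=512)
torch.save(sub_layer.state_dict(), "sub_layer.pt")  # deployable checkpoint for the slice
```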

Student: Mohsen Mohammed Al-Queri

Multi-objective BO with LLMs

This project focuses on using large language models (LLMs) to guide multi-objective Bayesian Optimization for structural pruning of LLMs. Structural pruning reduces the size and complexity of LLMs by removing less important components, such as layers or neurons, while maintaining performance. Multi-objective Bayesian Optimization helps optimize multiple conflicting objectives, like minimizing model size and computational cost while preserving accuracy. LLMs guide the process by providing insights into which parts of the model are most important, making the optimization more efficient and effective. The goal is to create smaller, faster, and resource-efficient LLMs without significant performance loss.
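
A minimal sketch of multi-objective Bayesian optimization via random scalarization is shown below, assuming scikit-learn's GaussianProcessRegressor as the surrogate and synthetic stand-in objectives; the LLM-guidance component described above is not modeled.

```python
# Minimal sketch of multi-objective Bayesian optimization via random
# scalarization over a 1-D "pruning ratio" search space. The two objective
# functions are synthetic stand-ins for accuracy degradation and model size;
# the LLM-guided prior described above is not modeled here.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def objectives(ratio: float) -> np.ndarray:
    """Return (accuracy degradation, relative model size); both to be minimized."""
    return np.array([ratio**2 + 0.05 * rng.normal(), 1.0 - ratio])

X = rng.uniform(0.0, 1.0, size=(5, 1))             # initial pruning ratios
Y = np.array([objectives(x[0]) for x in X])        # observed objective pairs

for _ in range(20):
    w = rng.dirichlet([1.0, 1.0])                  # random scalarization weights
    gp = GaussianProcessRegressor().fit(X, Y @ w)  # surrogate of scalarized objective
    cand = rng.uniform(0.0, 1.0, size=(256, 1))
    mu, sigma = gp.predict(cand, return_std=True)
    x_next = cand[np.argmin(mu - sigma)]           # optimistic (LCB-style) pick
    X = np.vstack([X, x_next])
    Y = np.vstack([Y, objectives(x_next[0])])

# Report the non-dominated (Pareto-optimal) observations.
pareto = [i for i, y in enumerate(Y)
          if not any((Y[j] <= y).all() and (Y[j] < y).any() for j in range(len(Y)))]
print(X[pareto].ravel(), Y[pareto], sep="\n")
```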

Students: Andrej Schwanke and Lyubomir Ivanov

Searching the Space of One-Shot Optimizers

This project aims to modularize the orthogonal components of various gradient-based one-shot neural architecture search (NAS) methods. By doing so, the optimizer can be represented as a search space comprising several interchangeable components. This enables the use of black-box optimization to identify the best optimizer configuration for different scenarios, such as limited computational resources or specific task requirements.
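
As an illustration, the sketch below represents such an optimizer as a dictionary of interchangeable components and queries it with random search; the component names and the scoring function are hypothetical placeholders.

```python
# Illustration of representing a one-shot NAS optimizer as interchangeable
# components and searching over them with a black-box method (random search
# here). Component names and the scoring function are hypothetical placeholders.
import random

OPTIMIZER_SPACE = {
    "sampler": ["softmax_relaxation", "gumbel_softmax", "discrete_sampling"],
    "weight_update": ["first_order", "second_order_approx"],
    "arch_lr": [1e-4, 3e-4, 1e-3],
    "entropy_regularization": [0.0, 1e-3, 1e-2],
}

def sample_optimizer_config(space: dict) -> dict:
    return {name: random.choice(choices) for name, choices in space.items()}

def evaluate(config: dict) -> float:
    """Placeholder: in practice this would run a (cheap) one-shot NAS trial."""
    return random.random()

best = max((sample_optimizer_config(OPTIMIZER_SPACE) for _ in range(32)), key=evaluate)
print(best)
```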

Student: Abhash Kumar Jha

Open Projects & Thesis Topics

Unfortunately, at the moment we cannot offer any RL-specific projects or theses. Please refer to https://ml.informatik.uni-freiburg.de/student/ for other open topics.

Members

Postdoctoral Research Fellows

PhD Students

Research Engineers

Students

Mohsen Mohammed Al-Queri

Quoc Huy Dang

Indrashis Das

Abhash Kumar Jha

Lyubomir Ivanov

Shakiba Moradian

Andrej Schwanke

Zainab Sultan

Alumni

Publications

2024

Sukthanker, Rhea Sanjay; Staffler, Benedikt; Hutter, Frank; Klein, Aaron

Large Language Model Compression with Neural Architecture Search (Inproceedings)

In: NeurIPS 2024 Workshop on Machine Learning and Compression, 2024.