
Neural Architecture Search

Research Topics & Interests

Ongoing Projects

Analyzing the acceleration step of GDAS with Gumbel-Softmax and ReinMax

Summary: Modern neural architecture search methods combine weight sharing with a continuous relaxation of the search space to reduce the high computational demands of black-box methods that train each architecture from scratch. However, the memory required to train a supernet with shared weights grows linearly with the number of candidate operations. GDAS addresses this with an acceleration trick that searches directly in the discrete architecture space, sampling a single architecture per step so that only the sampled operations need to be evaluated. To preserve differentiability, GDAS applies the Straight-Through Estimator with Gumbel-Softmax sampling. ReinMax extends this estimator to second-order accuracy, yielding improved performance on a number of tasks.

This thesis investigates the impact of GDAS’s acceleration trick on hypergradients and model performance, and further explores ReinMax as an alternative to the Straight-Through Gumbel-Softmax method. Additionally, it examines how the acceleration technique influences ReinMax's behavior and efficacy.
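
For context, the sketch below shows Straight-Through Gumbel-Softmax sampling over a set of candidate operations, the mechanism GDAS builds on; the temperature and the five-operation example are illustrative assumptions rather than the thesis setup.

```python
# Minimal sketch of Straight-Through Gumbel-Softmax sampling over candidate
# operations, as used in GDAS-style one-shot NAS (illustrative, not the exact
# thesis code). Only the sampled operation would be executed in the forward
# pass, which is the source of the savings discussed above.
import torch
import torch.nn.functional as F

def st_gumbel_softmax(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Return a one-hot sample; gradients flow through the soft probabilities."""
    gumbels = -torch.empty_like(logits).exponential_().log()  # Gumbel(0, 1) noise
    soft = F.softmax((logits + gumbels) / tau, dim=-1)        # relaxed sample
    index = soft.argmax(dim=-1, keepdim=True)
    hard = torch.zeros_like(soft).scatter_(-1, index, 1.0)    # discrete one-hot
    # Straight-through: forward uses `hard`, backward uses d(soft)/d(logits).
    return hard + soft - soft.detach()

# Example: architecture parameters over 5 candidate operations on one edge.
alpha = torch.randn(5, requires_grad=True)
one_hot = st_gumbel_softmax(alpha, tau=0.5)
print(one_hot)  # e.g. tensor([0., 0., 1., 0., 0.], grad_fn=...)
```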

Student: Quoc Huy Dang

Gompertz Linear Units: Advancing Activation Function Performance in Deep Learning

Activation functions are fundamental elements of deep learning architectures, as they significantly influence training dynamics. ReLU, the most widely used activation function, suffers from the dying ReLU problem. Variants such as LeakyReLU, PReLU, and ELU address this issue by keeping negative inputs active rather than zeroing them out. Furthermore, advanced gated activations such as GELU and Swish integrate effectively with normalization methods, and their smoothness ensures a more stable gradient flow, preventing neuron inactivity. However, these functions fail to reduce latent space variance effectively. Based on insights from automated activation search, this thesis proposes the Gompertz Linear Unit (GoLU), a novel self-gated activation function that leverages the asymmetry of the Gompertz function to effectively reduce variance in the latent space while enhancing gradient flow. It further compares GoLU with modern state-of-the-art activation functions through extensive experiments across a wide variety of computer vision and language modelling tasks.
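
As a rough illustration of the idea, the sketch below implements a self-gated activation whose gate is the Gompertz function, assuming the standard form exp(-exp(-x)) with unit parameters; the exact GoLU parameterization studied in the thesis may differ.

```python
# Illustrative sketch of a self-gated activation with a Gompertz-shaped gate,
# assuming the standard Gompertz function g(x) = exp(-exp(-x)) with unit
# parameters. The exact GoLU definition used in the thesis may differ.
import torch
import torch.nn as nn

class GompertzGatedUnit(nn.Module):
    """x * g(x), where g is the (asymmetric) Gompertz sigmoid."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.exp(-torch.exp(-x))  # gate in (0, 1), asymmetric around 0
        return x * gate

# Quick comparison with another self-gated activation on the same inputs.
x = torch.linspace(-4.0, 4.0, steps=9)
print(GompertzGatedUnit()(x))
print(torch.nn.functional.silu(x))  # Swish / SiLU for reference
```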

Student: Indrashis Das

Tuning LoRA: The Impact of Architectural Decisions

Large language models (LLMs) often require task-specific finetuning to achieve optimal performance, yet full finetuning (FFT) is computationally and memory intensive. Parameter-efficient finetuning (PEFT) methods, such as LoRA, reduce these costs by introducing low-rank updates to selected parameters. However, LoRA introduces new hyperparameters, such as rank and layer placement, that are typically handcrafted and applied uniformly across all transformer layers, potentially limiting model adaptability. To address this, we propose an efficient framework to systematically explore and evaluate diverse rank configurations, aiming to discover optimal, task-specific ranks and placements for LoRA layers. We design an elastic low-rank supernetwork within the language model and employ a two-stage neural architecture search (NAS): first, training the supernetwork with the pre-trained weights frozen; then, performing a black-box search by sampling architectures to assess task-specific configurations. Our approach uncovers configurations that outperform traditional handcrafted settings, highlighting the benefits of adaptive, task-aware LoRA finetuning for LLMs.
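
A minimal sketch of the kind of per-layer LoRA adapter whose rank becomes a searchable choice is shown below; the class name, rank candidates, and the random sampling loop are illustrative assumptions, not the project's actual framework.

```python
# Minimal sketch of a LoRA-style adapter whose rank is a searchable choice,
# illustrating the kind of per-layer configuration the two-stage NAS explores.
# Class and variable names are illustrative, not the project's actual code.
import random
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # frozen pre-trained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

# Sampling different rank configurations per layer, as a black-box search might.
candidate_ranks = [0, 4, 8, 16]               # 0 could mean "no adapter on this layer"
config = [random.choice(candidate_ranks) for _ in range(12)]  # one choice per layer
print(config)
```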

Student: Abhash Kumar Jha

Knowledge Distillation in Pretrained Supernetworks

This project focuses on extracting efficient subnetworks (student networks) from a pretrained supernetwork (whittle network) to reduce model size and to initialize a student from a pretrained teacher model. By applying importance sorting (following Minitron) to filter and select performant students, we aim to improve the students through different knowledge distillation schemes. We will evaluate various knowledge distillation techniques, including in-place and layer-wise KD losses, across different scales of models such as Pythia and LLaMA-3.1, integrating LoRA into the larger models for improved parameter efficiency. This approach will be benchmarked against randomly initialized student networks to validate its effectiveness.
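
For reference, a minimal sketch of a logit-level knowledge-distillation loss, the KL divergence between temperature-softened teacher and student distributions combined with the usual label loss, is given below; the temperature and weighting are illustrative defaults rather than the project's settings.

```python
# Minimal sketch of a logit-level knowledge-distillation loss: KL divergence
# between temperature-softened teacher and student distributions, combined
# with cross-entropy on the labels. Temperature and weighting are illustrative.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T: float = 2.0, alpha: float = 0.5):
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    distill = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * hard

# Example with random logits for a batch of 4 and a vocabulary of 10.
s = torch.randn(4, 10, requires_grad=True)
t = torch.randn(4, 10)
y = torch.randint(0, 10, (4,))
print(kd_loss(s, t, y))
```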

Student: Zainab Sultan

Whittle

Whittle is a library for Neural Architecture Search (NAS) aimed at compressing pretrained large language models (LLMs). The core idea is to treat the network as a super-network and select sub-networks that optimally balance performance and efficiency. Whittle provides the following functionalities:

  1. Support for a variety of LLMs to define super-networks
  2. Checkpoint generation for sub-networks, enabling deployment
  3. Downstream evaluation of sub-networks using LM-Eval-Harness
  4. A simple interface for various super-network training strategies and multi-objective search methods
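
The sketch below is not the Whittle API; it only illustrates the underlying super-network idea under simplified assumptions: a sub-network reuses slices of the super-network's weights, so a standalone checkpoint can be produced by copying the selected slices.

```python
# Generic illustration of the super-network idea (this is NOT the Whittle API):
# a sub-network reuses slices of the super-network's weights, so a standalone
# sub-network checkpoint can be produced by copying the selected slices.
import torch
import torch.nn as nn

super_hidden = 1024
super_layer = nn.Linear(super_hidden, super_hidden)

def extract_subnetwork(layer: nn.Linear, sub_dim: int) -> nn.Linear:
    """Build a smaller Linear layer from the first `sub_dim` units of `layer`."""
    sub = nn.Linear(sub_dim, sub_dim)
    with torch.no_grad():
        sub.weight.copy_(layer.weight[:sub_dim, :sub_dim])
        sub.bias.copy_(layer.bias[:sub_dim])
    return sub

sub_layer = extract_subnetwork(super_layer, sub_dim=512)
torch.save(sub_layer.state_dict(), "sub_layer.pt")  # deployable checkpoint for the slice
```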

Student: Mohsen Mohammed Al-Queri

Multi-objective BO with LLMs

This project focuses on using large language models (LLMs) to guide multi-objective Bayesian Optimization for structural pruning of LLMs. Structural pruning reduces the size and complexity of LLMs by removing less important components, such as layers or neurons, while maintaining performance. Multi-objective Bayesian Optimization helps optimize multiple conflicting objectives, like minimizing model size and computational cost while preserving accuracy. LLMs guide the process by providing insights into which parts of the model are most important, making the optimization more efficient and effective. The goal is to create smaller, faster, and resource-efficient LLMs without significant performance loss.
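
A minimal sketch of multi-objective Bayesian optimization via random scalarization is shown below, assuming scikit-learn's GaussianProcessRegressor as the surrogate and synthetic stand-in objectives; the LLM-guidance component described above is not modeled.

```python
# Minimal sketch of multi-objective Bayesian optimization via random
# scalarization over a 1-D "pruning ratio" search space. The two objective
# functions are synthetic stand-ins for accuracy degradation and model size;
# the LLM-guided prior described above is not modeled here.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def objectives(ratio: float) -> np.ndarray:
    """Return (accuracy degradation, relative model size); both to be minimized."""
    return np.array([ratio**2 + 0.05 * rng.normal(), 1.0 - ratio])

X = rng.uniform(0.0, 1.0, size=(5, 1))             # initial pruning ratios
Y = np.array([objectives(x[0]) for x in X])        # observed objective pairs

for _ in range(20):
    w = rng.dirichlet([1.0, 1.0])                  # random scalarization weights
    gp = GaussianProcessRegressor().fit(X, Y @ w)  # surrogate of scalarized objective
    cand = rng.uniform(0.0, 1.0, size=(256, 1))
    mu, sigma = gp.predict(cand, return_std=True)
    x_next = cand[np.argmin(mu - sigma)]           # optimistic (LCB-style) pick
    X = np.vstack([X, x_next])
    Y = np.vstack([Y, objectives(x_next[0])])

# Report the non-dominated (Pareto-optimal) observations.
pareto = [i for i, y in enumerate(Y)
          if not any((Y[j] <= y).all() and (Y[j] < y).any() for j in range(len(Y)))]
print(X[pareto].ravel(), Y[pareto], sep="\n")
```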

Students: Andrej Schwanke and Lyubomir Ivanov

Searching the Space of One-Shot Optimizers

This project aims to modularize the orthogonal components of various gradient-based one-shot neural architecture search (NAS) methods. By doing so, the optimizer can be represented as a search space comprising several interchangeable components. This enables the use of black-box optimization to identify the best optimizer configuration for different scenarios, such as limited computational resources or specific task requirements.
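
As an illustration, the sketch below represents such an optimizer as a dictionary of interchangeable components and queries it with random search; the component names and the scoring function are hypothetical placeholders.

```python
# Illustration of representing a one-shot NAS optimizer as interchangeable
# components and searching over them with a black-box method (random search
# here). Component names and the scoring function are hypothetical placeholders.
import random

OPTIMIZER_SPACE = {
    "sampler": ["softmax_relaxation", "gumbel_softmax", "discrete_sampling"],
    "weight_update": ["first_order", "second_order_approx"],
    "arch_lr": [1e-4, 3e-4, 1e-3],
    "entropy_regularization": [0.0, 1e-3, 1e-2],
}

def sample_optimizer_config(space: dict) -> dict:
    return {name: random.choice(choices) for name, choices in space.items()}

def evaluate(config: dict) -> float:
    """Placeholder: in practice this would run a (cheap) one-shot NAS trial."""
    return random.random()

best = max((sample_optimizer_config(OPTIMIZER_SPACE) for _ in range(32)), key=evaluate)
print(best)
```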

Student: Abhash Kumar Jha

Open Projects & Thesis Topics

Unfortunately, at the moment we cannot offer any RL-specific projects or theses. Please refer to https://ml.informatik.uni-freiburg.de/student/ for other open topics.

Members

Postdoctoral Research Fellows

PhD Students

Research Engineers

Students

Mohsen Mohammed Al-Queri

Quoc Huy Dang

Indrashis Das

Abhash Kumar Jha

Lyubomir Ivanov

Shakiba Moradian

Andrej Schwanke

Zainab Sultan

Alumni

Publications

2024

Sukthanker, Rhea Sanjay; Staffler, Benedikt; Hutter, Frank; Klein, Aaron

Large Language Model Compression with Neural Architecture Search (Inproceedings)

In: NeurIPS 2024 Workshop on Machine Learning and Compression, 2024.