Please note: This page is currently under construction and all information listed is subject to change.
Foundations of Deep Learning
Course type: | Lecture + Exercise |
Time: | Lecture: Tuesday, 10:15 - 11:45; Optional exercises: Friday, 10:15 - 11:45 |
Location: | The course will be in-person. Weekly flipped classroom sessions will be held on Tuesdays in HS 00 006 (G.-Köhler-Allee 082); optional exercise sessions will take place on Fridays in HS 00 006 (G.-Köhler-Allee 082). |
Organizers: | Steven Adriaensen, Abhinav Valada, Mahmoud Safari, Rhea Sukthanker, Johannes Hog |
Web page: | ILIAS - under construction (please make sure to also register for all elements of this course module in HISinOne) |
Deep learning is one of the fastest-growing and most exciting fields. This course will provide you with a clear understanding of the fundamentals of deep learning, including the foundations of neural network architectures and learning techniques, and everything in between.
Course Overview
The course will be taught in English and will follow a flipped classroom approach.
Every week there will be:
- a video lecture
- an exercise sheet
- a flipped classroom session (Tuesdays, 10:15 - 11:45)
- an exercise session with optional attendance (Fridays, 10:15 - 11:45)
At the end, there will be a written exam (likely an ILIAS test).
Exercises must be completed in groups and submitted within 2 weeks (+ 1 day) of their release.
Your submissions will be graded and you will receive weekly feedback.
Your final grade will be based solely on the written examination; however, a passing grade for the exercises is a prerequisite for passing the course.
Course Material: All material will be made available in ILIAS, and course participation will not require in-person presence. That being said, we offer ample opportunity for direct interaction with the professors during live Q&A sessions and with our tutors during the weekly in-class exercise sessions (attendance optional).
Exam: The exam will likely be a test you complete on ILIAS. In-person presence will be required.
Course Schedule
The following are the dates for the flipped classroom sessions (tentative, subject to change):
15.10.24 - Kickoff: Info on Course Organisation
22.10.24 - Week 1: Intro to Deep Learning
29.10.24 - Week 2: From Logistic Regression to MLPs
05.11.24 - Week 3: Backpropagation
12.11.24 - Week 4: Optimization
19.11.24 - Week 5: Regularization
26.11.24 - Week 6: Convolutional Neural Networks (CNNs)
03.12.24 - Week 7: Recurrent Neural Networks (RNNs)
10.12.24 - Week 8: Attention & Transformers
17.12.24 - Week 9: Practical Methodology
07.01.25 - Week 10: Auto-Encoders, Variational Auto-Encoders, GANs
14.01.25 - Week 11: Uncertainty in Deep Learning
21.01.25 - Week 12: AutoML for DL
28.01.25 - Round-up / Exam Q&A
In the first session (on 15.10.24), you will receive additional information about the course and have the opportunity to ask general questions. While there is no need to prepare for this first session, we encourage you to already think about forming teams.
The last flipped classroom session will be held on 28.01.25.
Questions?
If you have a question, please post it in the ILIAS forum (so everyone can benefit from the answer).
Seminar: Automated Reinforcement Learning
Course type: | Block Seminar |
Time: | Kickoff Session: 17.10.24, 14:00 - 16:00; Presentation Sessions: TBD (likely first week of February) |
Location: | Kickoff Session: R 00 017 (G.-Köhler-Allee 101); Presentation Sessions: TBD |
Organizers: | André Biedenkapp, Noor Awad, Raghu Rajan, M Asif Hasan, Baohe Zhang |
Web page: | HISinOne, Local Page |
Seminar: Large Language Models, Deep Learning, and Foundation Models for Tabular Data
The field of tabular data has recently been exploding with advances through large language models (LLMs), deep learning algorithms, and foundation models. In this seminar, we want to dive deep into these very recent advances to understand them.
Course type: | Seminar |
Time: | Five slots, to be determined with all participants. Kick-off is likely on the 23rd of October, from 10 to 11 am. |
Location: | In-person; meeting room in our ML Lab |
Organizers: | Lennart Purucker |
Registration: | Via HISinOne (maximum of six students, registration opens on the 14th of October) |
Language: | English |
Prerequisites
We require that you have taken lectures on or are familiar with the following:
- Machine Learning
- Deep Learning
- Automated Machine Learning
Organization
After the kick-off meeting, everyone is assigned one or more papers about recent advances in deep learning (depending on the content). Then, everyone is expected to understand and digest their assigned papers and prepare two presentations. The first presentation is given during the midterms (two separate slots), and the second during the endterms (two separate slots).
- The first presentation will focus on the relationship between the papers, any relevant related work, any background to understand the paper, and the greater context of the work.
- The second presentation will focus on the paper's contributions, describing them in detail.
In addition to the second presentation, students are expected to contribute an "add-on" related to the paper for the final report. This includes, but is not limited to, reproducing some experiments, implementing a part of the paper, providing a broader literature survey, fact-checking citations, experiments, or methodology, or building a WebUI or demo for the paper. Students can (e-)meet with Lennart Purucker for feedback and any questions (e.g., to discuss a potential "add-on").
Grading
- Presentations: 40% (two times 20 min + 20 min Q&A)
- Report: 40% (4 pages in AutoML Conf format, due one week after the last endterm)
- Add-on: 20%
Short(er) List of Potential Papers / Directions:
LLMs
- https://arxiv.org/abs/2409.03946
- https://arxiv.org/abs/2403.20208
- https://arxiv.org/abs/2404.00401, https://aclanthology.org/2024.lrec-main.1179/, https://arxiv.org/abs/2408.09174
- https://arxiv.org/abs/2404.05047
- https://arxiv.org/abs/2404.17136
- https://arxiv.org/abs/2404.18681, https://arxiv.org/abs/2405.17712, https://arxiv.org/abs/2406.08527
- https://arxiv.org/abs/2405.01585
- https://arxiv.org/abs/2407.02694
- https://arxiv.org/abs/2408.08841
- https://arxiv.org/abs/2408.11063
- https://arxiv.org/abs/2403.19318
- https://arxiv.org/abs/2403.06644
- https://arxiv.org/abs/2402.17453, https://arxiv.org/abs/2409.07703
- https://arxiv.org/abs/2403.01841
Deep Learning
- https://arxiv.org/abs/2405.08403
- https://arxiv.org/abs/2307.14338
- https://arxiv.org/abs/2305.06090, https://arxiv.org/abs/2406.00281
- https://arxiv.org/pdf/2404.17489
- https://arxiv.org/abs/2405.14018, https://arxiv.org/abs/2406.05216, https://arxiv.org/abs/2406.17673, https://arxiv.org/abs/2409.05215, https://arxiv.org/abs/2406.14841
- https://arxiv.org/abs/2408.06291
- https://arxiv.org/abs/2408.07661
- https://arxiv.org/abs/2409.08806
- https://arxiv.org/abs/2404.00776
Foundation Models / In-Context Learning
- https://arxiv.org/abs/2406.09837, https://arxiv.org/abs/2307.09249
- https://arxiv.org/abs/2406.06891, https://arxiv.org/abs/2408.17162
- https://arxiv.org/abs/2405.16156, https://arxiv.org/abs/2402.11137, https://arxiv.org/pdf/2406.05207
- https://arxiv.org/abs/2403.05681, https://arxiv.org/abs/2408.02927
- https://arxiv.org/abs/2407.21523
Multi-Modal
- https://arxiv.org/abs/2404.16233
- https://arxiv.org/pdf/2403.13319
- https://arxiv.org/abs/2402.16785
- https://arxiv.org/pdf/2408.00665
- https://arxiv.org/abs/2402.14926
Seminar: Pruning and Efficiency in LLMs
Large language models (LLMs) exhibit remarkable reasoning abilities, allowing them to generalize across a wide range of downstream tasks, such as commonsense reasoning or instruction following. However, as LLMs scale, inference costs become increasingly prohibitive, accumulating significantly over their life cycle. In this seminar, we will dive into methods like quantization, pruning, and knowledge distillation to optimize LLM inference.
Course type: | Seminar |
Time: | Five slots, to be determined with all participants. Kick-off is likely on the 24th of October, from 10 to 11 am. |
Location: | In-person; meeting room in the ML Lab |
Organizers: | Rhea Sukthanker |
Registration: | Via HISinOne (maximum of six students, registration opens on the 14th of October) |
Language: | English |
Prerequisites
We require that you have taken lectures on or are familiar with the following:
- Machine Learning
- Deep Learning
- Automated Machine Learning
Organization
After the kick-off meeting, everyone is assigned a paper (one or two, depending on the content). Then, everyone is expected to understand the paper(s) assigned to them and prepare two presentations.
- The first presentation will focus on establishing the background and motivation for the work, along with a concise overview of the approach proposed in the paper.
- The second presentation will focus on the details of the approach, the results and takeaways from the paper, and an “add-on” described below.
Students will contribute an "add-on" related to the paper for the final report. This includes, but is not limited to, reproducing some experiments, profiling the inference latency of the LLMs, implementing a part of the paper, or providing a Colab demo on applying the method in the paper to a different LLM. Students can (e-)meet with Rhea Sukthanker for feedback and any questions (e.g., to discuss a potential "add-on").
Grading
- Presentations: 40% (two times 20 min + 20 min Q&A)
- Report: 40% (4 pages in AutoML Conf format, due one week after the last endterm)
- Add-on: 20%
List of Potential Papers
- Are Sixteen Heads Really Better than One? https://arxiv.org/pdf/1905.10650.pdf
- FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search https://arxiv.org/abs/2308.03290
- Minitron https://www.arxiv.org/abs/2407.14679
- MiniLLM https://openreview.net/pdf?id=5h0qf7IBZZ
- Compressing LLMs: The Truth is Rarely Pure and Never Simple https://arxiv.org/abs/2310.01382
- Wanda: https://arxiv.org/pdf/2306.11695
- SparseGPT: https://arxiv.org/abs/2301.00774
- On the Effect of Dropping Layers of Pre-trained Transformer Models https://arxiv.org/pdf/2004.03844.pdf
- Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned https://arxiv.org/pdf/1905.09418.pdf
- A Fast Post-Training Pruning Framework for Transformers https://proceedings.neurips.cc/paper_files/paper/2022/file/987bed997ab668f91c822a09bce3ea12-Paper-Conference.pdf
- LLM-Pruner: On the Structural Pruning of Large Language Models https://arxiv.org/pdf/2305.11627.pdf
- Compresso https://arxiv.org/pdf/2310.05015.pdf
- LLM Surgeon https://arxiv.org/pdf/2312.17244.pdf
- Shortened Llama https://arxiv.org/abs/2402.02834
- SliceGPT https://arxiv.org/abs/2401.15024
- Structural pruning of large language models via neural architecture search https://arxiv.org/abs/2405.02267
- Not all Layers of LLMs are Necessary during Inference https://arxiv.org/pdf/2403.02181.pdf
- ShortGPT: Layers in Large Language Models are More Redundant Than You Expect https://arxiv.org/abs/2403.03853
- FLAP: Fluctuation-based adaptive structured pruning for large language models https://arxiv.org/abs/2312.11983
- Bonsai: Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes https://arxiv.org/pdf/2402.05406.pdf
- The Unreasonable Ineffectiveness of the Deeper Layers https://arxiv.org/pdf/2403.17887v1.pdf
- Sheared Llama https://arxiv.org/abs/2310.06694
- Netprune https://arxiv.org/pdf/2402.09773.pdf