2024-25
Welcome to the homepage for the ECS COMP6258 Differentiable Programming and Deep Learning module.
Differentiable programming and deep learning have revolutionised numerous fields in recent years. We’ve witnessed improvements in everything from computer vision, through speech analysis, to natural language processing as a result of the advent of cheap GPGPU compute coupled with large datasets and some neat algorithms. More broadly, the idea of ‘Differentiable Programming’, in which we define entire programs as compositions of differentiable operations that can then be optimised to fit data, looks set to become a new norm in how we utilise computers.
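To make this concrete, here is a minimal sketch (purely illustrative, not part of the course materials) of a tiny differentiable program in PyTorch: a toy affine model with made-up data, fitted by gradient descent using automatic differentiation.

```python
import torch

# Toy data: secretly generated as y = 3x + 2 plus a little noise; the
# "program" below has to recover the slope and intercept by gradient descent.
x = torch.linspace(-1, 1, 100).unsqueeze(1)
y = 3 * x + 2 + 0.1 * torch.randn_like(x)

# A program built from differentiable operations - here just an affine map,
# but it could be any composition of differentiable steps.
w = torch.randn(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
optimiser = torch.optim.SGD([w, b], lr=0.1)

for step in range(200):
    y_hat = x * w + b                  # forward pass through the program
    loss = ((y_hat - y) ** 2).mean()   # how badly the program fits the data
    optimiser.zero_grad()
    loss.backward()                    # automatic differentiation
    optimiser.step()                   # gradient-based update of the program

print(w.item(), b.item())  # should end up close to 3 and 2
```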
This module will look at how deep learning and differentiable programming work, from theoretical foundations right through to practical implementation. We’ll study key aspects such as automatic differentiation, look at models for deep learning such as convolutional and recurrent neural networks and ‘transformer’ architectures, and consider current research in depth. Along the way we’ll also look at aspects of biology and neuroscience, and see how ideas from these fields feed into current research.
The overall aim of this module is not to teach you to train pre-existing models (although you will learn to do that!), but rather to equip you with the fundamental skills to understand and implement models and ideas that are currently being developed by researchers. We intend to equip you with the knowledge needed to understand new ideas as they are published, and to give you the ability to constructively criticise different approaches and identify their limitations.
As a word of warning, this is a mathematical module: the predominant focus is on models that can be optimised via gradient methods. You need a good grasp of linear (matrix) algebra and matrix calculus, as well as the fundamentals of machine learning, probability and statistics. You will also need to be comfortable with Python programming and with numeric/matrix libraries such as numpy or pytorch. As such, the Foundations of Machine Learning module is a prerequisite. You’ll also be expected to read and try to understand scientific papers along the way.
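As a rough (and purely illustrative) yardstick of the fluency assumed, you should be comfortable reading a short snippet like the one below without difficulty; none of the names in it are course-specific.

```python
import numpy as np
import torch

# NumPy: a batched affine transform followed by a ReLU, written with matrix
# operations rather than explicit Python loops.
X = np.random.randn(32, 10)        # 32 examples, 10 features
W = np.random.randn(10, 4)
b = np.zeros(4)
H = np.maximum(X @ W + b, 0.0)     # shape (32, 4)

# The same computation in PyTorch; tensors hold the data and can also track
# gradients for automatic differentiation.
Xt = torch.from_numpy(X).float()
Wt = torch.randn(10, 4, requires_grad=True)
bt = torch.zeros(4, requires_grad=True)
Ht = torch.relu(Xt @ Wt + bt)      # shape (32, 4)
```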
The course will be delivered by Professor Jonathon Hare (email) and Dr Antonia Marcu (email). We have a capable team of PhD students who facilitate the lab sessions and run some of the guest lectures.
There will be three lectures each week: Tuesdays at 9AM, and Fridays at 9AM and 1PM. Labs run for 8 weeks, starting in week 1, from 11AM to 1PM on Fridays in Zepler L3. The lectures and labs will all take place in person.
If you take part in this module, we expect you to turn up to the lectures and get involved - asking questions and provoking discussion is positively encouraged. Expect us to use a range of approaches to get you asking questions - we’ll even run some of the lectures as double acts between us to help foster debate. Some of the lecture slots will be used for “seminars” where we will discuss and work through a scientific paper in detail; you will need to prepare for these by reading the paper(s) carefully in advance. For the seminars we have provided a list of questions to consider here. These questions will also help you with the coursework assignment. Some of the slots will be used for a series of guest lectures covering a range of topics.
The current working timetable/plan is below, and illustrates the topics we intend to cover, but this will evolve as the course progresses. Many of the lectures are coupled with assigned reading materials that you should read before the lecture takes place. This will broaden your understanding of the topic whilst giving you the skills required to read and understand the key points from recent research literature. The lectures are approximately broken into three groups: fundamentals (weeks 1-3), architectures/models (weeks 4-8), and advanced topics (weeks 9-12).
| Week | Date | Location | Topic | Slides | Slides (2 per page) | Handouts | Reading Material | Lecture Video |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 28-Jan | 27/2003 | Lecture: Introduction to the module, coursework, labs & quizzes | slides | slides-2per | handout | | |
| | 31-Jan | 100/4011 | Lecture: Review of fundamentals | slides | slides-2per | handout | Chapter 3 of Michael Nielsen’s book | |
| | 31-Jan | 07/3027 | Lecture: Differentiable Programming: How does pre-university calculus relate to AI and the future of computer programming? | slides | slides-2per | handout | Chapter 1 of Jon’s unfinished book | |
| 2 | 04-Feb | 27/2003 | Lecture: The Power of Differentiation | slides | slides-2per | handout | Chapter 3 of Jon’s unfinished book | |
| | 07-Feb | 100/4011 | Lecture: Automatic Differentiation | slides | slides-2per | handout | Automatic differentiation in PyTorch | |
| | 07-Feb | 07/3027 | Lecture: Backpropagation | slides | slides-2per | handout | Learning representations by back-propagating errors | |
| 3 | 11-Feb | 27/2004 | Lecture: Optimisation | slides | slides-2per | handout | Adam: A Method for Stochastic Optimization | |
| | 14-Feb | 100/4012 | Lecture: Going Deep: Universal approximation, overfitting and regularisation | slides | slides-2per | handout | Dropout: A Simple Way to Prevent Neural Networks from Overfitting | |
| | 14-Feb | 07/3027 | Lecture: Convolutional Networks | slides | slides-2per | handout | Handwritten Digit Recognition with a Back-Propagation Network | |
| 4 | 18-Feb | 27/2004 | Lecture: Network Architectures for image classification | slides | slides-2per | handout | ImageNet Classification with Deep Convolutional Neural Networks, Striving for Simplicity: The All Convolutional Net, Very Deep Convolutional Networks for Large-Scale Image Recognition, Going Deeper with Convolutions, Deep Residual Learning for Image Recognition | |
| | 21-Feb | 100/4012 | Seminar: Shape and Texture Bias (and a discussion about experimental design) | | | | ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness | |
| | 21-Feb | 07/3027 | Lecture: Network Architectures for image classification (II) | slides | slides-2per | handout | | |
| 5 | 25-Feb | 27/2005 | Lecture: Embeddings | slides | slides-2per | handout | Efficient Estimation of Word Representations in Vector Space | |
| | 28-Feb | 100/4013 | Lecture: Recurrent Neural Networks | slides | slides-2per | handout | The Unreasonable Effectiveness of Recurrent Neural Networks | |
| | 28-Feb | 07/3027 | Lecture: LSTMs and GRUs | slides | slides-2per | handout | Recurrent Neural Network Regularization | |
| 6 | 04-Mar | 27/2005 | Seminar: Were RNNs All We Needed? | slides | | | Were RNNs All We Needed? | |
| | 07-Mar | 100/4013 | Lecture: Auto-encoders, unsupervised learning and self-supervision | slides | slides-2per | handout | Blog Post on Autoencoders | |
| | 07-Mar | 07/3027 | Lecture: SSL, auto-regressive modelling, augmentation | slides | | | Barlow Twins, A Simple Framework for Contrastive Learning of Visual Representations, Masked Autoencoders Are Scalable Vision Learners | |
| 7 | 11-Mar | 27/2006 | Lecture: Differentiable relaxations (sampling, etc.) | slides | slides-2per | handout | | |
| | 14-Mar | 100/4014 | Lecture: Perspectives on Learning | slides | | | | |
| | 14-Mar | 07/3027 | Lecture: Generative Models Part 1: Differentiable Generator Networks | slides | slides-2per | handout | | |
| 8 | 18-Mar | 27/2006 | Lecture: Generative Models Part 2: Variational Autoencoders | slides | slides-2per | handout | Auto-Encoding Variational Bayes | |
| | 21-Mar | 100/4014 | Lecture: Generative Models Part 3: Generative Adversarial Networks | slides | slides-2per | handout | GANs, DCGANs | |
| | 21-Mar | 07/3027 | Lecture: Diffusion Models | slides | | | | |
| 9 | 25-Mar | 27/2007 | Lecture: Attention | slides | slides-2per | handout | Attention Is All You Need | |
| | 28-Mar | 100/4015 | Seminar: FID Score and measuring what is good | | | | The Role of ImageNet Classes in Fréchet Inception Distance | |
| | 28-Mar | 07/3027 | Lecture: More on the Transformer | | | | Attention Is All You Need, An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | |
| 10 | 29-Apr | 27/2007 | Lecture: Mechanistic Interpretability | | | | | |
| | 02-May | 100/4015 | Lecture: Interesting phenomena in learning | | | | The Implicit Bias of Gradient Descent on Separable Data, Gradient Starvation: A Learning Proclivity in Neural Networks | |
| | 02-May | 07/3027 | Guest Lecture: Audio models | | | | | |
| 11 | 06-May | 27/2008 | Lecture: Set prediction | | | | Featurewise Sort Pooling, Deep Set Prediction Networks | |
| | 09-May | 100/4016 | Lecture: Implicit Models and Test-Time Compute | | | | Rethinking Deep Thinking: Stable Learning of Algorithms using Lipschitz Constraints | |
| | 09-May | 07/3027 | Guest Lecture: Graph Networks | | | | | |
| 12 | 13-May | 27/2008 | Seminar: What else? I’m a Deep Learner AMA | | | | | |
| | 16-May | 100/4016 | TBC: Reserved for special requests | | | | | |
| | 16-May | 07/3027 | TBC: Reserved for special requests | | | | | |
These are bonus lectures/talks that you can watch, covering topics requested by students in previous years. If there are additional topics that you would like covered, then let us know.
Topic | Description | Handouts/slides | Video |
---|---|---|---|
Distributed Learning | How can you distribute large models and data over many machines? This is a huge topic, but I made two lectures for advanced machine learning on it (which I’ve also made available here in case you’re not taking it) which cover the basics of both the hardware bottlenecks and the software mitigations to these bottlenecks. | Interactive slides and handouts | Part 1 Part 2 |
Attention is (possibly) all you need | Recent trends, particularly in models for mining textual data, have used “attentional” mechanisms to get breakthrough performance and move away from recurrent networks; what is this attention and how does it work? | link | |
Neural architecture search | A few people have asked how you design a network architecture; that’s quite a difficult question as it relies on a lot of intuition (possibly with some inspiration from biology) and trial & error. There is an alternative though… Why not let the network design itself? There are a number of approaches to what is called Neural Architecture Search, but most use horribly inefficient Reinforcement Learning, so we’ll just take a little look at a nifty differentiable approach called “DARTS”. | link | |
Hardware Considerations | Deep networks typically require power-hungry hardware and lots of memory. Can you reduce the requirements and optimise for lower-powered hardware? | link |
For 8 of the weeks we are organising a 2-hour lab session in which you will need to complete a series of worksheets. The worksheets have been designed to put the theory covered in the lectures into context, and to equip you with practical skills in implementing and training differentiable programs. A team of PhD-student demonstrators will be available in the lab to help you with any questions you might have about the topics you are working on.
40% of the marks for the module are for lab work. Each of the 8 lab sessions is accompanied by an additional assessed exercise for you to work through in your own time. You must work through the exercises by yourself and succinctly write up your findings. You will submit your answers/findings/workings for all the assessed exercises to Handin in week 11 for marking (7th May, 16:00). Each of the 8 exercises is worth 5% of your overall module mark. We recommend that you do each exercise as soon as possible after the corresponding lab session, rather than leaving them all to the end.
Labs will start in the first week (31st Jan), running 11AM to 1PM on Fridays. The labs take place in person in a computer room (the Zepler L3 labs) with the demonstrator team and Jon & Antonia. The demonstrators can offer advice on both the labs and the group coursework; however, you should not ask them about the assessed lab exercises that you complete after the lab.
The full lab schedule is below:
Week | Date | Location | Topic | Exercise Link |
---|---|---|---|---|
1 | 31-Jan | Zepler L3 | Introducing PyTorch | Lab 1 Exercise |
2 | 07-Feb | Zepler L3 | Automatic Differentiation | Lab 2 Exercise |
3 | 14-Feb | Zepler L3 | Optimisation | Lab 3 Exercise |
4 | 21-Feb | Zepler L3 | Implementing simple Neural Networks using PyTorch and Torchbearer | Lab 4 Exercise |
5 | 28-Feb | Zepler L3 | Implementing and training Convolutional Neural Networks using PyTorch and Torchbearer | Lab 5 Exercise |
6 | 07-Mar | Zepler L3 | Using pretrained models and transfer learning | Lab 6 Exercise |
7 | 14-Mar | Zepler L3 | Recurrent Networks, Sequence Prediction and Embeddings | Lab 7 Exercise |
8 | 21-Mar | Zepler L3 | Autoencoders and Deep Generative Models | Lab 8 Exercise |
9 | 28-Mar | Zepler L3 | (catch-up / questions) | |
10 | 02-May | | NO LAB | |
11 | 09-May | | NO LAB | |
12 | 16-May | | NO LAB | |
Note: I’ve made all of last year’s worksheet links available. Please don’t be surprised if we make some updates before each session! We’re actively updating the assessed exercises and will release these nearer the time.
There will be two assessed online quizzes; we are planning for these to be on 5th March and 14th May. They will be available on Blackboard for a 24-hour period, and once started you must complete them within one hour. The quizzes must be taken independently, and you should not share questions or answers with others.
Information on the coursework assignment (worth 40% of the module) is here.
Talk to us! You are more than welcome to arrange to meet us to discuss issues related to the course, either during lab sessions or by appointment. The lab sessions are also facilitated by a team of our PhD students, who are experts in the deep learning / differentiable programming field in their own right (many of them have published work in this space, or are close to doing so). We can be reached via Teams, or by Jon’s email or Antonia’s email.