Mathematical Methods in Data Science (with Python)

Sebastien Roch, Department of Mathematics, UW-Madison

Description

This textbook on the mathematics of data has two intended audiences:

- For students majoring in math (or other quantitative fields such as physics, economics, or engineering): it is meant as an invitation to data science and AI from a rigorous mathematical perspective.
- For mathematically inclined students in data science-related fields (at the undergraduate or graduate level): it can serve as a mathematical companion to machine learning, AI, and statistics courses.

Content-wise, it is a second course in linear algebra, multivariable calculus, and probability theory, motivated by and illustrated with data science applications. As such, the reader is expected to be familiar with the basics of those areas and to have been exposed to proofs, but no knowledge of data science is assumed. Moreover, while the emphasis is on mathematical concepts, programming is used throughout. Basic familiarity with Python will suffice. The book provides an introduction to some specialized packages, especially NumPy, NetworkX, and PyTorch.

It is based on Jupyter notebooks that were developed for MATH 535: Mathematical Methods in Data Science, a one-semester advanced undergraduate and master's-level course offered at UW-Madison.

A print version of the book will be published by Cambridge University Press.

Course Information (Spring 2025)

Course: MATH 535: Mathematical Methods in Data Science
Instructor: Sebastien Roch
Lectures: MoWeFr 11:00AM - 11:50AM
Location: SOC SCI 6102
Office Hours: TBA
Prerequisites: (MATH 320, 340, 341, 375 or COMP SCI/E C E/M E 532) and (MATH/STAT 309, 431, MATH 531, STAT 311 or E C E 331) and (MATH 322, 341, 375, 421, 467, or COMP SCI 577), graduate/professional standing, or member of Pre-Masters Mathematics (Visiting Intl) Prgrm

Textbook

Textbook: Current version of the full MMiDS book

Links to specific chapters are below, together with some additional materials (Jupyter notebooks, datasets, auto-quizzes, etc.). Most of these resources are also available on the GitHub page of the book.

Lecture notes

Lecture notes (based on the textbook) for Spring 2025 follow.

Course Schedule

Online materials

Python package: To run some of the code below, you will need mmids.py.

Chap 1: Introduction

Chap 2: Least squares: geometric, algebraic, and numerical aspects

Chap 3: Optimization theory and algorithms

Chap 4: Singular value decomposition

Chap 5: Spectral graph theory

Chap 6: Probabilistic models: from simple to complex

Chap 7: Random walks on graphs and Markov chains

Chap 8: Neural networks, backpropagation and stochastic gradient descent

Programming languages

Python: I recommend using Google Colaboratory to run the notebooks. Some resources for learning Python:
- A good place to start is this tutorial.
- This textbook has many excellent notebooks about the basics of NumPy, pandas, and Matplotlib.
Julia, R, etc.: If you would like to use a different programming language, try converting the code in the notebooks with your favorite AI chatbot.

Additional Reading

The material on this website was partly influenced by the following excellent textbooks.

[Ara] C. Arangala, Linear Algebra With Machine Learning and Data, Chapman & Hall, 2023
[Axl] S. Axler, Linear Algebra Done Right, Springer, 2015
[BHK] A. Blum, J. Hopcroft, R. Kannan, Foundations of Data Science, Cambridge University Press, 2020
[Bis] C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006 (Chaps 2, 8, 9, 13)
[Data8] A. Adhikari, J. DeNero, D. Wagner, Computational and Inferential Thinking: The Foundations of Data Science
[DS100] S. Lau, J. Gonzalez, D. Nolan, Learning Data Science, O'Reilly, 2023
[ISLP] G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning: with Applications in Python, Springer, 2023
[MSMB] S. Holmes, W. Huber, Modern Statistics for Modern Biology, Cambridge University Press, 2019
[Nic] B. Nica, A Brief Introduction to Spectral Graph Theory, EMS Textbooks in Mathematics, 2018
[Sol] J. Solomon, Numerical Algorithms, CRC Press, 2015 (Chaps 4-7)
[Str] G. Strang, Linear Algebra and Learning from Data, Wellesley-Cambridge Press, 2019
[TB] L. N. Trefethen, D. Bau, III, Numerical Linear Algebra, SIAM, 1997
[VMLS] S. Boyd, L. Vandenberghe, Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares, Cambridge University Press, 2018
[Wri] S. Wright, Optimization Algorithms for Data Analysis, in: The Mathematics of Data, AMS, 2018 (Sections 2-4)
[WR] S. Wright, B. Recht, Optimization for Data Analysis, Cambridge University Press, 2022

Last updated: Feb 13, 2025