Mathematical Methods in Data Science (with Python)

Sebastien Roch, Department of Mathematics, UW-Madison

Description

This textbook on the mathematics of data has two intended audiences:

Content-wise, it is a second course in linear algebra, multivariable calculus, and probability theory, motivated by and illustrated on data science applications. As such, the reader is expected to be familiar with the basics of those areas and to have been exposed to proofs -- but no knowledge of data science is assumed. Moreover, while the emphasis is on mathematical concepts, programming is used throughout. Basic familiarity with Python will suffice. The book provides an introduction to some specialized packages, especially NumPy, NetworkX, and PyTorch.

It is based on Jupyter notebooks that were developed for MATH 535: MATHEMATICAL METHODS IN DATA SCIENCE, a one-semester advanced undergraduate and Master's level course offered at UW-Madison.

A print version of the book will be published by Cambridge University Press.

Online book and Jupyter notebooks

Textbook: Current version of the full MMiDS book

Links to specific chapters are below, together with some additional materials (assignments, Jupyter notebooks, datasets, auto-quizzes, etc.). Most of these resources are also available on the GitHub page of the book.

Exercises: Assignments and practice exams for Spring 2024 follow.

Python package: To run some of the code below, you will need mmids.py.

Chap 1: Introduction

Chap 2: Least squares: geometric, algebraic, and numerical aspects

Chap 3: Optimization theory and algorithms

Chap 4: Singular value decomposition

Chap 5: Spectral graph theory

Chap 6: Probabilistic models: from simple to complex

Chap 7: Random walks on graphs and Markov chains

Chap 8: Neural networks, backpropagation and stochastic gradient descent

Programming languages

Additional Reading

The material on this website was partly influenced by the following excellent textbooks.



Last updated: July 17, 2024