introduction

algorithms

  • machine learning algorithms
  • curse of high dimensions and dimensionality reduction
  • random walks and Markov-Chain-Monte-Carlo method
  • algorithms for massively parallel architectures
  • algorithms for streaming data

mathematical methods

  • linear algebra, discrete mathematics, probability theory
  • analysis of algorithms
  • complexity theory
  • information theory

overview

  • ML algorithms for classification and clustering
  • foundations of information theory
  • statistical learning theory
    • how well can we learn and generalize from a finite data set?
  • multiplicative weights
    • learning algorithms for data processed online; exploration vs. exploitation
  • high dimensional data
    • probability and geometry in high dimensions
    • dimensionality reduction and spectral clustering
  • Markov Chains and random walks
    • Markov Chain Monte Carlo
    • how can we use Markov chains to sample from an arbitrary distribution?
  • algorithms for parallel systems
    • compute clusters, MapReduce, and MapReduce algorithms
    • how can we implement algorithms on clusters and parallel machines?
  • streaming algorithms
    • stream data model, hashing, streaming algorithms
    • how can we process data that arrives as a stream and cannot be stored in memory?

textbooks

  • Foundations of Data Science, A. Blum, J. Hopcroft, R. Kannan
  • Mining of Massive Datasets, J. Leskovec, A. Rajaraman, J. Ullman
  • Understanding Machine Learning, S. Shalev-Shwartz, S. Ben-David

multiplicative weight updates

goals

  • basic algorithm and its analysis using suitable potential functions
  • adaptations of the algorithm to different scenarios (boosting, bandit learning)

The MWU algorithm
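
As a concrete reference, here is a minimal sketch of the basic algorithm in its Hedge form: each expert keeps a weight, and after each round the weights are multiplied by $e^{-\eta \cdot \text{loss}}$. The conventions below (losses in $[0,1]$, learning rate $\eta = 0.1$, toy loss matrix) are assumptions for illustration, not necessarily the exact variant used in the lecture.

```python
import numpy as np

def mwu(losses, eta=0.1):
    """Multiplicative weight updates (Hedge form).

    losses[t, i] is the loss of expert i in round t, assumed to lie in [0, 1].
    Returns the distribution over experts played after the final round.
    """
    _, n = losses.shape
    w = np.ones(n)                       # start with uniform weights
    for loss in losses:
        w *= np.exp(-eta * loss)         # multiplicative penalty: w_i *= e^(-eta * loss_i)
    return w / w.sum()                   # play probabilities proportional to weights

# toy usage: expert 1 consistently suffers less loss than expert 0
losses = np.tile([0.9, 0.1], (50, 1))
print(mwu(losses))                       # mass concentrates on expert 1
```

The analysis tracks the total weight $\sum_i w_i$ as a potential function: each round it shrinks by a factor tied to the algorithm's expected loss, while the best expert's weight bounds it from below.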

Boosting weak learning algorithms

Bandit learning
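
One standard adaptation of MWU to the bandit setting, where only the reward of the played arm is observed, is the EXP3 algorithm. The sketch below mixes the weight distribution with uniform exploration (the exploration vs. exploitation trade-off from the overview) and feeds importance-weighted reward estimates into the multiplicative update; the parameter $\gamma = 0.1$ and the toy reward function are illustrative assumptions.

```python
import numpy as np

def exp3(reward_fn, n_arms, T, gamma=0.1, seed=0):
    """EXP3: multiplicative weights under bandit feedback."""
    rng = np.random.default_rng(seed)
    w = np.ones(n_arms)
    for _ in range(T):
        # exploitation (weights) mixed with uniform exploration
        p = (1 - gamma) * w / w.sum() + gamma / n_arms
        arm = rng.choice(n_arms, p=p)
        r = reward_fn(arm)                           # only this arm's reward is seen
        estimate = r / p[arm]                        # unbiased importance-weighted estimate
        w[arm] *= np.exp(gamma * estimate / n_arms)  # reward only the played arm
        w /= w.max()                                 # rescale; p depends only on ratios
    return w / w.sum()

# toy usage: arm 1 has the highest expected reward
rng = np.random.default_rng(1)
means = [0.2, 0.9, 0.4]
print(exp3(lambda a: float(rng.random() < means[a]), n_arms=3, T=3000))
```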

high-dimensional data

goals

  • advanced mathematical concepts from geometry and linear algebra (eigenvalues and eigenvectors)
  • dimensionality reduction

the strange geometry of high-dimensional spaces

The volume is near the surface
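
The computation behind this claim: scaling the unit ball $B^d$ by $1-\epsilon$ scales its volume by $(1-\epsilon)^d$, so

$$\frac{\mathrm{vol}\big((1-\epsilon)B^d\big)}{\mathrm{vol}(B^d)} = (1-\epsilon)^d \le e^{-\epsilon d} \to 0 \quad (d \to \infty),$$

i.e., in high dimensions almost all of the volume lies within distance $\epsilon$ of the surface.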

high-dimensional unit balls

volume of the unit ball
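
For reference, the standard closed form is

$$V(d) = \frac{\pi^{d/2}}{\Gamma\!\left(\frac{d}{2}+1\right)},$$

so $V(2) = \pi$, $V(3) = \tfrac{4}{3}\pi$, and $V(d) \to 0$ as $d \to \infty$.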

concentration near the equator

a probabilistic view of the equator

  • concentration near the equator
  • think of drawing a point uniformly at random from the unit ball or sphere; see the sketch below
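
A quick Monte Carlo illustration of this concentration (an illustrative sketch, not part of the original notes): a uniform point on the unit sphere is obtained by normalizing a Gaussian vector, and its first coordinate, the "distance from the equator", has typical magnitude $O(1/\sqrt{d})$.

```python
import numpy as np

rng = np.random.default_rng(0)

def fraction_near_equator(d, c=2.0, n=100_000):
    """Fraction of uniform points on the d-dimensional unit sphere
    whose first coordinate satisfies |x_1| <= c / sqrt(d)."""
    x = rng.normal(size=(n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # normalized Gaussian = uniform on sphere
    return np.mean(np.abs(x[:, 0]) <= c / np.sqrt(d))

for d in (3, 30, 300):
    print(d, fraction_near_equator(d))  # stays near 0.95 for c = 2, independent of d
```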

a probabilistic view of the volume

Dimension Reduction by Random Projections

reducing the dimension while approximately preserving the distances between all pairs of points

Johnson-Lindenstrauss Lemma
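
The lemma states that projecting $n$ points onto a random subspace of dimension $k = O(\epsilon^{-2}\log n)$ preserves all pairwise distances up to a factor $1 \pm \epsilon$ with high probability. Below is a minimal sketch of such a projection using a scaled Gaussian matrix; the data and target dimension are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_projection(X, k):
    """Project the rows of X down to k dimensions with a scaled Gaussian matrix."""
    d = X.shape[1]
    R = rng.normal(size=(d, k)) / np.sqrt(k)  # scaling makes E[||xR||^2] = ||x||^2
    return X @ R

def pairwise_dists(X):
    """All pairwise Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))[np.triu_indices(len(X), k=1)]

X = rng.normal(size=(20, 5_000))      # 20 points in dimension 5000
Y = random_projection(X, k=500)
ratios = pairwise_dists(Y) / pairwise_dists(X)
print(ratios.min(), ratios.max())     # all distortions concentrate around 1
```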

Background from linear algebra: eigenvalues and eigenvectors

power iteration
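
A minimal sketch of power iteration for a symmetric matrix (the example matrix and iteration count are illustrative assumptions): repeatedly multiplying a random start vector by $A$ and renormalizing converges to the dominant eigenvector, and the Rayleigh quotient recovers the corresponding eigenvalue.

```python
import numpy as np

def power_iteration(A, iters=100, seed=0):
    """Approximate the dominant eigenvalue/eigenvector of a symmetric matrix A."""
    v = np.random.default_rng(seed).normal(size=A.shape[0])
    for _ in range(iters):
        v = A @ v
        v /= np.linalg.norm(v)           # renormalize so the iterates stay bounded
    return v @ A @ v, v                  # Rayleigh quotient gives the eigenvalue

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, v = power_iteration(A)
print(lam, np.linalg.eigvalsh(A)[-1])    # both print the largest eigenvalue
```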

The Monte Carlo Method

lemma 8.9

if the probability vector $\mathbf{\pi}$ satisfies $\pi_{i}q_{ij} = \pi_{j}q_{ji}$ for all $i, j$,

then $\mathbf{\pi}$ is the stationary distribution of the Markov chain.

proof: fix $i$ and sum both sides of the equation over $j$:

$\sum_{j=1}^{n}\pi_{i}q_{ij} = \sum_{j=1}^{n}\pi_{j}q_{ji}$

$\pi_{i}\sum_{j=1}^{n}q_{ij} = \sum_{j=1}^{n}\pi_{j}q_{ji}$

$\pi_{i} = \sum_{j=1}^{n}\pi_{j}q_{ji}$ (since the rows of $Q$ sum to $1$)

$\pi_{i} = (\mathbf{\pi}Q)_i$

so $\mathbf{\pi}Q = \mathbf{\pi}$, i.e., $\mathbf{\pi}$ is stationary. $\square$

The above equation is the so-called detailed balance condition; a Markov chain that satisfies it is called reversible: the probability flow from state $i$ to $j$ equals the probability flow from $j$ to $i$.

However, a stationary distribution may exist even when detailed balance does not hold; the condition is sufficient but not necessary.
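
To make the lemma concrete, here is a small numeric check (an illustrative sketch, not from the lecture): the Metropolis construction builds a chain $Q$ for a given target $\mathbf{\pi}$ that satisfies detailed balance by design, so $\mathbf{\pi}Q = \mathbf{\pi}$. The uniform proposal and the 3-state target below are assumptions chosen for the example.

```python
import numpy as np

pi = np.array([0.5, 0.3, 0.2])          # assumed target distribution on 3 states
n = len(pi)

# Metropolis rule with a uniform proposal: move i -> j with prob (1/n) * min(1, pi_j / pi_i)
Q = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            Q[i, j] = (1.0 / n) * min(1.0, pi[j] / pi[i])
    Q[i, i] = 1.0 - Q[i].sum()          # remaining probability: stay at i

flow = pi[:, None] * Q                   # flow[i, j] = pi_i * q_ij
assert np.allclose(flow, flow.T)         # detailed balance holds by construction
print(pi @ Q)                            # equals pi, so pi is stationary (lemma 8.9)
```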