introduction
algorithms
- machine learning algorithms
- curse of high dimensions and dimensionality reduction
- random walks and the Markov chain Monte Carlo (MCMC) method
- algorithms for massively parallel architectures
- algorithms for streaming data
mathematical methods
- linear algebra, discrete mathematics, probability theory
- analysis of algorithms
- complexity theory
- information theory
overview
- ML algorithms for classification and clustering
- foundations of information theory
- statistical learning theory
- how well can we learn and generalize from a finite data set?
- multiplicative weights
- learning algorithms for data processed online; exploration vs. exploitation
- high dimensional data
- probability and geometry in high dimensions
- dimensionality reduction and spectral clustering
- Markov Chains and random walks
- Markov Chain Monte Carlo
- How can we (use Markov chains to) sample from any distribution?
- algorithms for parallel systems
- compute clusters, MapReduce, and MapReduce algorithms
- How can we implement algorithms on clusters/parallel machines?
- streaming algorithms
- stream data model, hashing, streaming algorithms
- how can we process data that arrives as a stream and cannot be stored in memory?
textbooks
- Foundations of Data Science, by A. Blum, J. Hopcroft, and R. Kannan
- Mining of Massive Datasets, by J. Leskovec, A. Rajaraman, and J. Ullman
- Understanding Machine Learning, by S. Shalev-Shwartz and S. Ben-David
multiplicative weight updates
goals
- the basic algorithm and its analysis using suitable potential functions (a sketch of the update rule follows at the end of this section)
- how the algorithm is adapted to different scenarios
The MWU algorithm
Boosting weak learning algorithms
Bandit learning
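A minimal sketch of the basic multiplicative weights update, assuming the standard experts setting with losses in $[0,1]$ and a learning rate `eta` (the parameter value and the loss model here are illustrative, not necessarily the course's exact formulation). The potential used in the analysis is the total weight $\sum_i w_i$.

```python
import numpy as np

def mwu(losses, eta=0.1):
    """Multiplicative weights update over T rounds.

    losses: (T, n) array; losses[t, i] is expert i's loss in round t,
            assumed to lie in [0, 1].
    Returns the sequence of probability distributions over the experts.
    """
    T, n = losses.shape
    w = np.ones(n)                      # start with uniform weights
    dists = []
    for t in range(T):
        p = w / w.sum()                 # normalize weights to a distribution
        dists.append(p)
        w = w * (1 - eta) ** losses[t]  # penalize each expert by its loss
    return np.array(dists)

# toy run: expert 0 incurs the smallest losses, so its weight should dominate
rng = np.random.default_rng(0)
losses = rng.uniform(size=(500, 4))
losses[:, 0] *= 0.5
print(mwu(losses)[-1])  # final distribution concentrates on expert 0
```

Penalizing by $(1-\eta)^{\text{loss}}$ keeps all weights positive and makes the total weight easy to bound from above and below, which is how the regret analysis proceeds.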
high-dimensional data
goals
- advanced mathematical concepts from geometry and linear algebra (eigenvalues and eigenvectors)
- dimensionality reduction
the strange geometry of high-dimensional spaces
The volume is near the surface
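One line of reasoning behind this claim: scaling a $d$-dimensional body by $1-\varepsilon$ scales its volume by $(1-\varepsilon)^d$, so the fraction of the unit ball's volume within distance $\varepsilon$ of the surface is

$$1 - (1-\varepsilon)^d \to 1 \quad \text{as } d \to \infty.$$

For example, with $\varepsilon = 0.01$ and $d = 1000$, the shell already contains more than $99.99\%$ of the volume.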
high-dimensional unit balls
volume of the unit ball
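For reference, the closed form for the volume of the unit ball in $d$ dimensions is

$$V(d) = \frac{\pi^{d/2}}{\Gamma\left(\frac{d}{2}+1\right)},$$

which tends to $0$ as $d \to \infty$; already $V(20) \approx 0.026$.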
concentration near the equator
a probabilistic view on the equator
- concentration near the equator
- think of drawing a point uniformly at random; the equator statement then says that any fixed coordinate of the point is small with high probability
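A quick numerical check of this view (a sketch; the dimensions and sample size are arbitrary choices): sample points uniformly from the unit sphere by normalizing Gaussian vectors and look at the typical size of the first coordinate, which shrinks like $1/\sqrt{d}$.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (3, 100, 1000):
    # uniform points on the unit sphere: normalize standard Gaussian vectors
    x = rng.standard_normal((10_000, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    # |x_1| is the distance to the equator {x_1 = 0}; it shrinks like 1/sqrt(d)
    print(d, np.abs(x[:, 0]).mean(), 1 / np.sqrt(d))
```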
a probabilistic view on the volume
Dimension Reduction by Random Projections
reducing the number of dimensions while approximately preserving the distances between all pairs of points
Johnson-Lindenstrauss Lemma
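A hedged sketch of the random-projection construction behind the lemma: project onto $k = O(\varepsilon^{-2}\log n)$ dimensions with a scaled Gaussian matrix (the constant 8 and all sizes below are illustrative choices, not the lemma's tight constants).

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, eps = 200, 10_000, 0.2
k = int(np.ceil(8 * np.log(n) / eps**2))      # illustrative constant

X = rng.standard_normal((n, d))               # n points in R^d
P = rng.standard_normal((d, k)) / np.sqrt(k)  # random Gaussian projection
Y = X @ P                                     # projected points in R^k

def pdist(Z):
    """All pairwise Euclidean distances of the rows of Z."""
    sq = (Z**2).sum(1)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * Z @ Z.T, 0))

# compare pairwise distances before and after the projection
D, Dp = pdist(X), pdist(Y)
mask = ~np.eye(n, dtype=bool)
ratio = Dp[mask] / D[mask]
print(ratio.min(), ratio.max())  # should lie within roughly [1-eps, 1+eps]
```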
Background from linear algebra: eigenvalues and eigenvectors
power iteration
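A minimal power iteration sketch, assuming a symmetric matrix with a dominant eigenvalue (the iteration cap and tolerance are arbitrary):

```python
import numpy as np

def power_iteration(A, iters=1000, tol=1e-10, seed=0):
    """Approximate the dominant eigenvalue/eigenvector of a symmetric matrix A."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = A @ v                        # multiply by A ...
        w /= np.linalg.norm(w)           # ... and renormalize
        if np.linalg.norm(w - v) < tol:  # converged (up to sign)
            v = w
            break
        v = w
    return v @ A @ v, v                  # Rayleigh quotient, eigenvector

A = np.array([[2.0, 1.0], [1.0, 3.0]])
lam, v = power_iteration(A)
print(lam)  # ~3.618, the larger eigenvalue of A
```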
The Monte Carlo Method
lemma 8.9
if $\pi_{i}q_{ij} = \pi_{j}q_{ji}$ for all $i, j$,
then $\mathbf{\pi}$ is the stationary distribution of the Markov chain.
proof: summing both sides of the detailed balance equation over $j$,
$\sum_{j=1}^{n}\pi_{i}q_{ij} = \sum_{j=1}^{n}\pi_{j}q_{ji}$
$\pi_{i}\sum_{j=1}^{n}q_{ij} = \sum_{j=1}^{n}\pi_{j}q_{ji}$
$\pi_{i} = \sum_{j=1}^{n}\pi_{j}q_{ji}$ (since each row of $Q$ sums to $1$)
$\pi_{i} = (\mathbf{\pi}Q)_i$
The condition $\pi_{i}q_{ij} = \pi_{j}q_{ji}$ is called detailed balance; a chain satisfying it (with respect to $\mathbf{\pi}$) is called reversible, i.e., the probability flow from state $i$ to state $j$ equals the flow from $j$ to $i$.
However, a stationary distribution may exist even when detailed balance does not hold, so the condition is sufficient but not necessary.
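A small numerical check of the lemma (a sketch: the reversible chain here is a random walk on a weighted graph with $q_{ij} = w_{ij}/\sum_{k} w_{ik}$ and $\pi_i \propto \sum_{k} w_{ik}$, one standard example; the size 5 and the random weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
# symmetric nonnegative weights w_ij define a random walk with
# q_ij = w_ij / sum_k w_ik; then pi_i proportional to sum_k w_ik gives
# pi_i q_ij = w_ij / W = pi_j q_ji, i.e. detailed balance
W = rng.uniform(size=(5, 5))
W = W + W.T
Q = W / W.sum(axis=1, keepdims=True)  # transition matrix (rows sum to 1)
pi = W.sum(axis=1) / W.sum()          # candidate stationary distribution

F = pi[:, None] * Q                   # F[i, j] = pi_i q_ij, the flow i -> j
print(np.allclose(F, F.T))            # detailed balance holds
print(np.allclose(pi @ Q, pi))        # pi Q = pi, as the lemma asserts
```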