Day 47 in MIT Sloan Fellows Class 2023, Advanced Data Analytics and Machine Learning in Finance 1, Fundamental ML
Categories of ML
- Supervised vs Unsupervised
- Model based vs instance based
- Online vs Offline
Supervised vs Unsupervised
Supervised ML
Learning a mapping from inputs to outputs. We have examples of both, and we look for algorithms that can learn this mapping from the examples.
Examples:
- Linear regression
- Logistic regression
- Decision trees
- Boosted trees (are awesome)
- Support vector machines
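To make "learning a mapping from examples" concrete, here is a minimal sketch of the first example in the list, linear regression with a single input feature. The data and the names `fit_line` and `predict` are made up for illustration; a real project would more likely use a library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one input feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled examples: inputs xs with known outputs ys (here y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)


def predict(x):
    """Apply the learned mapping to a new input."""
    return slope * x + intercept
```

The learned mapping can then be applied to inputs that were not in the training examples, for instance `predict(10.0)`.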
Unsupervised ML
Look at unlabeled data and find patterns in it. Most of the world's data does not come with labels.
Examples:
- clustering (e.g. kmeans)
- Principal component analysis and singular value decomposition
- Manifold learning
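As a rough illustration of finding patterns without labels, the sketch below implements a bare-bones k-means loop. The points and the starting centroids are invented for the example, and the starting centroids are passed in by hand to keep the result deterministic; practical implementations usually pick them randomly or with a smarter initialization.

```python
def kmeans(points, centroids, n_iters=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabeled data: two visually obvious groups, no labels given.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

No labels were supplied at any point; the grouping comes only from the distances between the points.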
Semi-supervised ML
Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training. Semi-supervised learning falls between unsupervised learning (with no labeled training data) and supervised learning (with only labeled training data). It is a special instance of weak supervision.
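Examples (language models pretrained on large amounts of unlabeled text; this pretraining step is often called self-supervised):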
- GPT-3
- BERT
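The idea of combining a few labels with many unlabeled points can be sketched with self-training (pseudo-labeling), which is one simple semi-supervised technique. It is not how GPT-3 or BERT are trained. The classifier here is a toy nearest-centroid rule on one-dimensional data, and all names and values are hypothetical.

```python
def nearest_centroid_label(x, centroids):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def self_train(labeled, unlabeled, n_rounds=5):
    """Self-training: fit on labeled data, pseudo-label the rest, refit.

    labeled:   list of (x, label) pairs
    unlabeled: list of x values without labels
    """
    pseudo = []
    centroids = {}
    for _ in range(n_rounds):
        data = labeled + pseudo
        # Fit: one centroid per label, using true labels plus current pseudo-labels.
        centroids = {}
        for label in {lab for _, lab in data}:
            values = [x for x, lab in data if lab == label]
            centroids[label] = sum(values) / len(values)
        # Pseudo-label every unlabeled point with the current model.
        pseudo = [(x, nearest_centroid_label(x, centroids)) for x in unlabeled]
    return centroids


# Hypothetical data: only two labeled points, six unlabeled ones.
labeled = [(0.0, "low"), (10.0, "high")]
unlabeled = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]

centroids = self_train(labeled, unlabeled)
```

With only the two labeled points the centroids would sit at 0 and 10; the unlabeled points pull them toward the centers of the two groups.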
Model based vs Instance based
Model based: impose a model on the world
- model size is independent of data size
- may extrapolate better out of domain
- potential for overfitting
Instance based: learn similarities and compare to known instances
- lower potential for overfitting
- may not extrapolate as well
- space grows with amount of data
Key question: does the hypothesis space grow with the number of training examples? If it does, the method is instance based.
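The contrast can be sketched with two toy regressors trained on the same made-up data: a linear model, which keeps two numbers however many examples it sees, and a 1-nearest-neighbor predictor, which keeps every example. The class names and the `n_stored` helper are hypothetical, chosen only for this illustration.

```python
class LinearModel:
    """Model based: imposes y = slope * x + intercept on the data."""

    def fit(self, xs, ys):
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        var_x = sum((x - mean_x) ** 2 for x in xs)
        self.slope = cov_xy / var_x
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, x):
        return self.slope * x + self.intercept

    def n_stored(self):
        # Two parameters, regardless of how much data was seen.
        return 2


class NearestNeighbor:
    """Instance based: stores the examples and answers by similarity."""

    def fit(self, xs, ys):
        self.data = list(zip(xs, ys))
        return self

    def predict(self, x):
        # Return the output of the closest stored input.
        return min(self.data, key=lambda pair: abs(pair[0] - x))[1]

    def n_stored(self):
        # Grows with the number of training examples.
        return len(self.data)


# Hypothetical training data on the line y = 2x + 1, inputs between 0 and 4.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

linear = LinearModel().fit(xs, ys)
neighbor = NearestNeighbor().fit(xs, ys)

# Far outside the training range the two behave very differently.
linear_far = linear.predict(10.0)      # follows the fitted line
neighbor_far = neighbor.predict(10.0)  # repeats the nearest known output
```

In this constructed case the linear model extrapolates well only because the data really is linear; when the imposed model is wrong, the comparison can go the other way.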
Online vs Offline
Online learning keeps updating the model with every new piece of data.
Offline (batch) learning trains the model once on the full dataset; incorporating new data means retraining.
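A small sketch of the difference, using the one-parameter model y = w * x: the offline version solves for w once from the whole dataset, while the online version nudges w after each example with a stochastic gradient step. The learning rate, the data, and the names `fit_offline` and `OnlineModel` are assumptions made for the example.

```python
def fit_offline(xs, ys):
    """Offline: closed-form least squares for y = w * x, using all data at once."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


class OnlineModel:
    """Online: one gradient step on the squared error per incoming example."""

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.learning_rate = learning_rate

    def update(self, x, y):
        error = y - self.w * x
        self.w += self.learning_rate * error * x

    def predict(self, x):
        return self.w * x


# Hypothetical data following y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_offline = fit_offline(xs, ys)

online = OnlineModel()
for _ in range(50):  # simulate a stream by replaying the examples
    for x, y in zip(xs, ys):
        online.update(x, y)
```

The online model never needs the full dataset in memory and can keep calling `update` as new data arrives, at the cost of having to choose a learning rate.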