足ることを知らず

Data Science, global business, management and MBA

Day 47 in MIT Sloan Fellows Class 2023, Advanced Data Analytics and Machine Learning in Finance 1, Fundamental ML

Categories of ML

  • Supervised vs Unsupervised
  • Model based vs instance based
  • Online vs Offline

Supervised vs Unsupervised

Supervised ML
Learning a mapping from inputs to outputs. We have examples of both, and we look for algorithms that can learn this mapping from the examples.

Examples:

  • Linear regression
  • Logistic regression
  • Decision trees
  • Boosted trees (are awesome)
  • Support vector machines
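As a minimal sketch of the supervised setting, the snippet below fits the first example in the list, linear regression, to labeled (input, output) pairs and then predicts the output for a new input. The toy data and the function name `fit_line` are made up for illustration and are not from the lecture.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one input feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor,
    # which cancels in the ratio).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Labeled examples: inputs xs with known outputs ys (toy data on the line y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)

# Use the learned mapping on an input that was not in the training examples.
prediction = slope * 4.0 + intercept
```

The learning step only works because every input comes with its output; that pairing is what makes the problem supervised.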

Unsupervised ML
Look at data and find patterns. Most of the world's data do not have associated labels.

Examples:

  • Clustering (e.g. k-means)
  • Principal component analysis and singular value decomposition
  • Manifold learning
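To make "find patterns without labels" concrete, here is a small sketch of k-means on one-dimensional data. The data, the starting centers, and the function name `kmeans_1d` are assumptions chosen for illustration; real use would normally go through a library implementation.

```python
def kmeans_1d(points, initial_centers, n_iters=10):
    """Plain k-means on 1-D points: assign to nearest center, then re-average."""
    centers = list(initial_centers)
    clusters = [[] for _ in centers]
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Unlabeled data: no outputs are given, only the points themselves.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers, clusters = kmeans_1d(points, initial_centers=[0.0, 5.0])
```

No labels are supplied anywhere; the two groups come only from the structure of the data.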

Semi-supervised ML

Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training. Semi-supervised learning falls between unsupervised learning (with no labeled training data) and supervised learning (with only labeled training data). It is a special instance of weak supervision.

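Examples:

  • GPT-3 (pretrained on large amounts of unlabeled text; strictly speaking, this pretraining is self-supervised)
  • BERT (pretrained on unlabeled text, then fine-tuned on a smaller labeled dataset)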
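One simple way to combine a few labels with many unlabeled points is self-training: repeatedly give the unlabeled point closest to the labeled set the label of its nearest labeled neighbor. This is a sketch of that one technique on made-up 1-D data, not how GPT-3 or BERT are trained; the function name `self_train_1nn` is hypothetical.

```python
def self_train_1nn(labeled, unlabeled):
    """Self-training with a 1-nearest-neighbor rule.

    labeled: list of (x, label) pairs; unlabeled: list of x values.
    Returns the labeled list extended with pseudo-labeled points.
    """
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        # Pick the unlabeled point that is closest to any labeled point,
        # i.e. the one we are most confident about.
        best = min(remaining, key=lambda u: min(abs(u - x) for x, _ in labeled))
        # Copy the label of its nearest labeled neighbor.
        _, label = min(labeled, key=lambda pair: abs(pair[0] - best))
        labeled.append((best, label))
        remaining.remove(best)
    return labeled


# Only two points carry labels; the other six do not.
labeled = [(0.0, "low"), (10.0, "high")]
unlabeled = [1.0, 2.0, 3.0, 9.0, 8.0, 7.0]

result = dict(self_train_1nn(labeled, unlabeled))
```

The pseudo-labels spread outward from the two labeled points, so the unlabeled data ends up shaping where the boundary between "low" and "high" falls.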

Model based vs Instance based

Model based: impose a model on the world

  • model size is independent of data size
  • may extrapolate better out of domain
  • potential for overfitting

Instance based: learn similarities and compare new inputs to known instances

  • lower potential for overfitting
  • may not extrapolate as well
  • space grows with amount of data

Key question: does the hypothesis space grow with the number of training examples? For model based methods it does not; for instance based methods it does.
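The contrast can be sketched with two tiny predictors trained on the same toy data: a fitted line (model based, always two parameters) and a 1-nearest-neighbor lookup (instance based, keeps every training example). The data and function names are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Model based: compress the data into two numbers (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def predict_nearest(xs, ys, x_new):
    """Instance based: return the output of the most similar stored example."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x_new))
    return ys[i]


# Toy training data on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)
model_params = (slope, intercept)     # stays at 2 numbers however much data we add
stored_instances = list(zip(xs, ys))  # one entry per training example

# Query far outside the training range (out of domain).
x_new = 10.0
line_prediction = slope * x_new + intercept     # follows the imposed linear model
nn_prediction = predict_nearest(xs, ys, x_new)  # falls back to the nearest known case
```

On this data the line extrapolates the trend to 21, while the nearest-neighbor rule can only repeat the closest output it has seen, 7. If the true relationship were not linear, the imposed model could just as easily extrapolate badly.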

Online vs Offline

Online learning can keep updating a model with each new piece of data.

Offline learning trains the model once on the full dataset; incorporating new data means retraining.
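A rough sketch of the difference, assuming a one-feature linear model: the online version exposes an `update(x, y)` step (here a stochastic gradient step on squared error) that can be called whenever a new example arrives, while the offline version needs the whole dataset at once. The class name, learning rate, and data are illustrative assumptions.

```python
class OnlineLinearModel:
    """Linear model y = slope * x + intercept, updated one example at a time."""

    def __init__(self, learning_rate=0.05):
        self.slope = 0.0
        self.intercept = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.slope * x + self.intercept

    def update(self, x, y):
        # One gradient step on the squared error 0.5 * (prediction - y) ** 2.
        error = self.predict(x) - y
        self.slope -= self.learning_rate * error * x
        self.intercept -= self.learning_rate * error


def fit_offline(xs, ys):
    """Offline: closed-form least squares over the full dataset in one shot."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


# Toy data on the line y = 2x + 1, replayed as a stream for the online model.
stream = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

online = OnlineLinearModel()
for _ in range(500):          # examples keep arriving; the model keeps adjusting
    for x, y in stream:
        online.update(x, y)

offline_slope, offline_intercept = fit_offline(
    [x for x, _ in stream], [y for _, y in stream]
)
```

Both approaches end up near the same line on this data. The practical difference is that the online model never needs the full dataset in memory and can absorb a new example with a single `update` call.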