Decision Theory
Decision Theory (Risk Minimization)
For a given feature, multiple sample points with the same value can belong to different classes. For example, among people who eat 1,200 Calories a day, some have cancer and some do not.
Recall that:
\(P(X) = P(X \mid Y = 1)\,P(Y = 1) + P(X \mid Y = 0)\,P(Y = 0)\)
We use Bayes' theorem to determine the probability of a class given the observed feature:
\(P(Y = 1 \mid X) = \frac{P(X \mid Y = 1)\,P(Y = 1)}{P(X)}\)
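As a quick numeric check of both formulas, here is a minimal sketch (the prior and likelihood values below are made up for illustration):

```python
# Hypothetical numbers for a binary cancer example.
p_y1 = 0.1          # prior P(Y = 1)
p_x_given_y1 = 0.8  # likelihood P(X | Y = 1)
p_x_given_y0 = 0.3  # likelihood P(X | Y = 0)

# Total probability: P(X) = P(X|Y=1) P(Y=1) + P(X|Y=0) P(Y=0)
p_x = p_x_given_y1 * p_y1 + p_x_given_y0 * (1 - p_y1)

# Bayes' theorem: P(Y=1|X) = P(X|Y=1) P(Y=1) / P(X)
print(p_x_given_y1 * p_y1 / p_x)  # ~0.229
```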
A loss function L(z, y) specifies the magnitude of the penalty when the classifier predicts z and the true class is y. For example, a loss function could output 1 for false positives, 5 for false negatives, and 0 when z = y. A loss function is asymmetric when misclassifications are not all weighted equally. The 0-1 loss function simply outputs 1 for every incorrect prediction and 0 for a correct one.
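In code, the two loss functions from this paragraph might look like the following sketch:

```python
def asymmetric_loss(z, y):
    """L(z, y) from the example: 1 for false positives, 5 for false
    negatives, 0 when the prediction z matches the true class y."""
    if z == y:
        return 0
    return 1 if (z == 1 and y == 0) else 5

def zero_one_loss(z, y):
    """0-1 loss: 1 for every incorrect prediction, 0 otherwise."""
    return 0 if z == y else 1
```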
- The Bayes decision rule uses risk to pick the optimal classifier. The risk of a classifier r is its expected loss: \(R(r) = \mathrm{E}[L(r(X), Y)]\).
- The minimizer is also called the Bayes classifier: the r* that minimizes R(r).
- If L is the 0-1 loss, then R(r) = P(r(x) is wrong).
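A minimal sketch of the Bayes decision rule for a binary problem, assuming the posterior P(Y = 1 | x) is already known (0.229 is the value from the numeric example above):

```python
def bayes_predict(posterior_y1, loss):
    """Pick the prediction z in {0, 1} with the smaller expected loss
    under the posterior P(Y | X = x)."""
    risk = {z: loss(z, 0) * (1 - posterior_y1) + loss(z, 1) * posterior_y1
            for z in (0, 1)}
    return min(risk, key=risk.get)

# Asymmetric loss from above: false negatives cost 5, false positives cost 1.
loss = lambda z, y: 0 if z == y else (5 if y == 1 else 1)
print(bayes_predict(0.229, loss))  # 1: risk 0.771 beats 1.145 for predicting 0
```

Note that even though the posterior is well below 0.5, the costly false negatives push the rule toward predicting the positive class.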
3 Ways to Build Classifiers
- Generative Models (e.g., LDA) (see the code sketch after this list)
- assume sample points come from probability distributions for each class
- guess the form of the distributions
- for each class C, fit distribution parameters to the class C points, giving P(X | Y = C)
- for each class C, estimate the prior P(Y = C)
- Bayes’ Theorem gives P(Y | X)
- If L is the 0-1 loss, pick the class C that maximizes P(Y = C | X = x), or equivalently P(X = x | Y = C) P(Y = C)
- Pros: tells you the probability your guess is wrong; can diagnose outliers
- Cons: hard to estimate the distributions accurately; real distributions rarely match standard ones
- Discriminative models (e.g., logistic regression)
- Model P(Y | X) directly
- Pros: tells you the probability your guess is wrong
- Find Decision Boundary (e.g. SVM)
- Model r(x) directly (no posterior)
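Here is the promised sketch of the generative recipe, assuming a single Gaussian-distributed feature per class and maximum-likelihood fits; the function names (`fit_generative`, `posterior_y1`) are ours, not from any library:

```python
import numpy as np
from scipy.stats import norm

def fit_generative(x, y):
    """For each class C, fit a 1-D Gaussian to the class C points (MLE)
    and estimate the prior P(Y = C) from class frequencies."""
    params = {}
    for c in (0, 1):
        xc = x[y == c]
        params[c] = (xc.mean(), xc.std(), len(xc) / len(x))  # (mu, sigma, prior)
    return params

def posterior_y1(x_new, params):
    """Bayes' theorem: P(Y = 1 | X = x) from fitted likelihoods and priors."""
    joint = {c: norm.pdf(x_new, mu, sigma) * prior
             for c, (mu, sigma, prior) in params.items()}
    return joint[1] / (joint[0] + joint[1])

# Toy usage with synthetic data.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 90), rng.normal(3, 1, 10)])
y = np.concatenate([np.zeros(90, dtype=int), np.ones(10, dtype=int)])
print(posterior_y1(2.5, fit_generative(x, y)))
```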
Gaussian Function:
A Gaussian is a function of the form \(f(x) = a e^{-\frac{(x - b)^2}{2c^2}}\), where a is the height of the peak, b the position of the center, and c the width of the curve.
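A quick sketch for evaluating this function (the parameter values are arbitrary):

```python
import numpy as np

def gaussian(x, a, b, c):
    """Gaussian curve: peak height a, center b, width c."""
    return a * np.exp(-((x - b) ** 2) / (2 * c ** 2))

print(gaussian(0.0, a=2.0, b=0.0, c=1.0))  # 2.0: the peak value at x = b
```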