HW01 - Introduction to Machine Learning
Data Partitioning:
- it’s common to have to partition available labeled data into your own “training” and “validation” data sets
- for smaller data sets, you can use K-Fold Cross-Validation, where you partition the training data into k disjoint sets and train a classifier k times, each time using one of the k sets for validation and the other k-1 sets for training
- to shuffle a data set, you can create a random permutation of indices, then rearrange the data set using that permutation (see the sketch after this list)
- in the future you can just use sklearn's train_test_split function btw
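A rough sketch of the shuffle/split idea (toy data and placeholder names, not from the actual assignment):

```python
import numpy as np
from sklearn.model_selection import train_test_split, KFold

# toy data just so the sketch runs: 10 samples, 2 features, binary labels
X = np.arange(20).reshape(10, 2).astype(float)
y = np.array([0, 1] * 5)

# shuffle by building a random permutation of indices and reindexing both arrays
rng = np.random.default_rng(0)
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]

# simple 80/20 train/validation split
split = int(0.8 * len(X))
X_train, X_val = X[:split], X[split:]
y_train, y_val = y[:split], y[split:]

# or just let sklearn do it
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# k-fold cross-validation: each of the k folds gets used once for validation
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    X_tr, X_va = X[train_idx], X[val_idx]
    y_tr, y_va = y[train_idx], y[val_idx]
```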
Training a Support Vector Machine (SVM)
SVMs are supervised learning models typically used for classification and regression analysis. They're "non-probabilistic binary linear classifiers" that learn from labeled training data.
To create the model in Python, we can use either sklearn's SVC(kernel='linear') or LinearSVC. To train the model, use svm.fit(X, Y) (sketched below) where:
X: training data array w/ size [n_samples, n_features]
Y: array of class labels w/ size [n_samples]
note: the sample at X[index] belongs to the class Y[index]
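A minimal fit/predict sketch with sklearn (the iris data here is just a stand-in to make it runnable):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# toy data: any X of shape [n_samples, n_features] with labels y of shape [n_samples] works
X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

svm = SVC(kernel='linear')          # or LinearSVC(), usually faster for a linear kernel
svm.fit(X_train, y_train)           # X_train[i] belongs to class y_train[i]

predictions = svm.predict(X_val)    # predicted labels for the validation set
accuracy = svm.score(X_val, y_val)  # mean accuracy on the validation set
print(accuracy)
```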
Hyperparameter Tuning
Hyperparameters are higher-level properties of training models that you can tune to influence the learning/training process. For example, the regularization parameter C in the soft-margin SVM algorithm "tells the SVM optimization how much you want to avoid misclassifying each training example. For large values of C, the optimization will choose a smaller-margin hyperplane if that hyperplane does a better job of getting all the training points classified correctly. Conversely, a very small value of C will cause the optimizer to look for a larger-margin separating hyperplane, even if that hyperplane misclassifies more points. For very tiny values of C, you should get misclassified examples, often even if your training data is linearly separable." (Source)
Other examples include the number of clusters and iterations in K-Means Clustering.
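A hedged sketch of tuning C by just checking validation accuracy over a few arbitrary candidate values (sklearn's GridSearchCV does essentially this, with cross-validation built in):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                 # toy data to keep the sketch runnable
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# candidate C values are arbitrary; keep whichever does best on the validation set
best_C, best_acc = None, 0.0
for C in [0.01, 0.1, 1, 10, 100]:
    model = SVC(kernel='linear', C=C)
    model.fit(X_train, y_train)
    acc = model.score(X_val, y_val)
    if acc > best_acc:
        best_C, best_acc = C, acc
print(best_C, best_acc)
```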
Histogram of Oriented Gradients (HOG)
HOG is a feature descriptor used for edge and object detection in computer vision. The algorithm looks at individual regions of an image and calculates the gradient in each region. The gradient vector of a region corresponds to the change of color/intensity within the region and the direction in which that change occurs. The basic theory is that strong directional gradients represent the edges of an object and can be used as features for classification/training.
The basic implementation looks at the center pixel of a region (usually 3x3, but it can be larger depending on the situation), calculates the dX and dY values around the center, and uses trigonometry to compute the gradient vector from there.
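A rough NumPy sketch of just that per-pixel dX/dY + trigonometry step (not the full HOG pipeline; skimage.feature.hog implements the whole thing):

```python
import numpy as np

def gradient_at(img, r, c):
    """Gradient vector at pixel (r, c) of a grayscale image, via central differences."""
    dx = float(img[r, c + 1]) - float(img[r, c - 1])   # horizontal change
    dy = float(img[r + 1, c]) - float(img[r - 1, c])   # vertical change
    magnitude = np.hypot(dx, dy)                        # how strong the edge is
    orientation = np.degrees(np.arctan2(dy, dx))        # which direction it points
    return magnitude, orientation

# tiny toy image with a vertical edge down the middle
img = np.array([[0, 0, 255, 255],
                [0, 0, 255, 255],
                [0, 0, 255, 255]])
print(gradient_at(img, 1, 1))   # strong horizontal gradient, ~0 degrees
```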
Scale-Invariant Feature Transform (SIFT)
SIFT is a feature detection algorithm (not a feature descriptor) used to detect specific features in an image. The core idea is to find/locate features in an image using features from training images, even under different orientations, scales, and positions. Features extracted from training images are generally represented as points with positions relative to each other. The SIFT algorithm generally consists of the following steps (a quick usage sketch follows the list):
- Scale-Space Extrema Detection: observing the image under different Gaussian filters and finding keypoints at the maxima and minima of the "Difference of Gaussians" (will look into what exactly Gaussians mean here)
- Keypoint Localization: discarding unstable keypoints, i.e. those with low contrast or those that lie along edges
- Orientation Assignment: 'Keypoints are assigned one or more orientations based on local image gradient directions'
- Keypoint Descriptor: 'Create more descriptor vectors for each keypoint to make feature descriptors more highly distinctive.' In other words, describe and assign values to each keypoint relative to the actual feature.
- (Source)
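A quick sketch of running SIFT through OpenCV, assuming a recent opencv-python build where cv2.SIFT_create is available (the filename is made up):

```python
import cv2

img = cv2.imread('example.jpg', cv2.IMREAD_GRAYSCALE)   # hypothetical image file
sift = cv2.SIFT_create()

# keypoints carry position, scale, and orientation; descriptors are 128-d vectors
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints))
```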