Getting through the coursework was a challenge – my computers have never worked so hard.
The last section involved exhaustively searching for the optimal settings of two parameters in the computation’s algorithm, where each run over the data set took a few seconds. Searching over 25 possible settings doesn’t sound like a lot, but two of ’em means 25 × 25 = 625 runs – and 625 times a few seconds is quite a wait.
Oh, wait – there was also a requirement to randomly shuffle the input data for the algorithm ten times and do some simple stats, to give some chance of reducing potential bias brought about by the order in which the data appears in the data set. So that’d be 10 runs per pair of parameter settings, which is 6250 runs. Or several hours with a CPU core maxed out and a nice, toasty computer.
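For the curious, the shape of that search can be sketched in a few lines. This is an illustration in Python rather than the Matlab actually used for the coursework, and `grid_search`, `toy_evaluate` and the toy scoring formula are all made-up stand-ins for the real algorithm, which isn't shown here.

```python
import itertools
import random
import statistics


def grid_search(data, settings_a, settings_b, evaluate, n_shuffles=10, seed=0):
    """Score every (a, b) pair, repeating each over shuffled copies of the data."""
    rng = random.Random(seed)
    results = {}
    runs = 0
    for a, b in itertools.product(settings_a, settings_b):
        scores = []
        for _ in range(n_shuffles):
            shuffled = data[:]      # copy, so the caller's list keeps its order
            rng.shuffle(shuffled)   # a different input order on each repeat
            scores.append(evaluate(shuffled, a, b))
            runs += 1
        # The "simple stats": mean and standard deviation across the shuffles
        results[(a, b)] = (statistics.mean(scores), statistics.stdev(scores))
    best = max(results, key=lambda pair: results[pair][0])
    return best, results, runs


def toy_evaluate(data, a, b):
    """Hypothetical score: peaks at a=7, b=3, with a small order-dependent wobble."""
    return -((a - 7) ** 2 + (b - 3) ** 2) + 0.001 * data[0]


best, results, runs = grid_search(list(range(20)), range(25), range(25), toy_evaluate)
print(runs, best)
```

The run counter confirms the arithmetic: 25 × 25 pairs times 10 shuffles is 6250 calls to the scoring function. The toy score returns instantly, of course; swap in something that takes a few seconds per call and the hours add up.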
But hey. I got some neat 3D mesh plots out of it, showing the performance of the algorithm over the parameter search space. Proper science, this! Sure, it has faults, but Matlab’s plotting functionality is pretty neat and easy to use. Plots like the following are a doddle:
Figure 1. Gratuitous use of a 3D plot for fun and profit
The goal of the exercise was to identify the most relevant ‘features’ in the data set for assigning the data into an appropriate class. Imagine you had a big table of information about a set of people, where the first column (could call it ‘Feature 1’) was their heights, the second was the time of day the height was measured, and you were trying to predict their sex. You and I would guess that height would be a good indicator of sex and the time of day would be irrelevant, but we’d be cheating by applying information about the meaning of the data that exists in our heads and wasn’t supplied with the problem.
By applying a variety of methods to our table of data, a computer might be able to recognise relationships between the features and what we’re trying to predict, without knowing anything else about the information. In doing so, it could remove the features that do not appear to have any relationship and save the computational effort and time that would otherwise be spent processing useless data. The approaches that can be applied are various, and some degree of tuning is needed to ensure that removing features doesn’t compromise the goal in subtle ways.
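To make the height and time-of-day example concrete, here is a minimal sketch of one of the simplest approaches: score each feature by how strongly it correlates with the class label, and keep only those above a threshold. It is in Python rather than Matlab, the data is synthetic, and the column names, the 0.3 threshold and the `select_features` helper are assumptions made up for illustration, not the method used in the coursework.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no information
    return cov / (spread_x * spread_y)


def select_features(columns, labels, threshold=0.3):
    """Keep features whose absolute correlation with the label meets the threshold."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    kept = [name for name, score in scores.items() if score >= threshold]
    return kept, scores


# Synthetic table: the label is sex (0 or 1); height depends on it,
# the hour of measurement does not.
rng = random.Random(0)
labels = [i % 2 for i in range(500)]
columns = {
    "height_cm": [rng.gauss(178 if sex else 165, 7) for sex in labels],
    "hour_measured": [rng.uniform(0, 24) for _ in labels],
}

selected, scores = select_features(columns, labels)
print(selected, scores)
```

The threshold is exactly the sort of thing that needs tuning: set it too high and useful features get thrown away. Scoring one feature at a time also misses features that only matter in combination, which is one of the subtle ways the goal can be compromised.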
Today’s lectures moved on to machine learning techniques using the perplexing mathematics of probability (perplexing for my tiny brain, at any rate), in preparation for the last two weeks where unsupervised learning is the order of the day. The usual lab afternoon was focussed on kicking off a three-week project that involves applying the techniques we’re learning to do something with a bunch of data, written up in the style of a research paper.
Time to polish up the LaTeX from last year then…