Machine Learning – Day 5

So that’s the end of the taught course in Machine Learning, finishing up learning about Markov Chains and Hidden Markov Models.

Yep, those are just links to the Wikipedia articles, and it’s quite possible that if you clicked on them and you’re anything like me, the crazy-looking maths stuff at the other end made the sensible part of your brain run off and sit in a corner, clutching its knees and rocking gently back and forth.

Probably muttering to itself.

To be honest, I can’t really explain what this stuff is about just yet – I’ve had a lot crammed into my head over the past few weeks, and I think I need a really good night’s sleep before I can comprehend the deeper aspects of this last bit. Suffice to say for now that it seems like some really interesting and powerful ideas are in play, and when I’ve got my head round it I’ll blog up my thoughts.

I’ve now got one more homework assignment on today’s material to complete by next Wednesday, and the project we’ve been assigned to do is then due on Friday 6th November – a nice surprise, as a typo on the schedule had us believe it was due on the previous Tuesday.

I’m sorry the taught part of the course is done, to be honest. Although I’m not sure I could have taken any more at the pace it was being taught, I’ve thoroughly enjoyed the material.

In fact, I’d say I feel a little inspired.

And, as James Brown might say – it feels good.

Machine Learning – Day 4

Day 4 covered methods of automatically identifying clusters in data – and some of the issues that arise using those techniques.

Doing this automatic identification is called unsupervised learning, because it doesn’t depend on having a set of labelled data examples to hand. The learning is done purely based on the statistical and probabilistic properties of the data itself.
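To give a flavour of what “no labels to hand” means in practice, here’s a minimal Matlab sketch of one classic clustering method, k-means – my own choice of example rather than anything lifted from the lectures, with the toy data invented on the spot. The point is just that the grouping falls out of the data’s own structure:

    % Minimal k-means sketch: group the rows of X into k clusters, no labels used.
    X = [randn(50,2); randn(50,2) + 4];     % toy 2-D data: two rough blobs
    k = 2;
    p = randperm(size(X,1));
    centres = X(p(1:k), :);                 % start from k randomly chosen points

    for iter = 1:100
        % Assign every point to its nearest centre (squared Euclidean distance)
        dists = zeros(size(X,1), k);
        for j = 1:k
            dists(:,j) = sum(bsxfun(@minus, X, centres(j,:)).^2, 2);
        end
        [~, labels] = min(dists, [], 2);

        % Move each centre to the mean of the points assigned to it
        newCentres = centres;
        for j = 1:k
            if any(labels == j)
                newCentres(j,:) = mean(X(labels == j, :), 1);
            end
        end

        if max(abs(newCentres(:) - centres(:))) < 1e-9
            break;                          % centres stopped moving: done
        end
        centres = newCentres;
    end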

I’ve got to say, I’m struggling with the probabilistic side of things – my intuition isn’t helping me much, so I’ve been doing the books thing to try and really get my head round it. So little time…

We also covered some techniques for reducing the dimensionality of data – say a dataset has a thousand properties, and the computational overhead of processing increases with the number of properties. You’ll need some way of reducing the number of properties while retaining as much of the information they encode as possible. We were looking at selecting features a couple of weeks ago, but today we looked at PCA – Principal Component Analysis, a technique to ‘project’ information into a smaller number of dimensions, or properties.

I quite like this paper on PCA, if you’re looking for somewhere to get an idea what it’s about.

And that, if you were reading this blog a few weeks ago, is where the eigenvalues and eigenvectors come in.
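For what it’s worth, here’s a rough Matlab sketch of that connection, with the data matrix invented for illustration: PCA boils down to taking the eigenvectors of the data’s covariance matrix and keeping the few with the largest eigenvalues as the new, smaller set of dimensions.

    % Rough PCA sketch: project an N x D data matrix X down to d dimensions.
    X = randn(200, 10);                 % toy data: 200 examples, 10 properties
    d = 2;                              % target number of dimensions

    Xc = bsxfun(@minus, X, mean(X, 1)); % centre each property on zero
    C  = cov(Xc);                       % D x D covariance matrix

    [V, E] = eig(C);                    % eigenvectors (columns of V) and eigenvalues
    [~, order] = sort(diag(E), 'descend');
    W = V(:, order(1:d));               % keep the d directions with the most variance

    Xproj = Xc * W;                     % the projected, lower-dimensional data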

We also have a project to complete in the next couple of weeks, so time is very much of the essence right now. I suspect that as with the Semi-Structured Data and the Web course last year, the deeper concepts behind some of this material will only become clear to me when I’ve completed the set material and start to work on the revision of what we’ve covered.

Back in the day, revision was time off between lessons and exams – these days, not so much!

Machine Learning – Day 3

Getting through the coursework was a challenge – my computers have never worked so hard.

The last section involved performing a computation over a data set that took a few seconds per run to exhaustively search for the optimal settings for two parameters in the computation’s algorithm. Searching over 25 possible settings doesn’t sound like a lot, but two of ’em means 625 runs – times a few seconds is quite a wait.

Oh, wait – there was also a requirement to randomly shuffle the input data for the algorithm ten times and do some simple stats, to give some chance of reducing potential bias brought about by the order in which the data appears in the data set. So that’d be 10 runs per pair of parameter settings, which is 6250 runs. Or several hours with a CPU core maxed out and a nice, toasty computer.
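In Matlab terms the whole thing looked roughly like the sketch below – the parameter ranges match the coursework, but the data and the evaluate() function here are made-up stand-ins for the real (slow) computation:

    % Sketch of the exhaustive two-parameter search with repeated shuffles.
    % 'data' and 'evaluate' are dummies invented so the sketch actually runs;
    % the real evaluation took a few seconds per call.
    data = randn(100, 5);
    evaluate = @(d, a, b) mean(d(:)) + 0.01*a - 0.005*b + 0.01*randn;

    paramA = 1:25;                      % 25 candidate settings for the first parameter
    paramB = 1:25;                      % 25 for the second -> 625 combinations
    nShuffles = 10;                     % x10 shuffles -> 6250 runs in total

    results = zeros(numel(paramA), numel(paramB));
    for i = 1:numel(paramA)
        for j = 1:numel(paramB)
            scores = zeros(1, nShuffles);
            for s = 1:nShuffles
                shuffled = data(randperm(size(data,1)), :);  % shuffle to reduce ordering bias
                scores(s) = evaluate(shuffled, paramA(i), paramB(j));
            end
            results(i,j) = mean(scores);  % the 'simple stats' over the ten shuffles
        end
    end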

But hey. I got some neat 3-d mesh plots out of it, showing the performance of the algorithm over the parameter search space. Proper science, this! Sure it has faults, but Matlab’s plotting functionality is pretty neat and easy to use. Plots like the following are a doddle:

Matlab 3D Plot

Figure 1. Gratuitous use of a 3D plot for fun and profit
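Getting a surface like that out of a grid of results really is only a handful of lines. Carrying on from the search sketch above (so assuming the results matrix and the two parameter ranges from there), something along these lines does the job:

    % Plot the performance surface over the two-parameter search space.
    [A, B] = meshgrid(paramA, paramB);
    surf(A, B, results');               % transpose so rows/columns line up with the axes
    xlabel('Parameter A setting');
    ylabel('Parameter B setting');
    zlabel('Performance');
    title('Performance over the parameter search space');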

The goal of the exercise was to identify the most relevant ‘features’ in the data set for assigning the data into an appropriate class. Imagine you had a big table of information about a set of people, where the first column (could call it ‘Feature 1’) was their heights, the second was the time of day the height was measured, and you were trying to predict their sex. You and I would guess that height would be a good indicator of sex and the time of day would be irrelevant, but we’d be cheating by applying information about the meaning of the data that exists in our heads and wasn’t supplied with the problem.

By applying a variety of methods to our table of data, a computer might be able to recognise relationships between the features and what we’re trying to predict, without knowing anything else about the information. In doing so, it could remove the features that do not appear to have any relationship and save the computational effort and time that would otherwise be spent processing useless data. The approaches that can be applied are various, and some degree of tuning needs to be applied to ensure that removing features doesn’t compromise the goal in subtle ways.
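Here’s a hedged sketch of the general idea in Matlab, using the height / time-of-day example from above with made-up numbers. I’ve used plain linear correlation as the relevance score, which is just one of the various approaches, and the threshold is picked purely for illustration:

    % Toy feature-relevance sketch: score each column against the class labels.
    % Feature 1 (height) is related to the class, feature 2 (time of day) is not.
    y      = [zeros(100,1); ones(100,1)];                  % class labels: 0 or 1
    height = [165 + 7*randn(100,1); 178 + 7*randn(100,1)]; % taller on average for class 1
    tod    = 24 * rand(200,1);                             % time of day, unrelated to class
    X      = [height, tod];

    scores = zeros(1, size(X,2));
    for f = 1:size(X,2)
        c = corrcoef(X(:,f), y);           % linear correlation with the class label
        scores(f) = abs(c(1,2));
    end
    keep = scores > 0.2;                   % threshold chosen purely for illustration
    Xreduced = X(:, keep);                 % the features the computer decides to keep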

Today’s lectures moved on to machine learning techniques using the perplexing mathematics of probability (perplexing for my tiny brain, at any rate), in preparation for the last two weeks where unsupervised learning is the order of the day. The usual lab afternoon was focussed on kicking off a three week project involving applying the techniques we’re learning to do something with a bunch of data in the style of a research paper.

Time to polish up the LaTeX from last year then…

Machine Learning – Day 2

Day 2 of the Machine Learning MSc module at Manchester saw us learning about Decision Trees and the role that entropy, linear correlation and mutual information can play.

It’s all about categorical data (like name, a set of fixed values), whereas last week was about the automated classification of continuous data (like temperature, a smooth range of values). The algorithms we looked at automatically build decision trees using the inherent statistical and probabilistic properties of a set of data, trying to maximise the decision accuracy with the minimum overhead of computation and memory.
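As a rough illustration of how the entropy side of that works, here’s a small Matlab sketch computing the entropy of a set of class labels and the information gain from splitting on one categorical feature – the ‘weather’ and ‘play’ data is completely made up, it’s just the textbook calculation:

    % Entropy of a set of discrete class labels: H(Y) = -sum p*log2(p).
    classEntropy = @(y) -sum(arrayfun(@(v) ...
        (sum(y==v)/numel(y)) * log2(sum(y==v)/numel(y)), unique(y)));

    % Made-up categorical data: does 'weather' help predict 'play'?
    weather = [1 1 2 3 3 3 2 1 1 3]';    % 1=sunny, 2=cloudy, 3=rainy
    play    = [0 0 1 1 1 0 1 0 1 1]';    % class labels

    % Information gain = H(play) - H(play | weather)
    hBefore = classEntropy(play);
    hAfter  = 0;
    for v = unique(weather)'
        idx = (weather == v);
        hAfter = hAfter + (sum(idx)/numel(weather)) * classEntropy(play(idx));
    end
    gain = hBefore - hAfter;             % higher gain -> better feature to split on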

Today’s stuff didn’t seem too tricky, and last week’s lab assessment went pretty well.

This week, we need to use the mi() and h() functions from a Matlab Mutual Information library here. Sounds great, but I’m getting ‘undefined symbol’ errors when using it, which may be related to the 64-bit OS on this machine, so I’ll need to try a few options to work around that. Need to get that working!

Well, it’s been a long day so I’ll call a close here. Cheers!

Machine Learning – Day 1

So I made it in on time for the first day of my Machine Learning course. The train was fantastic, particularly in comparison to the tiny cattle carriage that I ended up in last Wednesday. Tip of the day – even on the same routes, not all trains are equal!

After the usual stop at the butty shop for a sausage and egg sandwich plus a coffee, I was in room 2.15 and ready for action.

So what’s Machine Learning then? Sounds very Skynet and The Matrix, doesn’t it? Dr. Gavin Brown started out explaining how ML is a subset of the field of Artificial Intelligence, which focuses on software that can analyse and adapt to the underlying patterns in the information it sees.

Other areas under the AI banner include reasoning, robotics and vision, to name but a few. This breakdown of the big, amorphous ‘thinking machines’ field as it was in the 60s into these sub-areas is why we have made huge leaps forward in the field over the past couple of decades.

What progress? Today, Machine Learning technology finds use everywhere – some examples are the Amazon online store (selecting your recommended stuff), tuning computer games, filtering spam emails and fraud detection in banking. If you’d like to know more about the motivation behind studying this stuff, you can check out these introductory slides.

The format for this module is very different to the Semi-Structured Data and the Web module. It’s still every Tuesday for five weeks, but there are no full days of lectures. Instead, the mornings are lectures and the afternoons are lab sessions.

Assessment is also different – there’s still an exam, but the coursework consists of assessed verbal lab reports for 20% and a project for 30%. The exam counts for 50%. Whereas in the last module, we were assigned to groups of two and much of the coursework was joint in nature, this time it’s all individual work.

The labs use a matrix-based programming language called Matlab. Takes a bit of getting used to, but usable enough when you start to get the hang of it.

Day 1 covered terminology, the ‘Perceptron’ algorithm (which will find a dividing line between two classes of data, if one exists) and Support Vector Machines (which try to find the ‘best’ such line, using hairy maths). If you’re interested in knowing more, Googling for ‘Machine Learning’ and these terms will turn up papers, communities, video talks and lectures. It looks like a really active area of research.
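If you fancy seeing what the Perceptron actually does, here’s a minimal Matlab sketch of the classic update rule on some made-up, linearly separable 2-D data. It’s the textbook algorithm rather than anything specific from the lectures:

    % Minimal perceptron sketch: learn a line separating two classes of 2-D points.
    X = [randn(50,2) + 2; randn(50,2) - 2];     % made-up, linearly separable data
    y = [ones(50,1); -ones(50,1)];              % class labels +1 / -1

    w = zeros(2,1);                             % weights defining the dividing line
    b = 0;                                      % bias term
    eta = 0.1;                                  % learning rate

    for epoch = 1:100
        mistakes = 0;
        for i = 1:size(X,1)
            if y(i) * (X(i,:)*w + b) <= 0       % misclassified point?
                w = w + eta * y(i) * X(i,:)';   % nudge the line towards it
                b = b + eta * y(i);
                mistakes = mistakes + 1;
            end
        end
        if mistakes == 0
            break;                              % everything classified correctly
        end
    end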

I get the feeling the focus is to be on understanding what’s going on more than any implementation details. That’s a good and a bad thing for me – I know implementation, and you can largely tell if you’ve got an implementation functionally correct by whether it does what it’s supposed to do.

This time it might be a bit less clear cut whether I’m right or wrong before I get to the assessment phase!