Pattern-Based Software Dev – Day 1

I got a couple of great surprises this morning on turning up in Manchester for the module starting today.

First up, the lectures were originally timetabled for a 9:30am start, and are now timetabled for 11:00am. That gives me loads of time between arriving in Manchester at 08:00 and starting lectures to eat, get to the library, do any admin stuff that’s easier when I’m onsite and generally chill out before getting started.

Second – I signed up for ‘IBM Patterns for e-Business Applications’, because I wanted to get some Software Engineering coverage as part of my MSc, and there was some coverage of design patterns in the syllabus for this module in 2009. I was in two minds about it: studying something with ‘IBM’ on it didn’t seem entirely right for an academic course.

To my surprise, the course has been re-branded ‘Pattern-Based Software Development’ overnight, and a complete re-write of the lectures has started to appear, one that seems to focus on understanding and applying some of the GoF design patterns – pretty much the exact course I wanted to take. I’ve studied and applied some of the GoF patterns before, and I’m really looking forward to learning the syllabus and having my work critically reviewed.

As an aside, it looks like the Manchester CS department is completely re-working its taught MSc Advanced Computer Science proposition, organising the taught modules into ‘pathways’ like Artificial Intelligence and Natural Language Processing. Looks like a good move to me, helpful for students choosing modules.

The lecture material introduced the Strategy, State, Proxy and Item Description patterns. The first three are pretty well known, but it’s the first time I’ve come across the last one.
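For anyone who hasn't met Strategy before, here is a minimal sketch of the idea in Python. The checkout scenario and every name in it (`Checkout`, `full_price`, `ten_percent_off`) are my own hypothetical illustration, not taken from the lecture material: the point is only that the context object holds an interchangeable rule and delegates to it.

```python
from typing import Callable

# Each strategy is an interchangeable pricing rule (hypothetical example).
# Amounts are whole pence so the arithmetic stays exact.
def full_price(amount_pence: int) -> int:
    return amount_pence

def ten_percent_off(amount_pence: int) -> int:
    return amount_pence * 9 // 10

class Checkout:
    """Context object: delegates the pricing decision to whichever strategy it holds."""

    def __init__(self, pricing: Callable[[int], int]) -> None:
        self.pricing = pricing

    def total(self, amount_pence: int) -> int:
        return self.pricing(amount_pence)

# Swapping behaviour means swapping the strategy, not editing Checkout.
regular = Checkout(full_price)
sale = Checkout(ten_percent_off)
print(regular.total(1000), sale.total(1000))
```

In a language without first-class functions the strategies would typically be classes sharing an interface, but the shape is the same.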

Coursework material involves UML Class diagrams and designing a system to solve a loosely defined business problem. Unfortunately, it seems that good UML tools are tough to find. After a few days of battling with the Eclipse project’s UML2 plugin I’ve come to the conclusion that I don’t much like it for simple diagramming. I’ve tried a few other tools with limited success, and there are just a couple left to try. It might be that you do have to pay $$$ to get a good one – but we’ll see.

Finishing up the Machine Learning Module

Well, the Machine Learning exam was this morning… another 5:30 am start to get to Manchester in plenty of time.

My Top Tip for distance learning today has to be: if you have to attend classes, labs, exams – you know, stuff that you can’t really afford to miss – aim to be there an hour early.

Today, I didn’t realise that the exam wasn’t in the same building where I’ve had every lecture, lab and exam so far. In fact, it was on the other side of the campus, and it’s not a small University. I was very glad of having 45 minutes from checking my information to the exam starting! Totally my own fault, of course – focussed on studying and the date and time of the exam, I made an assumption – but these things happen. If you’re there early and everything works out fine you have time to relax and centre yourself. On the other hand, if there is a problem, you’ll be very glad of that time.

The course itself was a fascinating introduction to several aspects of automated learning. Starting out with linear and nonlinear classifiers, moving on to decision trees, then probabilistic classifiers, unsupervised learning and finally sequence learning, we covered a large set of knowledge with significant maths pre-requisites.

Most of the material was quite approachable (now that I’m largely over my irrational fear of mathematical symbols – I wonder if there’s an official phobia for that?), with the notable exception of the probabilistic stuff. I’m not sure why I had such a problem with it and even after some serious digging in books I’m still not totally clear on some of it. More work needed there in the future, I fear.

Funny thing about the maths stuff – it has taken/is taking me a lot of effort to penetrate the notation. Once I can read it, though, the concept hiding underneath tends to be fairly intuitive. Go figure.

So how did the exam go? As with the last one, I can easily imagine how it might have been much tougher. Feels like it went OK, but you never know do you?

Anyway, now the immediate study pressure is off for a few weeks I’m hoping to catch up on some reading (right now, a quarter of the way through Code Complete 2, by Steve McConnell – I’d like to finish that off) and get a few more blog posts in.

Top 5 Cool Machine Learning Links

I’ve seen so much awesome stuff in my forays into Machine Learning as part of the course I’m doing, I thought I’d present for your entertainment and information my top 5 machine learning resources. (Kayla Harris suggested this infographic if you’re looking for a quick introduction to how ML is used in industry).

No, come back – some of this stuff is actually quite cool, I promise!

Here goes, in no particular order:

How to make $1M with Machine Learning and Netflix

Netflix offered a $1M prize for a team that could best their film recommendation technology by a significant margin.
The netflixprize page tells the official story, and the presentation attached to this blog post is well worth a look.

Detexify – Handwritten symbol recognition

For those of you that use LaTeX, you’ll know the pain of trying to find the code for a symbol if you can’t remember it. Detexify lets you draw a symbol on the screen, then uses machine learning techniques to recognise it and tell you which codes it thinks you need. The accuracy is astonishing – a really good showcase for the potential of the techniques.

Detexify in action

Lego Mindstorms Robots that Learn

This JavaWorld article takes Lego Mindstorms and adds a pinch of Machine Learning to make a robot that learns to follow a path on the ground.

I highly recommend this article for a casual read: it’s very nicely written and accessible, but does delve into the theory and mathematical foundations of the Perceptron algorithm at the heart of the article.
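To give a flavour of how small the Perceptron algorithm is, here is a rough sketch in plain Python. It is not the article's robot code: the toy task (learning logical AND), the function names and the parameter defaults are all my own assumptions for illustration.

```python
def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    """Classic perceptron rule: nudge the weights only when a prediction is wrong."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            predicted = 1 if activation > 0 else 0
            error = target - predicted  # -1, 0 or +1
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

def predict(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

# Toy, linearly separable problem (an assumption for this sketch): logical AND.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, labels)
print([predict(weights, bias, x) for x in samples])
```

The perceptron only settles down like this when a straight line can separate the two classes, which is why AND works as a toy problem here.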

Machine Learning at videolectures.net

There are 794 presentations and lectures – that’s not a typo, seven hundred and ninety-four – on every aspect of machine learning you could dream of here, at videolectures.net, from a range of sources. Many are quite approachable for the layperson.

The Singularity Summit

To wrap up, the Singularity Summit seems to be the forum for the players in the general Artificial Intelligence arena to talk about the past, future and philosophical implications of AI.

The Conversations Network hosts a free podcast series for the summit – personally, I really enjoyed James Hughes’ twenty-odd minute talk, in which he answers one of the great unanswered questions – if you’re standing on a railway bridge, are you safer stood next to an artificial intelligence or a human being?

That’s All Folks

I hope there’s something in there that’s given you some food for thought. If you have any stuff that you think is awesomely cool in this space, drop me a comment so I can check it out!

Machine Learning – Day 5

So that’s the end of the taught course in Machine Learning, finishing up learning about Markov Chains and Hidden Markov Models.

Yep, those are just links to the Wikipedia articles, and it’s quite possible that if you clicked on them and you’re anything like me, the crazy-looking maths stuff at the other end made the sensible part of your brain run off and sit in a corner, clutching its knees and rocking gently back and forth.

Probably muttering to itself.

To be honest, I can’t really explain what this stuff is about just yet – I’ve had a lot crammed into my head over the past few weeks, and I think I need a really good night’s sleep before I can comprehend the deeper aspects of this last bit. Suffice to say for now that it seems like some really interesting and powerful ideas are in play, and when I’ve got my head round it I’ll blog up my thoughts.

I’ve now got one more homework assignment on today’s material to complete by next Wednesday, and the project we’ve been assigned to do is then due on Friday 6th November – a nice surprise, as a typo on the schedule had us believe it was due on the previous Tuesday.

I’m sorry the taught part of the course is done, to be honest. Although I’m not sure I could have taken any more at the pace it was being taught, I’ve thoroughly enjoyed the material.

In fact, I’d say I feel a little inspired.

And, as James Brown might say – it feels good.

Machine Learning – Day 3

Getting through the coursework was a challenge – my computers have never worked so hard.

The last section involved exhaustively searching for the optimal settings of two parameters in an algorithm, where each run of the computation over the data set took a few seconds. Searching over 25 possible settings doesn’t sound like a lot, but 25 settings for each of two parameters means 25 × 25 = 625 runs, and at a few seconds each that is quite a wait.

Oh, wait – there was also a requirement to randomly shuffle the input data for the algorithm ten times and do some simple stats, to give some chance of reducing potential bias brought about by the order in which the data appears in the data set. So that’d be 10 runs per pair of parameter settings, which is 6250 runs. Or several hours with a CPU core maxed out and a nice, toasty computer.
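The shape of that search, sketched in Python rather than Matlab. The `evaluate` function is a hypothetical stand-in for the real computation (which I'm not reproducing here), as are the data and the parameter ranges; the sketch only shows where the 6250 runs come from.

```python
import itertools
import random
import statistics

def evaluate(data, a, b):
    """Hypothetical stand-in for the real algorithm: the score peaks at
    a=3, b=7, with a small wobble that depends on the order of the data."""
    wobble = 0.01 * (data[0] % 5)
    return -((a - 3) ** 2) - ((b - 7) ** 2) + wobble

def grid_search(data, settings_a, settings_b, shuffles=10, seed=0):
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    runs = 0
    mean_scores = {}
    for a, b in itertools.product(settings_a, settings_b):
        scores = []
        for _ in range(shuffles):
            shuffled = data[:]  # shuffle a copy, leave the original alone
            rng.shuffle(shuffled)
            scores.append(evaluate(shuffled, a, b))
            runs += 1
        # Averaging over shuffles damps any bias from the data ordering.
        mean_scores[(a, b)] = statistics.mean(scores)
    best = max(mean_scores, key=mean_scores.get)
    return best, mean_scores, runs

data = list(range(100))
best, mean_scores, runs = grid_search(data, range(25), range(25))
print(best, runs)
```

The toy `evaluate` returns instantly; swap in something that takes a few seconds per call and the run count turns into the several hours mentioned above.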

But hey. I got some neat 3-d mesh plots out of it, showing the performance of the algorithm over the parameter search space. Proper science, this! Sure it has faults, but Matlab’s plotting functionality is pretty neat and easy to use. Plots like the following are a doddle:

Matlab 3D Plot

Figure 1. Gratuitous use of a 3D plot for fun and profit

The goal of the exercise was to identify the most relevant ‘features’ in the data set for assigning the data into an appropriate class. Imagine you had a big table of information about a set of people, where the first column (could call it ‘Feature 1’) was their heights, the second was the time of day the height was measured, and you were trying to predict their sex. You and I would guess that height would be a good indicator of sex and the time of day would be irrelevant, but we’d be cheating by applying information about the meaning of the data that exists in our heads and wasn’t supplied with the problem.

By applying a variety of methods to our table of data, a computer might be able to recognise relationships between the features and what we’re trying to predict, without knowing anything else about the information. In doing so, it could remove the features that do not appear to have any relationship and save the computational effort and time that would otherwise be spent processing useless data. The approaches that can be applied are various, and some degree of tuning is needed to ensure that removing features doesn’t compromise the goal in subtle ways.
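One very simple way to do this (not necessarily one of the methods from the coursework) is to score each feature by how strongly it correlates with the thing being predicted, and drop the weak ones. A sketch in Python, using the height and time-of-day example with made-up numbers; the data, the 0/1 label encoding and the 0.5 threshold are all assumptions for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up toy table: one column per feature, one entry per person.
features = {
    "height_cm": [182, 165, 178, 160, 185, 158, 175, 163],
    "hour_measured": [9, 9, 14, 14, 11, 11, 16, 16],
}
label = [1, 0, 1, 0, 1, 0, 1, 0]  # the class to predict, encoded as 0/1

# Relevance = strength of the linear relationship, ignoring its direction.
relevance = {name: abs(pearson(col, label)) for name, col in features.items()}
selected = [name for name, score in relevance.items() if score > 0.5]
print(relevance)
print(selected)
```

A filter like this only spots linear, one-feature-at-a-time relationships, which is one of the subtle ways that removing features can go wrong.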

Today’s lectures moved on to machine learning techniques using the perplexing mathematics of probability (perplexing for my tiny brain, at any rate), in preparation for the last two weeks where unsupervised learning is the order of the day. The usual lab afternoon was focussed on kicking off a three-week project involving applying the techniques we’re learning to do something with a bunch of data in the style of a research paper.

Time to polish up the LaTeX from last year then…

Semi-Structured Data and the Web – Day 5.5

Two weeks later, on deadline day…

I think I defeated the XQuery assignment. It took the best part of a week (around 18 hours at a guess), but my 485 lines of code handle everything I can think of that was within the spec of the assignment. It was loads of fun handling transformations from various combinations of sequences, choices, star-expressions, minOccurs and maxOccurs in an element to an XML description of regular expressions, as well as implementing a combinations generator and a function to filter out unique elements in a sequence of elements.

I’m sure all that stuff with a functional lean is pretty old hat to someone who’s a Python, Erlang, F# or Haskell guru, but for little ol’ me with my OO background, it was pretty painful to swap my thinking over to immutable variables (which I’m sure is a contradiction in terms!), a complete inability to track state and just plain indulgence in recursion.
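To show the style of thinking rather than the assignment itself, here is roughly what a combinations generator and a unique-elements filter look like when written with recursion and no mutation. This is a Python sketch, not my XQuery; the function names are made up, and I'm assuming 'unique' means keeping the first occurrence of each element.

```python
def combinations(items, k):
    """All k-element combinations of items, built recursively without mutation."""
    if k == 0:
        return [()]  # exactly one way to choose nothing
    if len(items) < k:
        return []  # not enough items left to choose from
    head, tail = items[0], items[1:]
    with_head = [(head,) + rest for rest in combinations(tail, k - 1)]
    without_head = combinations(tail, k)
    return with_head + without_head

def unique(items):
    """Keep the first occurrence of each element, preserving order."""
    if not items:
        return ()
    head, tail = items[0], items[1:]
    return (head,) + unique(tuple(x for x in tail if x != head))

print(combinations(("a", "b", "c"), 2))
print(unique(("a", "b", "a", "c", "b")))
```

No variable is ever reassigned and there is no loop counter to track: each call just describes its answer in terms of a smaller version of the same problem.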

In short, it was brilliant. Exactly what I’m here for, to get out of my ‘comfort zone’ and learn new ways of thinking.

Semi-Structured Data and the Web – Day 5

It’s been an enlightening week on the homework front.

Having had some experience with XML before, I know how easy it is to mess up writing XML, particularly if you’re doing it by hand. Nesting wrong here, a tag misspelled there… although XML is, technically speaking, ‘human readable’, it’s not exactly human-friendly. It’s extremely precise, tends to be very verbose, and has newlines, tabs and other whitespace mixed in which tickles the ol’ natural human intuition about structure but is structurally meaningless to the machine. (Google ‘xml human readable’ for loads of articles on the subject.)

Semi-Structured Data and the Web – Day 4

Homework this week was rather tricky – transforming XML Schema into a tree grammar representation using XQuery. Sounds simple enough, but I now feel a certain revulsion, maybe even extending as far as hatred, towards XQuery. To be honest, I think it’s got a lot to do with the fact that XQuery is a functional language, and I’m new to the whole functional thing. It feels a little like programming by explosion… Towards the deadline, I think I was starting to get the hang of it. You just have to think a little… differently. Unfortunately, it took too long for me to figure this out and get the homework assignment done, so we had to hand in a partially complete assignment. Not liking that much.

On the menu today: More tree grammar stuff, including algorithms to validate an instance document against a grammar, Schematron (a rule-based document validation language) and XSugar – all of which have more homework assignments set. No time for more blogging, too much work to do!

Semi-Structured Data and the Web – Day 3

Woah.

It always worries me a little when the Greek symbols come out. So far, we’ve pretty much avoided them in the Semi-Structured Data and the Web course, so to see them today, whilst not really unexpected, did make my heart sink a little.

Semi-Structured Data and the Web – Day 2

So, after a little snow towards the end of the week, it pelted down on Sunday night, leaving the pavements and roads like an ice rink at 6am – I only narrowly avoided sliding down the hill to the tram stop on my backside. It’s at times like this I’m very glad of my thermos mug of tea!

Scheduling around last week’s coursework didn’t go perfectly – we ended up working til late on the Sunday night to finish off the last bits. Unfortunately, I wasn’t able to meet up with my groupmate last week as it was quite late in the day before we actually sorted out the group assignments. It would have been handy to meet up and get a plan of attack together for the week’s coursework, and we were – perhaps – a little uncoordinated. That said, we just about made it, and everything got in on time. 08:55, for a 9am deadline kind of on-time, that is.