Top 5 Cool Machine Learning Links

I’ve seen so much awesome stuff in my forays into Machine Learning as part of the course I’m doing that I thought I’d present, for your entertainment and information, my top 5 machine learning resources. (Kayla Harris suggested this infographic if you’re looking for a quick introduction to how ML is used in industry.)

No, come back – some of this stuff is actually quite cool, I promise!

Here goes, in no particular order:

How to make $1M with Machine Learning and Netflix

Netflix offered a $1M prize to the team that could best their movie recommendation technology by a significant margin.
The netflixprize page tells the official story, and the presentation attached to this blog post is well worth a look.

Detexify – Handwritten symbol recognition

For those of you that use LaTeX, you’ll know the pain of trying to find the code for a symbol if you can’t remember it. Detexify lets you draw a symbol on the screen, then uses machine learning techniques to recognise it and tell you which codes it thinks you need. The accuracy is astonishing – a really good showcase for the potential of the techniques.

Detexify in action

Lego Mindstorms Robots that Learn

This JavaWorld article takes Lego Mindstorms and adds a pinch of Machine Learning to make a robot that learns to follow a path on the ground.

I highly recommend this article for a casual read: it’s very nicely written and accessible, yet it does delve into the theory and mathematical foundations of the Perceptron algorithm at the heart of the article.
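For a flavour of what the Perceptron algorithm boils down to, here’s a minimal sketch in Java. To be clear, this isn’t the code from the JavaWorld article: the class name PerceptronSketch, the toy logical-AND data and the learning rate of 1.0 are all my own assumptions, chosen to keep the example tiny.

```java
import java.util.Arrays;

/** Minimal perceptron sketch; every name here is illustrative, not taken from the article. */
class PerceptronSketch {
    final double[] weights;      // one weight per input, starting at zero
    double bias;                 // threshold term, starting at zero
    final double learningRate;   // how far each correction moves the weights

    PerceptronSketch(int inputs, double learningRate) {
        this.weights = new double[inputs];
        this.learningRate = learningRate;
    }

    /** Step activation: answer 1 if the weighted sum is positive, otherwise 0. */
    int predict(double[] x) {
        double sum = bias;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * x[i];
        }
        return sum > 0 ? 1 : 0;
    }

    /** One perceptron update; returns true if the example was misclassified. */
    boolean train(double[] x, int target) {
        int error = target - predict(x);   // -1, 0 or +1
        if (error == 0) {
            return false;
        }
        for (int i = 0; i < weights.length; i++) {
            weights[i] += learningRate * error * x[i];
        }
        bias += learningRate * error;
        return true;
    }

    /** Repeats passes over the data; returns true once a whole pass has no mistakes. */
    boolean fit(double[][] xs, int[] targets, int maxEpochs) {
        for (int epoch = 0; epoch < maxEpochs; epoch++) {
            boolean anyError = false;
            for (int i = 0; i < xs.length; i++) {
                anyError |= train(xs[i], targets[i]);
            }
            if (!anyError) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed toy data: logical AND, which a single perceptron can separate.
        double[][] xs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        int[] ys = {0, 0, 0, 1};
        PerceptronSketch p = new PerceptronSketch(2, 1.0);
        System.out.println("converged: " + p.fit(xs, ys, 100));
        for (double[] x : xs) {
            System.out.println(Arrays.toString(x) + " -> " + p.predict(x));
        }
    }
}
```

The whole trick is in train: when the guess is wrong, nudge each weight in the direction that would have made it less wrong. That only settles down when the data can be split by a straight line, which is why I picked AND for the toy data.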

Machine Learning at videolectures.net

There are 794 presentations and lectures – that’s not a typo, seven hundred and ninety-four – on every aspect of machine learning you could dream of here, at videolectures.net, from a range of sources. Many are quite approachable for the layperson.

The Singularity Summit

To wrap up, the Singularity Summit seems to be the forum for the players in the general Artificial Intelligence arena to talk about the past, future and philosophical implications of AI.

The Conversations Network hosts a free podcast series for the summit. Personally, I really enjoyed James Hughes’ twenty-odd minute talk, in which he answers one of the great unanswered questions: if you’re standing on a railway bridge, are you safer stood next to an artificial intelligence or a human being?

That’s All Folks

I hope there’s something in there that’s given you some food for thought. If you have any stuff that you think is awesomely cool in this space, drop me a comment so I can check it out!

Why Java?

I was recently asked, why Java? It’s a great question – exactly why do I choose to learn the Java language?

I gave it some thought and I’ll share what I considered. I’m certainly not saying that what follows is a justification of the Java language in any general context, nor am I any kind of expert. This is just my view, given my circumstances.

It’s Free!

If you want to learn a language you don’t want to be laying out pots of cash up front. It’s likely you’ve got a Java Virtual Machine on your computer right now, and obtaining a development kit to get you started writing software is free and straightforward. Win.

The Java Virtual Machine

You never know when you might need to run that little app you wrote on a Windows host, or a different flavour of UNIX, or even a big ol’ IBM mainframe system – it’s nice to have some confidence that it’s just going to work.

Now, the JVM runs on any platform I can think of. In the early days there was some divergence between platform implementations of Java (I felt the pain of the slightly non-standard Micro$oft JVM in IE some time ago!), which somewhat hampered the cross-platform claim. These days there’s a suite of compliance tests that a JVM implementation must pass to brand itself Java compatible. That means that I don’t have to worry too much about cross-platform compatibility.

The JVM supports more than just the Java language. Jython, JRuby, Scala, JavaFX, Groovy, Fortress, Clojure… the list is getting ever longer, and there’s even a JVM Languages Summit. So, if you’re using the Java language and find a problem that’s better solved in another language, it’s perhaps not such a huge leap to get your Java code and your new JRuby code working together. Tim Bray wrote up some nice notes on the language summit here if you’re looking for a little more.

The other trick about the JVM is that your code gets better without you changing it. As new JVMs are released, they include the latest, hottest optimizations that take the software you write and give it go-faster stripes.

Tools and Platforms

Tool support for the Java language is extensive enough that there is generally a choice – for example, there is a choice of many Integrated Development Environments (Eclipse, NetBeans, IntelliJ IDEA to name but three) in which to write your software.

If you’re building ‘Enterprise’ software (whatever that really means!) you have a choice of application servers (WebSphere, JBoss, Jetty, Tomcat, GlassFish…) that all implement the Java Enterprise Edition specification in whole or in part; for example, the open source Tomcat application server supports a stripped-down subset of the spec. As long as you stick to the parts of the spec your chosen server implements, any of these application servers should run your software.

With all these things the choices mean that I can choose the implementation with the right strengths and at the right price point for the project in hand.

Learning

Java’s got a lot going for it as a learning language – it seems to be on the syllabus for most computer science degrees. There’s a wealth of online material available for free, including Sun’s own learning trails and Sang Shin’s excellent javapassion.com learning site.

I know that the question of value in professional certification seems to polarize opinion, but there certainly is a certification trail in the Java language that’s not trivial to achieve.

Achieving these certifications costs very little financially, but (certainly for me – I’m sure there’s folks that find this stuff easier) demands a significant amount of time and commitment. The objectives and exam questions are put together by teams of Sun engineers, Java developers and Java instructors, orchestrated by Sun. (Thanks to Bert Bates and javaranch.com for that info!)

Having done a couple of certs myself, I found them to be a very useful guided tour of the language and its extensions. I found plenty of useful features and techniques whilst studying that have since steered me clear of errors and wasted time.

The JCP and Standards

New APIs for the Java language happen through the JCP, or Java Community Process. Everyone from individuals to the largest IT players (IBM, Cisco Systems, Nokia – the members are listed here) is involved, and new standards happen in a publicly visible process of proposal and review.

Don’t believe me? Here’s the latest Java Enterprise Edition spec. See those JSR numbers? They’re the specifications that have been developed, reviewed and approved as part of the JCP.

Libraries

There’s sometimes so much choice of open source Java libraries that it’s hard to know what to choose for a given problem. The Apache Software Foundation hosts loads of open source Java projects, as do Google Code and java.net.

Not that every library is a piece of awesome… but many are. Having lots of choice increases the chances there’ll be something out there that’s already been built and tested and fits the bill. The libraries and APIs I have to use with my language of choice to get things done have a big impact on my productivity.

What – No Discussion of Technical Stuff?

Nope. I reckon it’s rather pointless to try and discuss differences between the capabilities of one language versus another. Anything computable can be computed in any Turing-complete language, which I think covers any language you might seriously approach as a general purpose problem solving platform.

So it’s not about what can or can’t be done – it’s really about what language I can be most effective and productive in. The biggest productivity killer in Java seems to be the boilerplate code necessary to do simple things, but over the last couple of years, strides forward have been made with annotations, for example, to reduce the boilerplate problem.
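As a rough illustration of how annotations can cut boilerplate, here’s a small sketch. The @Exported annotation, the Book class and the export method are all hypothetical names I’ve made up for this example; they don’t come from any particular framework. The idea is that you tag the fields you care about once, and a single generic method does the repetitive work for every class.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

class AnnotationSketch {

    /** Hypothetical marker: a tagged field is exported under the given name. */
    @Retention(RetentionPolicy.RUNTIME)   // keep the annotation visible to reflection
    @Target(ElementType.FIELD)
    @interface Exported {
        String value();
    }

    /** Example class: no hand-written export code, just annotations. */
    static class Book {
        @Exported("title") private final String title;
        @Exported("pages") private final int pages;
        private final String internalNote;   // not annotated, so never exported

        Book(String title, int pages, String internalNote) {
            this.title = title;
            this.pages = pages;
            this.internalNote = internalNote;
        }
    }

    /** Generic exporter: collects every @Exported field of any object into a sorted map. */
    static Map<String, Object> export(Object bean) {
        Map<String, Object> out = new TreeMap<>();
        for (Field field : bean.getClass().getDeclaredFields()) {
            Exported tag = field.getAnnotation(Exported.class);
            if (tag == null) {
                continue;   // skip fields without the marker
            }
            field.setAccessible(true);   // allow reading private fields
            try {
                out.put(tag.value(), field.get(bean));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample values, purely for demonstration.
        System.out.println(export(new Book("Example Title", 100, "not exported")));
    }
}
```

Without the annotation you’d be writing a near-identical export method by hand for every class, which is exactly the kind of boilerplate I mean.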

That’s All Folks

That’s about it, really. There’s plenty of other languages that can lay claim to some of the points I’ve made, but few that can claim them all.

There are other languages I choose for specific jobs (JavaScript for client-side behaviour in web browsers, Perl for sysadmin-type scripts) but my day-to-day workhorse, and the focus of my learning attention, is Java. For now, at least.

If I’ve missed anything or you disagree, feel free to drop me a comment!

Machine Learning – After the Project

My project for machine learning got handed in on time. It took hours to strip it down from the 8 pages I had when I’d finished to the 6 pages the spec asked for. Careful stripping out of any unnecessary waffle and merging of plots and charts was the order of the day.

I ended up scaling back my plans to explore text mining or ensemble learning to a simple comparison of some of the learning algorithms we learnt about on the course, with some exploration of slightly more advanced statistical comparison methods than we covered. The thinking was that it’d be better to try and demonstrate sound understanding of the basic algorithms and the experimental method – time will tell whether that was the right call.

Unlike how I used to do this kind of work when I was an undergraduate (start with a couple of days to go to deadline), this time I used most of the three weeks allowed to explore the options, work on the software, gather results and produce the paper. Hopefully, the work will show in the result, but I suspect it’s more a case of getting the approach right to minimise the time taken figuring out what to do.

But I guess it’s another example of work expanding to fill the time allowed!