Planning my Project

It’s been a bit quiet on crossedstreams.com for the past month or so. Between lots of great stuff going on at work keeping me very busy, some Stag Do related shenanigans and working on my project, there hasn’t been much time for blogging.
In order to complete my MSc, I need to complete a project and produce a dissertation. In addition there is a pre-requisite module that sets up the project, requiring the submission of a project statement, a project plan, a project website and a project background report. It’s these aspects I have been working on.
Additional complexity is introduced by my choice to prepare my own project involving what I do for a day job. This means certain additional hoops need to be jumped through, which happen to take a fair bit of time and effort, but with any luck those hurdles are nearly cleared now and the actual work can kick off properly.

Ubuntu, Fedora or Mint?

About a month ago, after I finished my last module, I upgraded to the latest Ubuntu release, 11.04 or ‘Natty Narwhal’. My first impressions over the course of a week or two were enough to send me looking elsewhere.

There were some big problems.

Ubuntu 11.04

The new Unity interface, whilst it’s very pretty, is totally unfamiliar and feels rather like a toy. The menus I used to start applications from are gone, and the taskbar I used to see what was running and place shortcuts on is gone. Now, to start a program there’s a glossy, full screen… thing… it’s a bit like a menu… but it takes up the whole screen with big Fisher-Price icons. To see what’s running at a glance… I can’t. And the idea that a window’s title bar, with its buttons and menus, isn’t attached to the window but appears at the top of the screen… seriously? I hear that this idea is nicked from Apple – but it really doesn’t work for me.

I guess the idea is that you type the name of the application instead of finding it in the menus. Nicked from Windows 7, I think. If I want to find and launch applications by typing their names, I use the command line – I’m not sure I get how search instead of menus is a step forward.

Then there was the speed, or rather, the total lack thereof. Using my computer went from effortless to wading through treacle. In snowshoes. I notice performance tips and tweaks guides for 11.04 starting to appear out there, so it’s not just me. The poor performance was the dealbreaker.

Fedora 15

I downloaded Fedora 15, having previously been a user of that distro. I know that 15 ships with Gnome 3, but I didn’t realise it would be so similar to Unity, with all the same bizarre UI quirks. On the bright side, it was a lot snappier… but all in, still not really usable.

Mint

So yesterday, I pulled Linux Mint 11 off the shelf and I’m happy to say that it is a joy to use. Menus, task bars, windows that work properly, fast, easy to set up. Back to business as usual. If you’re not loving the Gnome 3/Unity thing, I can recommend Mint (so far, based on 24h usage… mileage may vary!)

Serious or Casual?

With my immediate problems addressed, the direction that Gnome and Unity are taking for Linux is interesting. Are we seeing the Linux windowing systems fragment into serious and casual use cases? I can see how the new UI might be familiar and easy for someone who is used to their tablet or their smartphone. Maybe it’s also a good angle for relatively small screen devices like netbooks and tablets – certainly the apparent ‘every pixel is precious’ mindset doesn’t make much sense on a big widescreen monitor.

I expect that broadening the appeal of an operating system is a good thing, and perhaps Ubuntu and Fedora are setting their stalls out as ‘for the casual user’. If that’s so, then thank goodness for distros like Mint that give folks who use their computers to do work the power of old(er) school Linux without the pain.

Essays on the State of the Art and Future of Text Mining

The coursework for this Text Mining module has been quite challenging. Each week we had a task to complete, along the lines of evaluating the training of a part-of-speech tagger (a piece of software that tries to tag words with the part of speech they serve), or creating a named entity recogniser (a piece of software that tries to work out that some sequences of words have meaning above their component parts – for example “New York” means something different to “new” and “York”) using various methods. As I’ve worked through the tasks, though, the goal has become clear – we were building up components that could work in sequence to process text. Neat.
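To make the ‘components in sequence’ idea concrete, here’s a toy sketch of my own (not the course’s code) of a three-stage pipeline: tokenise, tag parts of speech, then recognise named entities. The tagger is just a dictionary lookup and the recogniser a hand-written gazetteer; `LEXICON`, `GAZETTEER` and their contents are hypothetical, whereas the coursework involved training and evaluating real taggers and recognisers.

```python
import re

# Hypothetical toy lexicon: word -> part-of-speech tag (Penn Treebank-style tags).
LEXICON = {
    "i": "PRP", "a": "DT", "the": "DT", "in": "IN",
    "fish": "NN", "live": "VBP", "new": "JJ", "york": "NNP",
}

# Hypothetical gazetteer of multi-word named entities.
GAZETTEER = {("new", "york"): "LOCATION"}

def tokenise(text):
    """Split text into word tokens (a deliberately naive regex tokeniser)."""
    return re.findall(r"\w+", text)

def pos_tag(tokens):
    """Look each token up in the lexicon, defaulting unknown words to NN."""
    return [(tok, LEXICON.get(tok.lower(), "NN")) for tok in tokens]

def recognise_entities(tagged):
    """Greedily merge token sequences that appear in the gazetteer."""
    entities, i = [], 0
    while i < len(tagged):
        for length in (2, 1):  # try the longest match first
            words = tuple(tok.lower() for tok, _ in tagged[i:i + length])
            if len(words) == length and words in GAZETTEER:
                span = " ".join(tok for tok, _ in tagged[i:i + length])
                entities.append((span, GAZETTEER[words]))
                i += length
                break
        else:
            i += 1  # no entity starts here; move on one token
    return entities

tagged = pos_tag(tokenise("I live in New York"))
print(tagged)
print(recognise_entities(tagged))  # [('New York', 'LOCATION')]
```

Each stage only consumes the previous stage’s output, which is what lets the pieces be swapped for better ones. Note that a plain lookup tags “fish” as a noun even in “I fish”; getting that right needs context, which is what trained taggers bring.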

One aspect of the coursework that was unusual was that it is all to be handed in together at the end, rather than week by week. If I’m honest it’d probably have been a little easier if I’d done the coursework in step with the lecture days – I actually fell a little behind because of various commitments.

Then there was the essay. A 3,000 word essay on the state of the art of text mining and my views for the future of the field.

I’ve not written an essay for at least 15 years now, and getting started was a real challenge. Text mining and Semantic Web maybe? Sentiment analysis is the future? I was pulling my hair out, trying to find an angle that I could argue cleanly through, citing academic research and the like. I’ve been screwing up outlines on bits of paper for about a week now!

That said, when I headed into Manchester yesterday and sat in my lectures, I had something of an epiphany. I guess the problem was that I feel the field has huge untapped potential, and I struggle to argue through a point of view I care about when I can’t see the current approaches panning out. I’m going to take a bit of a risk, and write an essay that (constructively) criticises some aspects of text mining today, proposing and arguing through a slightly different approach.

We’ll see how it goes – the last few bits of paper have so far avoided a one-way ticket to the bin. Hopefully I can produce a well-argued, reasonably interesting essay that I’ll get some marks for!

Text Mining – Day 4

Between prep for my MSc. project, getting married, being snowed under at work, starting my next MSc. module and being full of cold, there hasn’t been much time for blogging…

So today was day 4 of the Text Mining module. As a friend put it, “Text Mining? What – like using grep?”

Text Mining is defined as finding previously unknown information in unstructured data. Unknown – as in never explicitly written down.

So by ‘text’, we mean un- or partially-structured data, like Word documents or this blog page. There’s some structure here (headings, subheadings, lists and the like), but it’s not ‘structured’ in the sense that database tables are, with rows, columns and a type system.

Tools like grep match words (more generally, relatively simple patterns of characters described by regular expressions), so whilst they’re fairly easy to use (so long as you don’t try to push them too far), they are limited in the complexity of what they can do.

For example, you can’t easily use grammatical ideas, like identifying documents that are about fish (a fish), but not fishing (I fish). You can’t search for documents related to a concept, and recognising generic names or technical terms is out. You can’t build structures like indices to help with searches, which means that over reasonably large collections of documents, grep is too slow to be very useful.
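A rough illustration of both points, using three made-up documents: a grep-style regex finds “fish” in both the noun and the verb sense, and a simple inverted index (a map from each word to the set of document ids containing it) turns repeated scans into lookups. This is a sketch of the general idea in Python, not how grep or any particular search engine is implemented.

```python
import re
from collections import defaultdict

# Hypothetical mini-collection of documents.
DOCS = {
    1: "A fish swam past the boat.",
    2: "On Sundays I fish in the canal.",
    3: "The boat needs a new engine.",
}

# grep-style search: a regex scan over every document, every time.
pattern = re.compile(r"\bfish\b", re.IGNORECASE)
grep_hits = [doc_id for doc_id, text in DOCS.items() if pattern.search(text)]
print(grep_hits)  # [1, 2] -- the noun (a fish) and the verb (I fish) alike

def build_index(docs):
    """Map each lower-cased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index

index = build_index(DOCS)
# Lookups are now a dictionary access rather than a scan of the whole collection.
print(sorted(index["boat"]))                 # [1, 3]
print(sorted(index["fish"] & index["boat"]))  # [1]
```

The index costs one pass over the collection up front, after which each query is cheap; the regex still has no idea which “fish” is which.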

I’m still getting my head around how it hangs together, but text mining seems like a set of gloriously messy, pragmatic and seemingly pretty successful ways to let computers listen in on the languages that humans have evolved.

Logic and Applications – Tough Exam!

I took the Logic and Applications exam last Friday. I think I’m ready now to talk about the ordeal…

It wasn’t so bad really, I guess. I made a bad call as to which questions to answer (it was one of those answer three of four kind-of-things) and ran out of time. One of the questions I initially chose had what was for me a brick wall towards the midway point, and on a two hour exam, spending 20-25 mins heading down a dead end isn’t the best idea!

I guess the two frustrations I felt with this exam were firstly that the course covered so much material so quickly, but each of the topics turned out to be a bit of a rabbit-hole when I got to thinking about it during the revision process – the more I thought about it, the more questions I found!

On top of that, one of the key aspects of a course like this is the transformation of formulae into alternative forms which have properties we want – usually, that more efficient algorithms can solve them. These transformations are rather like the algebraic manipulation of mathematical formulae we did at school – progressing in unit steps, painstakingly copying out each new form as you go. That consumes a lot of time, especially when the formulae don’t yield easily, but it doesn’t really seem to prove much about the student’s skills – the pages-of-transforms kind of work was all hammered pretty hard in the coursework, after all. Then again, maybe I just screwed something up early doors and that led to the extensive transform.
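For anyone who hasn’t met these transformations, here’s a small sketch of one classic example: converting a propositional formula to conjunctive normal form (CNF) by eliminating implications, pushing negations inwards, then distributing ∨ over ∧. Representing formulas as nested tuples like `("or", "p", ("not", "q"))` is my own assumption for the sketch, not the notation from the course. Naive distribution can make the result exponentially larger than the input, which hints at why doing it by hand can fill pages.

```python
def eliminate_implies(f):
    """Rewrite every (A → B) as (¬A ∨ B)."""
    if isinstance(f, str):
        return f
    op, *args = f
    args = [eliminate_implies(a) for a in args]
    if op == "implies":
        return ("or", ("not", args[0]), args[1])
    return (op, *args)

def to_nnf(f):
    """Push negations inwards until they sit only on variables (De Morgan, double negation)."""
    if isinstance(f, str):
        return f
    if f[0] == "not":
        g = f[1]
        if isinstance(g, str):
            return f
        if g[0] == "not":
            return to_nnf(g[1])
        if g[0] == "and":
            return ("or", to_nnf(("not", g[1])), to_nnf(("not", g[2])))
        if g[0] == "or":
            return ("and", to_nnf(("not", g[1])), to_nnf(("not", g[2])))
        raise ValueError(f"unexpected operator {g[0]!r}")
    return (f[0], to_nnf(f[1]), to_nnf(f[2]))

def distribute(f):
    """Distribute ∨ over ∧ in a negation-normal-form formula, giving CNF."""
    if isinstance(f, str) or f[0] == "not":
        return f
    a, b = distribute(f[1]), distribute(f[2])
    if f[0] == "and":
        return ("and", a, b)
    if not isinstance(a, str) and a[0] == "and":  # (A1 ∧ A2) ∨ B
        return ("and", distribute(("or", a[1], b)), distribute(("or", a[2], b)))
    if not isinstance(b, str) and b[0] == "and":  # A ∨ (B1 ∧ B2)
        return ("and", distribute(("or", a, b[1])), distribute(("or", a, b[2])))
    return ("or", a, b)

def to_cnf(f):
    return distribute(to_nnf(eliminate_implies(f)))

def show(f):
    """Pretty-print a formula with logical symbols."""
    if isinstance(f, str):
        return f
    if f[0] == "not":
        return "¬" + show(f[1])
    symbol = {"and": " ∧ ", "or": " ∨ ", "implies": " → "}[f[0]]
    return "(" + show(f[1]) + symbol + show(f[2]) + ")"

print(show(to_cnf(("implies", ("implies", "p", "q"), "r"))))  # ((p ∨ r) ∧ (¬q ∨ r))
```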

The course was new this year anyway, so maybe it takes a little time for the exams to settle in terms of difficulty. Or I’m just a dumbass. Anyway, it’s too late to worry about all that now. Hopefully, I passed – that’s the main thing, right?

Why I didn’t write any software for Windows Mobile

A few years ago, around 2006 at a guess, I saved up a bit of my hard-earned dollar and bought a Dell Axim X51v. It was a wonderful little device for the time and I fancied having a go at writing software for it.

So I went to the Microsoft website to find out how to do that, where I was confronted with a request for more cash. In order to write a line of code for Windows Mobile at that time, you had to shell out for licenses to use Microsoft’s IDE and developer tools. That’s on top of whatever fees that MS was getting from Dell and the license I’d bought with the device to actually run Windows Mobile.

Naturally, I baulked at the idea and never gave it a go.

Nor have I bought anything from Microsoft since – although that wasn’t a conscious decision. It’s just that since then, there hasn’t been anything I wanted to do in terms of development that mandated some kind of payment. Case in point – my faithful little HTC Magic and its successor, my Samsung Galaxy S. These phones are thoroughly awesome bits of kit which run on Android, and recently I had my first dabble in Android development.

Of course, everything you need to write software for Android is freely available on the web, and you can expect a post or two about how that’s going.

Out of curiosity, I checked back in on Microsoft, and it sure looks like you can write for Windows Mobile these days for free. Would it still cost money to write for Windows Mobile if the competition wasn’t giving away their goodies for free? I also had a look at Apple’s tooling to build stuff for the iPhone but I couldn’t work out if it’s free right now or not. (I couldn’t be bothered to look for more than a minute or two to be honest – any readers know?)

I wonder if my decisions since then would have played out any differently if I’d been able to just download the stuff I needed to have a go back in ’06? Who knows, I might have gotten hooked on the Microsoft toolset like Visual Studio.

Preparing for the Logic and Applications Exam

It feels like a long time between finishing the Logic and Applications course back in early November and the exam, which is next week on the 27th January. In between, I’ve done a little work on my project proposal, but since late December I’ve been focussing more on preparing for the exam.

It’s always a bit surprising when I start revising how much stuff we covered in a five-week course and this one was no exception. The syllabus is here on the UoM CS website. It’s also a new course this year, so there aren’t any specific past papers (exam papers from previous years) to get a feel for how the exam will be phrased and what kind of content has been examined before.

The nearest course in previous years was the Automated Reasoning course, which covered similar stuff but also included some aspects of logic programming in Prolog. In this course we used the theorem prover SPASS and the SAT solver MiniSAT for the small amount of experimental work involved. Hopefully there won’t be any ‘remember-the-syntax’ style questions…

Logic and Applications Coursework Catch-Up

The lecturers and demonstrators for the Logic and Applications course held a ‘coursework catch-up’ today, the idea being that we can see where we went wrong in the assessed coursework pieces we submitted.

I think a big thank-you is warranted to the demonstrators on the course, Adam and Mohammad. These guys have been really helpful over e-mail as I’ve been doing the coursework pieces at home. It’s a little tricky because they can’t answer questions that are too open, nor can they confirm whether an answer is right or wrong. What seems to have worked this time is asking whether a coursework answer suggests that I’ve understood a specific concept.

I could have understood it and still made some small error and so got the answer wrong, so there’s no big reveal about what the answer is. The couple of times where there was a problem with my understanding, the demonstrators could just suggest that I review the relevant concept. Lots of thanks to those guys anyway for their patience and help!

After reviewing my marked coursework, I found that where I’ve lost marks, it’s been because of nothing more serious than typos, which is nice to know. Now the revision process starts and there’s plenty to chew on with this one!

Logic and Applications Day 5

So I’ve finished the Logic and Applications module now – the last coursework has been submitted and I’ve been enjoying a couple of days of doing things other than schoolwork!

The course was very focused on satisfiability – given a set of logical statements, is it possible to find an interpretation (an assignment of true or false to each of the variables in the statements) that makes them all true? That might not sound particularly useful, but many questions about a system can be expressed this way.
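A minimal sketch of the question in code, assuming the statements are already in clausal form: each clause is a list of integers, where 3 means ‘variable 3 is true’ and -3 means ‘variable 3 is false’ (the same convention as the DIMACS format that MiniSAT reads). The function name `brute_force_sat` is my own, and it simply tries every one of the 2^n interpretations of n variables.

```python
from itertools import product

def brute_force_sat(clauses, num_vars):
    """Try every interpretation of variables 1..num_vars; return a satisfying one or None."""
    for values in product([False, True], repeat=num_vars):
        # assignment[v] is the truth value given to variable v (index 0 unused)
        assignment = (None,) + values
        # a clause holds if any literal matches; the set holds if every clause does
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return {v: assignment[v] for v in range(1, num_vars + 1)}
    return None

# (x1 ∨ x2) ∧ (¬x1 ∨ x3) ∧ (¬x2 ∨ ¬x3)
print(brute_force_sat([[1, 2], [-1, 3], [-2, -3]], 3))  # {1: False, 2: True, 3: False}
# x1 ∧ ¬x1 has no satisfying interpretation
print(brute_force_sat([[1], [-1]], 1))  # None
```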

As an example, consider the Minesweeper game. (Sorry if you’ve not come across this game before, check out the Wikipedia page for an example if so) Initially, we know very little and what we do know isn’t directly useful – how many squares there are. Once we’ve tried a couple of random clicks, we have some information we can use and express in these logics.

Perhaps we know that the square at 7 across and 12 up contains a mine and that the square at 8 across and 12 up has one adjacent mine. The question we might ask is whether a neighbouring square, say 9,12, contains a mine, and this is where satisfiability comes into play. Given logical statements encoding the rules of Minesweeper, and the two details we mentioned specific to this game, is it possible that 9,12 contains a mine?

In other words, is the set of logical statements describing the rules, this game (to this point) and a statement expressing that 9,12 contains a mine satisfiable? The answer, of course, is easy for a human to figure out.
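Here’s a brute-force sketch of the Minesweeper question, ignoring board edges and using Python predicates rather than clauses for readability (the function names and encoding are mine). A revealed square showing a count can’t itself hold a mine, so the sketch asks about 8,12 and about its neighbour 9,12: each query adds one extra statement (‘this square has a mine’) to what we know and checks whether the whole set is still satisfiable.

```python
from itertools import product

def neighbours(x, y):
    """Coordinates of the eight squares surrounding (x, y), ignoring board edges."""
    return [(x + dx, y + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]

# One variable per square: "does this square contain a mine?"
SQUARES = [(8, 12)] + neighbours(8, 12)

def what_we_know(mines):
    """True if the assignment `mines` (square -> bool) agrees with the game so far."""
    known_mine = mines[(7, 12)]                                # 7,12 contains a mine
    revealed_safe = not mines[(8, 12)]                         # a revealed square holds no mine
    count_ok = sum(mines[n] for n in neighbours(8, 12)) == 1   # 8,12 shows "1"
    return known_mine and revealed_safe and count_ok

def satisfiable(extra):
    """Is there an interpretation satisfying what we know plus the statement `extra`?"""
    for values in product([False, True], repeat=len(SQUARES)):
        mines = dict(zip(SQUARES, values))
        if what_we_know(mines) and extra(mines):
            return True
    return False

for square in [(8, 12), (9, 12)]:
    possible = satisfiable(lambda m, s=square: m[s])
    print(square, "could contain a mine:", possible)
```

Both queries come back unsatisfiable, so both squares are safe: the single mine next to 8,12 is already accounted for by 7,12.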

The ability to automate the search for satisfiability allows these kinds of problems to be solved quickly and accurately using tried and tested reasoning tools. The general approach we used on the course was to search for a satisfying interpretation, which quickly leads to a problem – combinatorial explosion. The number of possible interpretations of a set of logical statements grows exponentially with the number of logical variables: each new variable doubles it, so n variables give 2^n interpretations.

As such, a big focus on the course was algorithms and approaches to battle this exponential complexity – there are many techniques to reduce the space of possible interpretations and bring the complexity problem under control. That said, one of the impressions I come away with is that no general technique can be guaranteed to determine satisfiability in a reasonable time for every problem.
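As a minimal sketch of one classic technique for taming the search (not necessarily the exact variant taught on the course), here is DPLL-style search: unit propagation fixes any variable forced by a one-literal clause, and the search only branches when nothing is forced, backtracking on conflict. Clauses are lists of signed integers (-2 means ‘variable 2 is false’), and the result is the set of literals made true. Real solvers such as MiniSAT add conflict-driven clause learning, watched literals and much more.

```python
def simplify(clauses, lit):
    """Assume `lit` is true: drop satisfied clauses and remove the opposite literal elsewhere."""
    result = []
    for clause in clauses:
        if lit in clause:
            continue  # clause already satisfied
        result.append([l for l in clause if l != -lit])
    return result

def dpll(clauses, assignment=None):
    """Return a set of true literals satisfying `clauses`, or None if unsatisfiable."""
    if assignment is None:
        assignment = set()
    # Unit propagation: a one-literal clause forces that literal, no branching needed.
    while True:
        units = [c[0] for c in clauses if len(c) == 1]
        if not units:
            break
        assignment = assignment | {units[0]}
        clauses = simplify(clauses, units[0])
    if not clauses:
        return assignment  # every clause satisfied
    if any(len(c) == 0 for c in clauses):
        return None  # an empty clause means a conflict: backtrack
    # Branch on the first literal of the first clause, trying it true, then false.
    lit = clauses[0][0]
    for choice in (lit, -lit):
        result = dpll(simplify(clauses, choice), assignment | {choice})
        if result is not None:
            return result
    return None

result = dpll([[1, 2], [-1, 3], [-2, -3]])
print(sorted(result, key=abs))  # [1, -2, 3]
print(dpll([[1], [-1]]))        # None
```

On this small example, one branch (x1 true) lets unit propagation settle x3 and then x2 without further guessing, where the brute-force approach would simply walk through interpretations in order.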

A final point, and an important one, is that the techniques we used were based almost entirely on mathematical proofs, that is to say that we are able to deduce that the techniques are correct and have certain other properties using formal methods. Of course, mathematical proof is formed by logical deduction too, so there’s a certain recursive nature to all this.

It’ll be good to get started on a fairly relaxed process of revision for all this stuff. The sheer volume of information thrown at us on these modules is huge, and it always feels a little like riding a rollercoaster, so it’s good to get back in there and review the material at a more relaxed pace.