Essays on the State of the Art and Future of Text Mining

The coursework for this Text Mining module has been quite challenging. Each week we had a task to complete, such as evaluating the training of a part-of-speech tagger (a piece of software that tries to tag each word with the part of speech it serves as), or creating a named entity recogniser (a piece of software that tries to work out that some sequences of words have a meaning beyond their component parts – for example “New York” means something different to “new” and “York”) using various methods. As I’ve worked through the tasks, though, the goal has become clear – we were building up components that could work in sequence to process text. Neat.
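
To make the idea of components working in sequence a bit more concrete, here is a minimal sketch in Python. It is a toy illustration, not the coursework itself: the lexicon, the gazetteer and all of the function names are made up for the example, and a real part-of-speech tagger or named entity recogniser would be trained on annotated text rather than relying on hand-written lookups.

```python
import re

# Hypothetical toy lexicon mapping lower-cased words to part-of-speech tags.
# A real tagger would learn this from annotated text.
TOY_LEXICON = {
    "i": "PRON",
    "flew": "VERB",
    "to": "ADP",
    "new": "ADJ",
    "york": "PROPN",
    "yesterday": "NOUN",
}

# Hypothetical gazetteer of two-word names whose meaning goes beyond their parts.
TOY_GAZETTEER = {("new", "york"): "LOCATION"}


def tokenize(text):
    """Step 1: split raw text into word tokens."""
    return re.findall(r"\w+", text)


def tag_parts_of_speech(tokens):
    """Step 2: pair each token with a tag from the toy lexicon ("UNK" if unseen)."""
    return [(token, TOY_LEXICON.get(token.lower(), "UNK")) for token in tokens]


def recognise_entities(tagged):
    """Step 3: scan adjacent token pairs and look them up in the toy gazetteer."""
    entities = []
    i = 0
    while i < len(tagged) - 1:
        pair = (tagged[i][0].lower(), tagged[i + 1][0].lower())
        if pair in TOY_GAZETTEER:
            entities.append((f"{tagged[i][0]} {tagged[i + 1][0]}", TOY_GAZETTEER[pair]))
            i += 2  # skip past both words of the matched name
        else:
            i += 1
    return entities


def run_pipeline(text):
    """Chain the components: each stage consumes the previous stage's output."""
    tokens = tokenize(text)
    tagged = tag_parts_of_speech(tokens)
    return recognise_entities(tagged)


print(run_pipeline("I flew to New York yesterday"))
```

The point of the sketch is only the shape: each stage takes the output of the one before it, so any single stage could be swapped for a properly trained component without touching the others.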

One unusual aspect of the coursework is that it is all to be handed in together at the end, rather than week by week. If I’m honest, it would probably have been a little easier if I’d done the coursework in step with the lecture days – I actually fell a little behind because of various commitments.

Then there was the essay. A 3,000 word essay on the state of the art of text mining and my views for the future of the field.

I’ve not written an essay for at least 15 years now, and getting started was a real challenge. Text mining and the Semantic Web, maybe? Sentiment analysis is the future? I was pulling my hair out, trying to find an angle that I could argue cleanly through, citing academic research and the like. I’ve been screwing up outlines on bits of paper for about a week now!

That said, when I headed into Manchester yesterday and sat in my lectures, I had something of an epiphany. I guess the problem was that I feel the field has huge untapped potential, and I struggle to argue through a point of view I care about when I can’t see the current approaches panning out. I’m going to take a bit of a risk, and write an essay that (constructively) criticises some aspects of text mining today, proposing and arguing through a slightly different approach.

We’ll see how it goes – the last few bits of paper have so far avoided a one-way ticket to the bin. Hopefully I can produce a well-argued, reasonably interesting essay that I’ll get some marks for!