Ubuntu, Fedora or Mint?

About a month ago, after I finished my last module, I upgraded to the latest Ubuntu release, 11.04 or ‘Natty Narwhal’. My first impressions over the course of a week or two were sufficient to have me go looking elsewhere.

There were some big problems.

Ubuntu 11.04

The new Unity interface, whilst it’s very pretty, is totally unfamiliar and feels rather like a toy. The menus I used to start applications from are gone, the taskbar I used to see what was running and place shortcuts on is gone. Now to start a program there’s a glossy, full screen… thing… it’s a bit like a menu… but takes up the whole screen with big Fisher-Price icons. To see what’s running at a glance… I can’t. The idea where the title bar of a window with the window buttons and menus isn’t attached to the window and appears at the top of the screen… seriously? I hear that this idea is nicked from Apple – but it really doesn’t work for me.

I guess the idea is that you type the name of the application instead of finding it in the menus. Nicked from Windows 7, I think. If I want to find and launch applications by typing their names, I use the command line – I’m not sure I get how search instead of menus is a step forward.

Then there was the speed, or rather, the total lack thereof. Using my computer went from effortless to wading through treacle. In snowshoes. I notice performance tips and tweaks guides for 11.04 starting to appear out there, so it’s not just me. The poor performance was the dealbreaker.

Fedora 15

I downloaded Fedora 15, having previously been a user of that distro. I know that 15 ships with Gnome 3, but I didn’t realise it would be so similar to Unity, with all the same bizarre UI quirks. On the bright side, it was a lot snappier… but all in, still not really usable.

Mint

So yesterday, I pulled Linux Mint 11 off the shelf and I’m happy to say that it is a joy to use. Menus, task bars, windows that work properly, fast, easy to set up. Back to business as usual. If you’re not loving the Gnome 3/Unity thing, I can recommend Mint (so far, based on 24h usage… mileage may vary!)

Serious or Casual?

With my immediate problems addressed, the direction that Gnome and Unity are taking for Linux is interesting. Are we seeing the Linux windowing systems fragment into serious and casual use cases? I can see how the new UI might be familiar and easy for someone who is used to their tablet or their smartphone. Maybe it’s also a good angle for relatively small screen devices like netbooks and tablets – certainly the apparent ‘every pixel is precious’ mindset doesn’t make much sense on a big widescreen monitor.

I expect that broadening the appeal of an operating system is a good thing, and perhaps Ubuntu and Fedora are setting their stalls out as ‘for the casual user’. If that’s so, then thank goodness for distros like Mint that give folks who use their computers to do work the power of old(er) school Linux without the pain.

Essays on the State of the Art and Future of Text Mining

The coursework for this Text Mining module has been quite challenging. Each week we had a task to complete, along the lines of evaluating the training of a part-of-speech tagger (a piece of software that tries to tag words with the part of speech they serve), or creating a named entity recogniser (a piece of software that tries to work out that some sequences of words have meaning above their component parts – for example, “New York” means something different to “new” and “York”) using various methods. As I’ve worked through, though, the goals have become clear – we were building up components that could work in sequence to process text. Neat.
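As a toy illustration (nothing like the statistical methods from the coursework, and the function name here is made up for the sketch), a naive named entity recogniser might simply treat runs of capitalised words as candidate entities:

```python
import re

# Hypothetical toy recogniser: two or more capitalised words in a row
# are taken as one candidate entity, so "New York" is recognised as a
# unit rather than as the separate words "New" and "York".
def find_entities(text):
    return re.findall(r"(?:[A-Z][a-z]+\s)+[A-Z][a-z]+", text)

find_entities("I flew from New York to Los Angeles.")
# -> ["New York", "Los Angeles"]
```

Real recognisers are trained on annotated text and use far richer features than capitalisation, but the sketch shows the shape of the problem: grouping words into spans that carry their own meaning.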

One aspect of the coursework that was unusual was that it is all to be handed in together at the end, rather than week by week. If I’m honest it’d probably have been a little easier if I’d done the coursework in step with the lecture days – I actually fell a little behind because of various commitments.

Then there was the essay. A 3,000 word essay on the state of the art of text mining and my views for the future of the field.

I’ve not written an essay for at least 15 years now, and getting started was a real challenge. Text mining and the Semantic Web, maybe? Sentiment analysis is the future? I was pulling my hair out, trying to find an angle that I could argue cleanly through, citing academic research and the like. I’ve been screwing up outlines on bits of paper for about a week now!

That said, when I headed into Manchester yesterday and sat in my lectures, I had something of an epiphany. I guess the problem was that I feel the field has huge untapped potential, and I struggle to argue through a point of view I care about when I can’t see the current approaches panning out. I’m going to take a bit of a risk, and write an essay that (constructively) criticises some aspects of text mining today, proposing and arguing through a slightly different approach.

We’ll see how it goes – the last few bits of paper have so far avoided a one-way ticket to the bin. Hopefully I can produce a well-argued, reasonably interesting essay that I’ll get some marks for!

Text Mining – Day 4

Between prep for my MSc. project, getting married, being snowed under at work, starting my next MSc. module and being full of cold, there hasn’t been much time for blogging…

So today was day 4 of the Text Mining module. As a friend put it, “Text Mining? What – like using grep?”

Text Mining is defined as finding previously unknown information in unstructured data. Unknown – as in never explicitly written down.

So by ‘text’, we mean un- or partially-structured data, like Word documents or this blog page. There’s some structure here: headings, subheadings, lists and the like. But it’s not ‘structured’ in the sense that database tables are, with fields, columns and a type system.

Tools like grep can match words (or, more generally, relatively simple patterns of characters described by regular expressions), so whilst they’re fairly easy to use (so long as you don’t try to push them too far), they are limited in the complexity of what they can do.

For example, you can’t easily use grammatical ideas, like identifying documents that are about fish (a fish), but not fishing (I fish). You can’t search for documents related to a concept, and recognising generic names or technical terms is out. You can’t build structures like indices to help with searches, which means that over reasonably large collections of documents, grep is too slow to be very useful.
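The fish/fishing problem is easy to see with Python’s re module standing in for grep – a pattern over characters alone can’t tell the noun from the verb:

```python
import re

# grep-style search finds the string "fish", but it can't distinguish
# the noun ("a fish") from the verb ("I fish").
docs = ["I caught a fish yesterday.", "I fish every weekend."]
pattern = re.compile(r"\bfish\b")

matches = [doc for doc in docs if pattern.search(doc)]
# both documents match, even though only the first is *about* fish
```

Telling the two apart needs grammatical information – exactly the kind of thing a part-of-speech tagger provides and grep doesn’t.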

I’m still getting my head around how it hangs together, but text mining seems like a set of gloriously messy, pragmatic and seemingly pretty successful ways to let computers listen in on the languages that humans have evolved.

Logic and Applications – Tough Exam!

I took the Logic and Applications exam last Friday. I think I’m ready now to talk about the ordeal…

It wasn’t so bad really, I guess. I made a bad call as to which questions to answer (it was one of those answer-three-of-four kind of things) and ran out of time. One of the questions I initially chose had what was, for me, a brick wall towards the midway point, and in a two-hour exam, spending 20-25 mins heading down a dead end isn’t the best idea!

I guess the first of the two frustrations I felt with this exam was that the course covered so much material so quickly, yet each of the topics turned out to be a bit of a rabbit hole when I got to thinking about it during the revision process – the more I thought about it, the more questions I found!

On top of that, one of the key aspects of a course like this is transformation of formulae into alternative forms which have properties we want – usually, more efficient solving algorithms. These transformations are rather like the algebraic manipulation of mathematical formulae we did at school – progressing in unit steps, painstakingly copying out each new form as you go. That consumes a lot of time, especially when the formulae don’t give out easily, but it doesn’t really seem to prove much about the student’s skills – the pages-of-transforms kind of work was all hammered pretty hard in the coursework, after all. Then again, maybe I just screwed something up early doors and that led to the extensive transform.
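One of those unit steps – distributing OR over AND on the way to conjunctive normal form – is mechanical enough to sketch in a few lines of Python. This is a toy of my own (formulae as nested tuples), not anything we used on the course:

```python
# One CNF transformation step: distribute OR over AND, so that
# (A & B) | C  becomes  (A | C) & (B | C).
# Formulas are nested tuples: ("and", p, q), ("or", p, q), or a variable name.

def distribute(p, q):
    """Return the CNF-friendly form of (p OR q)."""
    if isinstance(p, tuple) and p[0] == "and":
        return ("and", distribute(p[1], q), distribute(p[2], q))
    if isinstance(q, tuple) and q[0] == "and":
        return ("and", distribute(p, q[1]), distribute(p, q[2]))
    return ("or", p, q)

distribute(("and", "A", "B"), "C")
# -> ("and", ("or", "A", "C"), ("or", "B", "C"))
```

The machine does in microseconds what takes pages by hand – which is rather the point of my grumble about examining it on paper.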

The course was new this year anyway, so maybe it takes a little time for the exams to settle in terms of difficulty. Or I’m just a dumbass. Anyway, it’s too late to worry about all that now. Hopefully, I passed – that’s the main thing, right?

Why I didn’t write any software for Windows Mobile

A few years ago, around 2006 at a guess, I saved up a bit of my hard-earned dollar and bought a Dell Axim X51v. It was a wonderful little device for the time and I fancied having a go at writing software for it.

So I went to the Microsoft website to find out how to do that, where I was confronted with a request for more cash. In order to write a line of code for Windows Mobile at that time, you had to shell out for licenses to use Microsoft’s IDE and developer tools. That’s on top of whatever fees that MS was getting from Dell and the license I’d bought with the device to actually run Windows Mobile.

Naturally, I baulked at the idea and never gave it a go.

Nor have I bought anything from Microsoft since – although that wasn’t a conscious decision. It’s just that since then, there hasn’t been anything that I wanted to do in terms of development that mandated some kind of payment. Case in point – my faithful little HTC Magic, succeeded by my Samsung Galaxy S. These phones are thoroughly awesome bits of kit which run on Android technology, and recently I had my first dabble in Android development.

Of course, everything you need to write software for Android is freely available on the web, and you can expect a post or two about how that’s going.

Out of curiosity, I checked back in on Microsoft, and it sure looks like you can write for Windows Mobile these days for free. Would it still cost money to write for Windows Mobile if the competition wasn’t giving away their goodies for free? I also had a look at Apple’s tooling to build stuff for the iPhone but I couldn’t work out if it’s free right now or not. (I couldn’t be bothered to look for more than a minute or two to be honest – any readers know?)

I wonder if my decisions since then would have played out any differently if I’d been able to just download the stuff I needed to have a go back in ’06? Who knows, I might have gotten hooked on Microsoft tools like Visual Studio.

Preparing for the Logic and Applications Exam

It feels like a long time between finishing the Logic and Applications course back in early November and the exam, which is next week on the 27th January. In between, I’ve done a little work on my project proposal, but certainly since late December I’ve been focussing more on preparing for the exam.

It’s always a bit surprising when I start revising how much stuff we covered in a five-week course, and this one was no exception. The syllabus is here on the UoM CS website. It’s also a new course this year, so there aren’t any specific past papers (exam papers from previous years) to get a feel for how the exam will be phrased and what kind of content has been examined before.

The nearest course in previous years was the Automated Reasoning course, which covered similar stuff but also included some aspects of logic programming in Prolog. In this course we used the SPASS theorem prover and the MiniSAT SAT solver for the small amount of experimental work involved. Hopefully there won’t be any ‘remember-the-syntax’ style questions…
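For a feel of what MiniSAT decides (far more cleverly than this), here’s a brute-force satisfiability check in Python, using the usual integer encoding of clauses – positive numbers for variables, negative for their negations. A toy sketch of the problem, not of how MiniSAT actually works:

```python
from itertools import product

# (x1 OR NOT x2) AND (x2 OR x3), each clause a list of integer literals
clauses = [[1, -2], [2, 3]]

def satisfiable(clauses, n_vars):
    """Try every truth assignment; return True if one satisfies all clauses."""
    for bits in product([False, True], repeat=n_vars):
        assign = {i + 1: b for i, b in enumerate(bits)}
        if all(any(assign[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False

satisfiable(clauses, 3)  # -> True (e.g. x1=False, x2=False, x3=True works)
```

Exhaustive search like this blows up exponentially, which is exactly why the clever clause-learning search inside solvers like MiniSAT matters.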