Tuesday, 25 September 2012

Understanding Unicode and UTF8 in Perl

I know there's a lot of things out there on the subject, but when it comes to Unicode, a quick review of core concepts and bug avoidance guidelines is never a waste of time.

I'm not a Unicode Guru, but working with third parties, I often find that a lot of people consistently fail to get the basics right about Unicode and encoding. There must be something esoteric about it. So here's yet another set of slides about Unicode/UTF8 in Perl.

 It's not meant to be a comprehensive presentation of all Unicode things in Perl. It's meant to insist on a couple of guidelines and give some pointers to get a good start writing a unicode compliant application and avoiding common issues.

Comments are open for questions. Happy coding!

Friday, 21 September 2012

Yes, One, I don't know - three tales about understanding people

As application developers, it's crucial that we understand what other people are saying, and we're often in a position where misunderstanding is an easy trap to fall into, specially when we speak with people with no computer science background. Yes that happens. Scary hu? The key point when this happens is to forget about everything you know about basic understanding, even in your native language. So here are three tales about how even the most simple language elements can lead to misunderstanding, and what to do about it.


I started working for (very little) money  as a bio-informatics consultant. My main duty was to write and use software to produce - in a few hours of number crunching - enough hypothetical results that would take molecular biologists decades and millions of investors hard earned Euros to validate in the wet lab. At this time, my brain was fresh from university and I was formatted to think about the world in terms of 1's and 0's. And there were only 10 categories under which my mind was ultimately classifying things. True and false. So the word "Yes" for me had only one meaning. It meant something was true in the past, now and in the future, independently of circumstances. So as you can guess, there were not many questions you could answer with a plain "Yes" to me. For instance to the question "Do you like hummus", if you answered me "Yes", then I would have served you hummus for the rest of your life. Well only until you've finally decided by yourself to tell me that "actually yes I like hummus most of the time, but not always". Which is actually closer to the reality. If it just about culinary tastes, it's OK, but when it's about work then the misunderstanding is annoying at best. And I've made this mistake quite a few times: Taking a biologist "yes" for a computer science "yes" and mis-design something as a consequence. Now I try to remember, I don't take "yes" as an answer. When I hear "yes", I try to question it further, think twice and make sure I deal with the case where this "yes" is not true. Because in the real world, it will happen.


I bet that like me you've noticed the following fact about software development:

"Given enough time and business analysis, any unique relationship will eventually have to turn multiple."

It works for almost everything. Your users can set their preferences? One day you'll want to manage multiple set of preferences. You display the hottest deal of the day on your homepage? Quite soon you'll want multiple. A document has got a language property? What about mixed language documents? multiple. A user has got a postal address? No, they've got multiple addresses - work/home/mum/dad. OK so now they have multiple addresses, but they choose a favourite one. Humm, what if they want a different favourite depending on the context. Think "Deliver to work, but send the bill to home". Multiple again. Programming language has got single inheritance? The next one will have multiple. And you can go on and on about it. There's no end to multiplying. I wonder sometimes why unique relationships are included in UML. I think it's to trick developers into believing people saying "One". But don't be fooled. There's no such thing as "One". If you don't believe me, read this multiple times.

I don't know

This one is not strictly speaking about computer science, but about science in general. I think that people with a scientific background of any kind are used to deal instantly with equivalence and equations. For instance, given the speed and the mass of a moving object, we instantly know its kinetic energy because we know that E = 1/2mv^2. It also works transitively. I don't want to go into too complex examples, but let's say you have a,b,c, d=f(b,c) , e=f(d,a), and let's say I give you a,b,c and ask you if you know 'e'. Someone with science background will tell you "yep, because you know a and d, and you know d because you know b and c".  I did the test with my wife - who's got a biology background - yesterday, so my statistics are pretty solid. To me this is a perfectly natural way of thinking. Until I took a class in finance.

In an attempt to avoid spending my elderly years begging in the streets of London, I took this class in finance on coursera to understand how the wheels turn. And the lecturer looks to me like a brilliant man, full of common sense with very good pedagogy. I highly recommend this class. Anyway. One thing that stroke me is that sometimes I was kind of lost into the teacher's thinking. He was presenting a example, where a,b and c were known and was asking the question: "Do you know e". And the right answer was not "Yes of course, because e=f(f(b,c),a)". The correct answer to him was "I don't know". And I understood why I sometimes didn't understand him. Because for him , "I don't know" doesn't mean the same as my "I don't know". To me "I don't know" means "It cannot be reached within a reasonable computing time". To a mathematician or a physicist, or a biologist,  it will probably mean more or less the same. But to him it means something different. It means "I don't know it yet because I haven't calculated it yet". Once I understood that, the rest of the class was suddenly crystal clear, because I had understood that the meaning of this simple sentence "I don't know" meant something different between him and me.

The fact that we cannot assume that everybody shares the meaning of such simple concepts like "Yes", "One" , "I don't know" and probably lot of other ones is quite fascinating and tells us that effective communication between human beings of different backgrounds is probably one of the most difficult things to achieve at work and in life in general.

I don't know if there's a solution to this, but until people more clever than me find one, I'll try to stick to this rule: Don't speak to non IT people Try to understand what meaning others attach to basic phrases.

Saturday, 15 September 2012

JS/HTML/CSS: A braindump from a backend guy

It's a wonderful sunny Saturday outside yet I'm here bloging about web development. I should see a doctor, but in the meantime please suffer my brain dump about things browser related.

I'm not really a front end developer, in the sense that I have very poor graphic design skills, that my knowledge of Javascript and CSS is limited. However over the time I had my fair bit of front end development, more recently here, not counting the things I do for fun, so yeah, I don't have expert level knowledge about all of this, but I feel I have enough experience to broadly think about some general guidelines about what should be fed to the browsers. So the goal of this post is not to dictate what I think is right, but rather to put some ideas on the table and calling for critics and alternative and/or improvement suggestion. So if you're a JS Ninja, or a CSS old-master, please be indulgent, and share your art in the comment.

Ok let's get started with the general page structure. I like a page source to be clean and organised (not like my bedroom), but also the goal here is to allow the browser to render the page as soon as possible. In other words, to reduce the time to first pixel:

      <title>My Wonderful Page</title>
      <meta name="description" content="Some meaningful and reasonably long description">
      <script type="text/javascript" src="/js/head.js"></script>
      <link type="text/css" rel="stylesheet" href="/one/css/file.css" />
         All sorts of HTML stuff.

     <script type="text/javascript">

Let's go though this step by step.

Use a JS Loader. So the idea here is to do as little as possible in the header, and deport the heavy lifting to the end of the body. In your header, you should have only one and only one minimal script, and this script should allow you to load things later at your page bottom. Here I use head.js. It's got other fancy base features but load.js is also a good alternative if you like to make it even smaller. Let your page be pear shaped.

One CSS File. Not two, not three. One, and preferably on the same domain as your main site (so relative URLs are welcome here). Why is that? Here we reduce the number of extra requests the browser has to do to a minimum of erm ... one, and by having the CSS hosted on the same hostname, you avoid doing extra DNS queries, saving you precious milliseconds. Some people even argue that you should have your CSS in the page itself. I'm not sure about this one. It's true that it saves you the extra query entirely, but at the same time, your page is now heavier and therefore longer to download. There's probably a reasonable limit under which this actually saves time, but I'm not sure about its value. Of course in the development environment, you probably want to have your css neatly organized, so you don't have a giant ball of spaghetti monster to deal with. I've tried different techniques to achieve that, but didn't come with something I really like yet. Suggestions?

All sorts of HTML Stuff. Warning. Controversial bit. What? HTML stuff, controversial? Well yes, as it seems to me that with the advent of the uber-ubiquitous Ajax coolness, a large amount of people chose to ignore a fundamental fact of life (at least of life as a developer): Browsers were born to render HTML and dynamic web servers can generate HTML. For instance, I know a site where every time they need to display a table of something (let's say historical stock prices), they first display an empty table, and then, when all the page JavaScript is loaded, they make a second query to get the data to populate the table. It drives me mad. Not only I don't see the point* , but more importantly, you can imagine how it kills the 'time to useful pixel' of their pages. It's not that I'm against Ajax. not at all. I love it, I was doing it wayyyy before jQuery became popular. In fact you'll find here some posts I wrote about how to easily 'ajax' anything. But I kind of believe that if you want to do JS object models and get JSON from the server instead of HTML, you'll be much better off developing your backend as an API and generate your GUI using something like flex, or GWTK (please add yours to the list :)). I certainly reckon the incredible value and power of OO JS/Json for complex and specific information manipulation and display (down with flash and applets!!), but for general pages development, I prefer to stick with good old HTML, specially if the application is 'browsing' oriented.

One JS Function that does it all. I call it (observe my creative naming skills) 'installFancyJS', and it takes one argument: the jQuery** context it's going to used in. So what does this function do? Well, it installs callbacks and all sort of responsive code in the page just after it's loaded. More precisely in the given context. To give you a very minimal example:

function installFancyJS(context){
function installTopClick(context){
    $('div#topbar' , context).on('click', function(ev){
        $('html,body').animate({scrollTop: 0 },
                               { duration: 'slow', easing: 'swing'}

The reason why I propagate the context everywhere is that when I get a new bit of HTML using Ajax, I want to be able to install the full set of Javascript features on the newly created slice of document. In here's an minimal example of Ajax callback that uses the function:

                  var newbit = $(data);
                  // install fancy js on the container's new content.                                                        

That's is for now. I hope you enjoyed reading this and found some interesting/ridiculous/horrible things to talk about.


* But I see the reason: they got tied to the horrible datatables.net.
**yes I use jQuery, please rant and suggest in the comments if you know a better one.