One plus One is a slightly bigger One!

A blog on programming, football and other stuff.

Indian and Pakistani Cricketers - Who Make Better Debuts?

Recently a friend and I had an argument about who makes better debuts: Indian or Pakistani cricket players. Now, I am not into cricket, but I do watch the odd match here and there and am generally aware of what is going on in the game. My stand was that Pakistani players make better debuts than Indian players, while my friend was adamant that Indian players make better debuts. My friend is more of a cricket guy than I ever was, and he asked me what the basis of my stand was. It was just a gut feeling, and I had to leave it there as I had no way to prove that it was correct. But I did want to try.

I decided to see if I could prove my theory. Luckily, Cricinfo has all this data, although they don't make it easy for you to get it. I wrote some scripts to pull the data and decided to visualize it using the awesome d3 library. In the end, this became more about figuring out d3 than winning the argument, but doing this was fun.

This is what the end result looks like:

Batsmen

X-axis: Total Runs, Y-axis: Batting Average, Size of bubble: Highest Score, Blue: Indians, Green: Pakistanis. Move mouse over a bubble and see the alt-text for full stats.

This scatter-plot shows debut batting performances by all the test players to have ever played for India and Pakistan (except those players whose batting averages were not computed). As you would have guessed, the blue bubbles are Indian players and the green ones are Pakistanis. The number of runs scored by each batsman is on the x-axis and the batting average in the debut series is on the y-axis. The size of the bubble is representative of the highest score the player made in the debut series.

I learned some interesting things from this graph:

  • Sunil Gavaskar made an amazing debut. His score card for his debut series in 1971 against the mighty West Indies of the 1970s read like this - Matches: 4, Total Runs: 774, High Score: 220, Batting Average: 154.80, 100s: 4, 50s: 3. That is incredible, and adding him to this plot would have totally skewed it. So technically, I did not learn about Gavaskar’s amazing debut from this plot, but from the process of plotting it. But you get the idea.
  • It looks like Indian batsmen have historically made slightly better debuts. From my not-so-keen-cricket-lover point of view, anybody who scored 150 runs at an average of 40 had a good debut series. (I chose that arbitrarily, but if you are more of a cricket fan than me, you would have better scales to spot a good debut, and you can draw your own conclusions :-)
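To make the yardstick above concrete, here is a minimal Python sketch of it. The function names are made up for illustration and are not part of the scripts on GitHub. The 150-run and 40-average thresholds are the arbitrary ones chosen above, and the figure of 5 dismissals for Gavaskar is inferred from 774 runs at an average of 154.80, not taken from the dataset.

```python
def batting_average(runs, dismissals):
    """Batting average: runs scored divided by the number of times out."""
    if dismissals == 0:
        # The average is undefined for a batsman who was never dismissed;
        # such players have no computed average.
        return None
    return runs / dismissals


def is_good_debut(runs, average, min_runs=150, min_average=40):
    """Apply the (arbitrary) 150 runs at an average of 40 yardstick."""
    return average is not None and runs >= min_runs and average >= min_average


# Gavaskar's 1971 debut series: 774 runs, 5 dismissals (inferred).
gavaskar_average = batting_average(774, 5)
print(round(gavaskar_average, 2))
print(is_good_debut(774, gavaskar_average))
```

If you prefer a different scale for a good debut, changing min_runs and min_average is all it takes.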

Bowlers

X-axis: Bowling Average, Y-axis: Wickets taken, Size of bubble: Best bowling performance, Blue: Indians, Green: Pakistanis. Move mouse over a bubble and see the alt-text for full stats. Bowling average: lower is better.

This scatter-plot shows debut bowling performances by all the test players to have ever played for India and Pakistan (except those players who took no wickets and hence are not of interest to us). The X-axis shows the bowling average and the Y-axis shows the number of wickets. The size of the bubble is representative of the number of wickets the bowler had in his best bowling performance in the debut series. Since it is indicative of only the wickets, players with best performances of 2/12 and 2/76 would be shown with bubbles of the same size.

Interesting bits:

  • Pakistani bowlers make better debuts than Indian bowlers. In the crowd of players with a bowling average of less than 35 and 5 wickets or above, Pakistani players dominate. (Again, disclaimer about arbitrary scales to measure a good performance :-)
  • There are a number of Indian bowlers like Dilip Doshi, Shivlal Yadav, Ravichandran Ashwin and Srinivas Venkataraghavan whose debut performances were clear outliers.

The Code

All the code I used to pull data from Cricinfo, the actual dataset and the JavaScript code that generated these plots are available here on GitHub. Feel free to use the dataset to create better visualizations than mine.

Credits

Srijayanth (@craftybones) helped me a lot with d3 and choosing the colors for the visualization.

Fixing Flyspell for Emacs in Mac OS X

I use flyspell-mode as a spell checking mechanism in Emacs. Recently, I moved to Mac OS X, and I began to get this error whenever I started Emacs:

Error enabling Flyspell mode:
(Searching for program No such file or directory aspell)

I had installed aspell with Homebrew. The issue seemed to be that Emacs was unable to find the aspell binary. Homebrew installs binaries in /usr/local/bin and it was in my $PATH. It turns out Emacs uses its own exec-path to look for binaries to execute in sub-processes. So the fix is to add /usr/local/bin to the exec-path. This is the change needed to the ~/.emacs file:

(custom-set-variables
 '(exec-path (quote ("/usr/bin" "/bin" "/usr/sbin" "/sbin" "/usr/local/bin"))))

Notice the /usr/local/bin in there.

Finding Un-merged Commits With Git Cherry


In a project that I was a part of in the recent past, we used Story Branching. While it afforded us flexibility in pulling and pushing stories in and out of releases, it also gave us some scares. Somebody makes commits against a story, but a commit does not get merged to the release branch where it is supposed to go, or gets merged to another release. The solution was to hunt down the commits that were missing or had crept in.

This is where the git cherry command is useful. Git cherry finds commits not merged from a branch to another. From the man page:
 “Every commit that doesn’t exist in the <upstream> branch has its id (sha1) reported, prefixed by a symbol. The ones that have equivalent change already in the <upstream> branch are prefixed with a minus (-) sign, and those that only exist in the <head> branch are prefixed with a plus (+) symbol”

Consider the following example. I have two branches - master and release-23.



The branch release-23 has three commits:



The branch master has two commits:




  • Commit 1afda04ccbf2f834663ca8ec3eaf6e3b917581fb (Added foo) is present in both branches.
  • Commit 2a446b1a19253a69c4bb133eedb311c14b2906e8 (Added bar) in the release-23 branch was merged to master, but the commit message was later amended and its sha became 8c71e1b2232c1a524e1de20553180676fb971f86 (Amended. This was Added bar).
  • Commit f06e4df25724ad0dd51702a10f075d39368e1963 (Added zoom) is present only in the release-23 branch.


If we do a git cherry now with master as the upstream and release-23 as head:



This tells us that

  1. An equivalent of commit 2a446b1a19253a69c4bb133eedb311c14b2906e8 (Added bar) is present in the master branch, as indicated by the (-) sign.
  2. Commit f06e4df25724ad0dd51702a10f075d39368e1963 (Added zoom) is present only in the release-23 branch, as indicated by the (+) sign.


If we were to do git cherry the other way around, i.e. with release-23 as the upstream and master as the head:



This tells us that

  1. An equivalent of commit 8c71e1b2232c1a524e1de20553180676fb971f86 (Amended. This was Added bar) is present in the release-23 branch (the upstream this time), as indicated by the (-) sign.
  2. There are no commits in master that are not present in release-23.

That is pretty much what the git cherry command does.
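When hunting for missing commits across many story branches, it helps to split the output by sign. Below is a minimal Python sketch of that idea. The function name parse_git_cherry is made up for illustration, and the sample output is reconstructed from the description above (master as upstream, release-23 as head) rather than copied from a real run.

```python
def parse_git_cherry(output):
    """Split `git cherry` output into (unmerged, equivalent) lists of shas.

    '+' lines are commits that exist only in <head>;
    '-' lines are commits with an equivalent change already in <upstream>.
    """
    unmerged, equivalent = [], []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        sign, sha = parts[0], parts[1]  # with -v, the subject follows the sha
        if sign == "+":
            unmerged.append(sha)
        elif sign == "-":
            equivalent.append(sha)
    return unmerged, equivalent


# Reconstructed output of: git cherry master release-23
sample = """\
- 2a446b1a19253a69c4bb133eedb311c14b2906e8
+ f06e4df25724ad0dd51702a10f075d39368e1963
"""

unmerged, equivalent = parse_git_cherry(sample)
print("Not yet in upstream:", unmerged)
print("Equivalent already in upstream:", equivalent)
```

Anything that lands in the first list is a commit that still needs to be merged or cherry-picked into the upstream branch.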

Bullet Proof Jenkins Setup

In this post, I will describe how a neat setup and some discipline will ensure a Jenkins that can be rolled back and recreated very easily - a bullet proof Jenkins setup.

I have been working on configuring our Jenkins instance. This was the first time I had played around with Jenkins. I am fairly comfortable with Go from ThoughtWorks Studios. All of my past teams used Go as their tool for continuous delivery.

One of the things I found very different from Go in Jenkins is the absence of the notion of a Pipeline as the basic entity of build, as proposed in Continuous Delivery. Although there are plugins to make this available in Jenkins, we decided to go with Jenkins’ model of Jobs.

Another difference I spotted is that when a custom task is defined as part of a Job, Jenkins creates a shell script with all the steps while executing the Job. In Go, each of the steps will have to be defined as a custom command.

We wanted to ensure that our Jenkins configuration is version controlled. While this is a huge win, one of the ways this situation deteriorates is when a large number of changes are made to the configuration over a period of time and these are not checked in. So we decided to take this one step further and ensure that these changes are automatically checked in. There are instructions on how to do this, but we had to do some tweaking to get this working for us.

These are the steps to set up a bullet proof Jenkins. This assumes that Jenkins is running on a Linux box.

1. Create a Git repository in Jenkins’ base directory - This is generally /var/lib/jenkins
2. Create a .gitignore file to exclude Jenkins workspaces. The Jenkins base directory is the home directory for the Jenkins user created to run Jenkins. This means that there will be a number of Linux user specific files like .ssh/, .gem etc. These files need to be specified in the .gitignore file. A sample .gitignore file is listed below.


3. Set up a Jenkins job to check in the changed configuration files every day at midnight (or at whatever time interval you choose). Add a custom task with the following steps:


While this ensures that the configuration is more or less tracked well, there are times when somebody makes a massive change in the configuration. This is where the most important piece of the bullet proof configuration comes in - team discipline. The team should ensure that big changes are checked in as soon as possible. This can be easily done by triggering the Jenkins job manually, without having to ssh in to the Jenkins box.
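As a rough illustration of what such a check-in step boils down to, here is a small Python sketch. It is not the task definition from our Jenkins job: the function names and the commit message are hypothetical, and it assumes the Git repository lives in /var/lib/jenkins as in step 1.

```python
import subprocess


def build_checkin_commands(message="Automated check-in of Jenkins configuration"):
    """Return the git commands (as argv lists) that record the current state."""
    return [
        ["git", "add", "--all", "."],      # stage new, changed and deleted files
        ["git", "commit", "-m", message],  # record them in one commit
    ]


def check_in(base_dir="/var/lib/jenkins"):
    """Commit configuration changes, if any. Returns True when a commit was made."""
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=base_dir, capture_output=True, text=True, check=True,
    )
    if not status.stdout.strip():
        return False  # nothing changed, so skip the commit (it would fail otherwise)
    for command in build_checkin_commands():
        subprocess.run(command, cwd=base_dir, check=True)
    return True
```

Checking git status first matters because git commit exits with an error when there is nothing to commit, which would mark the nightly job as failed on quiet days.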

Credits:
1. The Jenkins community documentation provided a nice starting point for this.
2. The .gitignore file was forked from this gist by sit. I have added some project specific stuff to it.

Why Your Project Should Have a Getting Started Guide.

My new team at work is writing a bunch of Rails applications. This is one of those codebases that one would call “legacy” without much argument. Most of these apps have their own patched, vendorized Rails versions.

Getting up and running was an absolute pain. This project existed before Bundler and the list of gem dependencies is not checked in. I got the output of running gem list on a colleague’s box and wrote a Ruby script to generate a shell script that installs all the gems. When I tried running the tests in one of those apps, I got a nice error.

It looked obvious that we were using patched Rails versions. Surprisingly, these were not checked in to the vendor/ directory. This was proving to be a pain, and today I sat with a team member and wrote a developer guide to get started on the project. We actually had to pull patched Rails tarballs from a remote box and untar them into the vendor/ directory. I was really surprised that these patches had not been checked in. We checked in all the patches. Apart from the Rails patches, there was a gem that had to be checked in.

This is why I think every project needs a Getting Started guide:
1. As a developer joining a new team, I want to look at the code as soon as I can. For me, this involves reading and running the tests.
2. Due to various reasons, a project may have patches and dependencies. Being very specific to the project, there is no way a new developer on the team would figure out these hidden dependencies.
3. There needs to be a single place where all the dependencies are specified.
4. While spoon-feeding someone may not be the best way to get them started, a little bit of hand-holding will not do anyone any harm.

One of the biggest problems I see with guides like these is that as the application and its dependencies evolve, the guides are not updated to reflect the changes. Often the only time the guide gets updated is when a new developer joins the team, follows the instructions and finds the need to update the guide.

Debugging: C#’s HttpWebRequest, 100-Continue and Nginx

Recently I spent some time debugging an issue our team was facing around some C# code making a request on one of our servers. The request was throwing a “The server committed a protocol violation. Section=ResponseStatusLine” error.

Initial investigation suggested that this could happen if we are making HTTP/1.1 requests to a server configured for HTTP/1.0. Our Rails application runs on Mongrel fronted with nginx 0.6.5. We modified the C# code to use HTTP/1.0 and the error went away. The following line does the trick.

request.ProtocolVersion = HttpVersion.Version10;

But wait! This means that somewhere in the chain, a server is configured to use HTTP/1.0. It looked unlikely and further debugging revealed that it was indeed not the case. Further staring at the Rails logs showed that one of the headers that the app expects was not being set, when the request was done using HTTP/1.1 from the code.

After some time, we figured out[1] that the .Net library throws the “server committed …” error if it is expecting an HTTP 100 (Continue) response and the server does not answer the way it expects. We set the code to not expect the HTTP 100 response from the server using

      request.ServicePoint.Expect100Continue = false;

and voila, it worked. The Rails app received all the headers it expected and things worked fine. The code looked like this:

So what is happening?

The HTTP 100 status is supposed to work like this. When a client has to send some data, instead of sending it upfront, it can send some headers along with the “Expect:100-Continue” header. The server responds with a 100 if it is willing to accept the request, or sends a final status code otherwise. The spec is here[2].
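The handshake can be modelled in a few lines of Python. This is only a toy model of the exchange described in the spec, not real networking code: toy_server, send and the 1024 byte limit are all made up for illustration.

```python
def toy_server(headers, body=None, max_length=1024):
    """Hypothetical server that decides from the headers whether it wants the body."""
    if int(headers.get("Content-Length", 0)) > max_length:
        return 413  # final status: the body is rejected before it is ever sent
    if body is None and headers.get("Expect") == "100-continue":
        return 100  # interim status: go ahead and send the body
    return 200      # body received and accepted


def send(server, body):
    """Client side: send the headers first and the body only after a 100."""
    headers = {"Content-Length": str(len(body)), "Expect": "100-continue"}
    status = server(headers)      # step 1: headers only
    if status != 100:
        return status             # the server gave a final status, keep the body
    return server(headers, body)  # step 2: the server said 100, send the body


print(send(toy_server, b"x" * 10))    # small body: 100, then a final status
print(send(toy_server, b"x" * 2000))  # large body: rejected on the headers alone
```

The point of the extra round trip is that the client never has to transmit a body the server was going to refuse anyway.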

We are using nginx as a proxy. The specification says that the proxy should forward the request if it knows that the next-hop server is HTTP/1.1 compliant. The proxy is supposed to ignore the “Expect:100-Continue” header, if the request came from a client using HTTP/1.0.

In our case, the default behavior of the .Net HTTP client library is to set “Expect:100-Continue” header on every request for HTTP/1.1. So the client sends only some headers and waits for the 100 response from nginx. Nginx sees the request, knows that Mongrel supports HTTP/1.1 and just forwards the request. The app sends a 401 because it could not authenticate. The client is expecting a 100 and gets a 401. It thinks the server committed a protocol violation.

When we ask the client to use HTTP/1.0, the .Net library does not use the Expect header, sends all the headers and nginx forwards the request to Mongrel. The authentication goes through.
When we explicitly set the Expect 100 property of the library to false, it sends all the headers at once and the authentication goes through fine.

Looks like there is also a way to tell .Net not to expect 100 from the server through configuration, by putting this in the application's .exe.config file:


1. http://stackoverflow.com/questions/2482715/the-server-committed-a-protocol-violation-section-responsestatusline-error
2. http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.2.3

Talk on Football and Politics

I recently did a talk on football and the politics behind it, as part of the Banyan Tree talk series at ThoughtWorks. The slides from the talk can be found here on Speakerdeck.

It was great fun for me. I thought I did well. It is a topic I know my way around. The history behind the El Clasico - the Spanish civil war, Franco, the trade unions and Di Stefano was discussed. We also looked at The Old Firm. I also talked about Athletic Bilbao’s use of football as a form of cultural resistance. The last parts of the talk were about multiculturalism, united Europe and football as a globalized game.

 About 20 people turned up and while most of them liked it, some had the opinion that they needed to understand more European history to fully appreciate it.