Behind the scenes

As mentioned in our previous blog post, we have been and still are doing a lot of work on the backend side of iCheckMovies. This Wednesday (April 10th), we will make a change to the backend that you will notice, as the website will be down for a short time ;) This is because our two front-end servers are being replaced with new ones.

This infrastructural change seemed like a good opportunity to write the technical blog post we promised last time. So here it is! This post goes into detail on what we have been (and will be) doing. A small disclaimer upfront: it will be a bit technical, so don't feel bad if you feel like this after reading it:

So buckle up and prepare for a ride down the technical side of iCheckMovies.

When we launched iCheckMovies, everything (which was one web server and one database at that point) ran on a single server. That server also hosted other websites. As iCheckMovies grew and grew, it started to behave a bit like a prima donna, regularly demanding all of the server's attention. As this was a single server, the other websites got less time and their performance dropped significantly. At this point we realized we had to cave in to our prima donna's demands and get iCheckMovies her own server. As we had no source of income at the time, we found two used machines that fit within our budget. Both machines served as web servers, and one of them doubled as the database server. This was a huge improvement over our previous situation.

For a time all was well, but it wasn't long until our two servers started complaining. In particular, the database server was having a really hard time keeping up with demand. Marijn made some projections and they were not good: if growth continued at this rate, the database would soon be unable to handle the traffic in a timely manner, slowing the website down. The conclusion was simple: we needed a new database server. The only problem was how to pay for it (unfortunately, servers are quite a bit more expensive than regular PCs). We then came up with the donate-a-thon, where users could donate money in return for a paid account on the upcoming iCheckMovies 2.0. The response was overwhelming. Within a week we had enough money to buy ourselves a new database server. Our second used server went from web server/database server to just a web server, and the performance of the website improved once again.

This configuration has worked fine for us for quite some time, but our two front-end web servers are getting old. We purchased two new servers last year, so we'll replace the old servers with the new ones. Furthermore, in the next few months we will install a hardware firewall, two new network switches and two load balancers/web servers. We'll blog about that later.

All of our servers run CentOS, which is one of the most popular Linux distributions. From the beginning, our web servers have run Apache as our HTTP server. Around a year ago, we started moving from Apache to nginx, which has fewer features than Apache, but that also makes it faster and easier to manage. We migrated step by step. First we ran nginx next to Apache: nginx handled all static resources (images, CSS files, etc.) whereas Apache handled all dynamic resources (the web pages themselves) via mod_php. A little over a month ago, we finished the migration and let nginx handle everything. The dynamic resources are now handled via FastCGI and PHP-FPM.
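The split described above boils down to just a few lines of nginx configuration. A minimal sketch (the domain, paths and socket location are illustrative, not our actual setup):

```nginx
server {
    listen 80;
    server_name www.example.com;
    root /var/www/example;

    # Static resources: served directly from disk by nginx
    location ~* \.(png|jpg|gif|css|js|ico)$ {
        expires 30d;
    }

    # Dynamic resources: handed off to PHP-FPM over FastCGI
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/var/run/php-fpm.sock;
    }
}
```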

As mentioned, we have more than one front-end server. One of the classic problems is: how do you ensure that both servers have the exact same configuration? We used to configure them manually, but have moved to using Puppet. With Puppet, you define your configuration in configuration files, deploy those files to your servers and have that configuration applied automatically. After Marijn painstakingly created the configuration files, updating the configuration of our servers became easy. This will also make it far easier to replace the two old front-end servers with the two new ones this Wednesday.
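To give an idea of what Puppet configuration looks like, here is a minimal, hypothetical manifest for a front-end server (the class name, package and paths are illustrative):

```puppet
# Applied identically to every front-end node
class webserver {
  package { 'nginx':
    ensure => installed,
  }

  # The same config file is pushed to every server;
  # changing it triggers an nginx reload everywhere.
  file { '/etc/nginx/nginx.conf':
    ensure  => file,
    source  => 'puppet:///modules/webserver/nginx.conf',
    require => Package['nginx'],
    notify  => Service['nginx'],
  }

  service { 'nginx':
    ensure => running,
    enable => true,
  }
}
```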

From the start, we have used MySQL as our DBMS. The reasons were simple: it was free (no licensing fees) and we were all familiar with it. We started out with MyISAM tables for performance reasons, but later switched to InnoDB as it is a more full-featured storage engine than MyISAM (support for foreign keys, transactions, etc.). Over time we have tuned MySQL to squeeze as much performance out of it as possible, but the biggest leap in performance came when we bought our new database server thanks to the donate-a-thon. This new server came with a lot more internal memory, which really made a huge difference.
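Switching storage engines is a single statement per table. A sketch with hypothetical table and column names:

```sql
-- Convert a MyISAM table to InnoDB
ALTER TABLE checks ENGINE = InnoDB;

-- InnoDB then makes constraints like this possible:
ALTER TABLE checks
  ADD CONSTRAINT fk_checks_user
  FOREIGN KEY (user_id) REFERENCES users (id);
```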

One of the biggest pain points users had with the website from the start was how searches were handled. Internally, we implemented this through MySQL's FULLTEXT index feature. However, there is not that much customization that can be done with FULLTEXT indexes, and we felt customization was necessary to improve the search results. We decided we needed to add a database server specifically for searching, also known as a search server. For this we chose the Sphinx search server, due to its ease of use and great integration with MySQL (it can even be accessed as if it were a MySQL server). Although Sphinx did require some tweaking, it gave us some very nice features: results were more accurate due to far greater customization options, it had better tolerance for typing mistakes (such as in "Stra Wars") and it was a lot faster. Another consequence was that our MySQL server had less work to do, so it also performed better after we let Sphinx handle our searches.
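Because Sphinx speaks the MySQL protocol, querying it looks just like SQL. A hypothetical example (the index and field names are made up):

```sql
-- Run against the Sphinx server, not MySQL itself
SELECT id, title
FROM movies_index
WHERE MATCH('star wars')
ORDER BY WEIGHT() DESC
LIMIT 10;
```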

If you had to guess what the most computationally expensive feature in iCheckMovies is, what would you say? Without a doubt, this award goes to calculating your neighbors. The query used to do this involves a lot of joins on a lot of large database tables. At its core, it is searching a graph for nodes (users and movies) and relations between those nodes (favorited, disliked, etc.). It is essentially the same problem social networks like Facebook and LinkedIn face when they suggest people you might know based on your existing network of relations. Luckily for us, there is a DBMS type specifically designed to handle this type of search: graph databases. We have been doing some preliminary work on moving our neighbor calculation to a graph database. For this we chose Neo4j. Unfortunately, we had to put this nice side project in the freezer due to more pressing concerns, but we do hope to integrate Neo4j sometime in the future, as it has great potential.
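To show why a graph database fits this problem so naturally, here is a hypothetical Neo4j Cypher query (labels, relationship types and property names are made up, not our actual schema): users who favorited many of the same movies as you, ranked by overlap.

```cypher
MATCH (you:User {name: 'you'})-[:FAVORITED]->(m:Movie)<-[:FAVORITED]-(other:User)
RETURN other.name, count(m) AS shared
ORDER BY shared DESC
LIMIT 10;
```

What takes a pile of joins in SQL is a single path pattern here.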

The iCheckMovies website is written in PHP, as that was the language we were most familiar with when we started iCheckMovies, plus it's free (an argument we Dutchmen are very sensitive to ;) )! It runs on our own custom PHP framework, which is lean, mean and fast. This setup has served us well, but now that iCheckMovies has become as big as it is, we are really starting to feel the downsides of PHP. Not only does the language have its fair share of oddities and lag miles behind other languages in terms of features (namespaces, for example, are a very recent addition), there are also more fundamental problems (such as its lack of multi-threading). Most importantly, PHP is not very fast and really bad with memory. Like really bad. It makes you want to cry:

That other websites struggle with this too becomes apparent when we look at the biggest website running PHP: Facebook. They have a long history of trying to optimize PHP, and ended up creating their own PHP compiler called HipHop to increase PHP's performance. So far we have not looked at HipHop, but we might in the future.

Ideally, we would like to rewrite the website in another (statically typed) language like C# (LINQ and async/await FTW!), but this is not going to happen anytime soon. The reasons are simple: it would be an awful lot of work and it has the potential (more likely the guarantee) to (re-)introduce bugs. The iCheckMovies 2.0 release made this painfully clear to us :( That does not mean we will be stuck with PHP forever; it just means that when we do decide to convert the website to C#, we will do so in small steps. Most likely, we will start by converting some backend processes to C#, as they tend to be quite memory-intensive and C# handles memory much more efficiently.

We started with plain old CSS files, but moved to LESS a while ago. You can think of LESS as CSS on steroids, or as CSS as programmers would have designed it (with inheritance, functions, etc.). Our JavaScript is plain JavaScript at the moment (with heavy use of jQuery), but we are looking into CoffeeScript and TypeScript, both of which would make things easier for us and less error-prone.
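A small taste of what LESS adds over plain CSS (names and values are illustrative):

```less
// A variable and a mixin: change @brand-color once, it updates everywhere
@brand-color: #3a7ca5;

.rounded(@radius: 4px) {
  border-radius: @radius;
}

.button {
  background: @brand-color;
  .rounded(6px);
}

.button-large {
  // Built-in functions like darken() operate on the variable
  background: darken(@brand-color, 10%);
  .rounded();
}
```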

As iCheckMovies is a very dynamic website, the database server gets hammered on every request. The classic way to solve this problem is by caching data that does not change often. We started out caching on the file system. Although this served our purposes for a while, we improved performance by switching to Memcache, which stores its cache in memory; reading data from memory is a lot faster than reading it from disk. Over the last couple of years, NoSQL databases have become more and more popular. We are looking into how we can use a NoSQL database like MongoDB or a key/value store like Redis to speed things up further.
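The usual pattern with a cache like Memcache is "cache-aside": check the cache first, and only hit the database on a miss. A minimal Python sketch of the pattern (our actual code is PHP talking to Memcache; here a plain dict stands in for the cache, and the query function is a placeholder):

```python
import time

cache = {}        # stands in for Memcache
CACHE_TTL = 300   # seconds before an entry is considered stale

def expensive_db_query(movie_id):
    # Placeholder for a slow database lookup.
    return {"id": movie_id, "title": "Some Movie"}

def get_movie(movie_id):
    """Cache-aside: try the cache first, fall back to the database."""
    key = f"movie:{movie_id}"
    entry = cache.get(key)
    if entry is not None and entry["expires"] > time.time():
        return entry["value"]                 # cache hit
    value = expensive_db_query(movie_id)      # cache miss: hit the DB
    cache[key] = {"value": value, "expires": time.time() + CACHE_TTL}
    return value
```

The database is only queried on a miss or after the TTL expires; every other request is served straight from memory.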

Some people have also noted that they don't receive blog post notifications anymore. We will be fixing this problem, and will probably do so using the RabbitMQ message queue. Such a queue will allow us to handle long-running operations (like sending out all the blog post notifications) a lot faster. A preliminary proof of concept showed great potential.
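The idea behind a message queue is that the web request only enqueues a job and returns immediately, while a separate worker drains the queue at its own pace. A Python sketch of that shape (in reality the queue lives on a RabbitMQ broker and the worker is a separate process; here an in-process queue stands in just to show the pattern):

```python
import queue
import threading

jobs = queue.Queue()   # stands in for a RabbitMQ queue
sent = []

def worker():
    # The worker drains the queue independently of the producer.
    while True:
        email = jobs.get()
        if email is None:          # sentinel: stop the worker
            break
        sent.append(f"notified {email}")
        jobs.task_done()

# The "web request" side only enqueues jobs and returns immediately...
for address in ["alice@example.com", "bob@example.com"]:
    jobs.put(address)
jobs.put(None)

# ...while a background worker does the slow part.
t = threading.Thread(target=worker)
t.start()
t.join()
```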

When we started thinking about the app, we first needed to decide between the following three options:
1. Create a fully native app.
2. Create an HTML5 app.
3. Don't create an app but optimize the website for viewing on mobile devices.

We chose option 1 for one simple reason: performance. A native app can be much faster than a website (which has to be rendered in a browser) and is also faster than an HTML5 app (which also uses a browser to render the app). Of course, the advantage of options 2 and 3 is that they should easily work on all devices (every device includes a browser). Facebook originally chose option 2 for their app, but later converted it to option 1, with Mark Zuckerberg famously saying that betting on HTML5 was their biggest mistake.

So we chose option 1. Where does that leave us in the area of app compatibility across different devices, as iOS (Objective-C), Android (Java) and Windows Phone (C#) all require different programming languages? As it turns out, there is a company called Xamarin that offers a solution. They have created products that allow both iOS and Android apps to be written in the same language: C# (which already was the default for Windows Phone). Through their products Xamarin.iOS and Xamarin.Android you get the benefit of one language across all platforms, which means you can reuse your code across platforms. You can also use all features of the native SDKs. Although their products are relatively new, they are becoming more popular. Their biggest customer to date is probably Rdio, which uses Xamarin's products to power its native apps.

After quite some time experimenting, we now feel that we have a firm grip on the platforms and Xamarin's products, so we can really start developing the actual application. "Wait," you might say, "has all this time since you announced the app been spent on getting to know the platforms and Xamarin's products?" The short answer is: no. We have left out one major (unseen) component of building an app: the server-side API.

The development of the API itself was quite interesting. How an app performs depends in large part on how the server-side API performs, so early on we decided we wanted to stay away from PHP (for the aforementioned reasons). Instead we chose (as you might have guessed) to develop our API in C#. This gave us great performance and access to an incredible number of libraries (check out NuGet), but also had one big disadvantage: as we had to start from scratch, we had to re-implement functionality that already existed in iCheckMovies' PHP code. Obviously this took quite some time, as we did not want to simply port the PHP to C#, but wanted to fix some things along the way. Soon we started seeing the benefits of our approach: the API performed wonderfully, and development was so much easier and more fun in C# than in PHP. It was just a lot of work. The API was written in ASP.NET MVC 4, a great MVC framework for developing websites (but also APIs). We just finished the API, so now we can (finally) focus our efforts on the app itself.

The observant amongst you might wonder: but isn't C# from Microsoft, and aren't you running Linux on your servers? Well, yes and yes, but you can actually run C# code on non-Microsoft machines. To do this, you have to use Mono. Mono is an open-source implementation of C# and the CLR that runs on a variety of platforms (such as Windows, Mac OS X and Linux). The project is actually quite old (it was originally developed by Novell) and most C#/CLR features are implemented. This allows us to run our ASP.NET MVC 4 API on Linux servers, although this did not go as smoothly as we had hoped. In fact, Mono is also used by the aforementioned Xamarin.iOS and Xamarin.Android products to allow C# code to run on iOS and Android.

A part of the API that took quite some time was how to handle authentication and authorization. We could have opted to create our own authentication scheme, but chose to go with the emerging de facto standard OAuth (and more specifically, the 2.0 specification). This standard is already used by a lot of big companies like Google, Facebook and Microsoft. Being a standard, it has the big advantage that a lot of the hard work has already been done by others :) So when we had to write our own OAuth 2.0 server (the hardest part of the API), we used the DotNetOpenAuth library. Although OAuth is quickly becoming the industry standard, the documentation is still quite lacking, so beware when you start implementing your own OAuth server.
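For a flavor of OAuth 2.0, here is what a token request could look like on the wire (the hostname, credentials and token values are purely illustrative; the resource owner password grant shown here is just one of several grant types the spec defines):

```http
POST /oauth/token HTTP/1.1
Host: api.example.com
Content-Type: application/x-www-form-urlencoded

grant_type=password&username=alice&password=s3cret&client_id=app&client_secret=app-secret

HTTP/1.1 200 OK
Content-Type: application/json

{
  "access_token": "2YotnFZFEjr1zCsicMWpAA",
  "token_type": "bearer",
  "expires_in": 3600,
  "refresh_token": "tGzv3JOkF0XG5Qx2TlKWIA"
}
```

The app then sends the access token with every API call, and uses the refresh token to get a new one when it expires.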

For those of you hoping for a public API, we have to disappoint you (for the moment). We first want to be sure that the API works correctly and has proven itself over time; then we will consider making it public.

Version control
From the start, we have been using Subversion (SVN) as our version control system. However, over the last couple of months we have been looking into distributed version control systems (DVCS). We almost immediately fell in love with Git, an absolutely brilliant piece of software. While we were looking into Git, we also started working on the iCheckMovies app, and soon decided that we wanted to use Git as the version control system for the app and the API. We also looked into hosting for Git (we currently host our own SVN server) and decided on GitHub, which makes using Git even more awesome.

Marijn is currently working on converting our build system so that the website's code also works with Git. We expect to be using Git for all our code very soon. Once everything has been moved to GitHub repositories, Marijn will start work on a continuous integration server. This will allow us to update the website far more easily, which means that new features and/or bug fixes will be online (much) sooner. We will be moving from doing larger updates once in a while to doing (very) small updates frequently.

Well, that was quite a long blog post! It shows that iCheckMovies has changed quite a bit over the years and will be changing even more in the future. We see some really exciting opportunities for iCheckMovies, the app of course being the most obvious one, but not the only one! So keep a sharp eye on our blog for updates. Hopefully you have found this technical behind-the-scenes post interesting; it's something we have wanted to do for quite some time. Happy checking!

9 April, 2013