
So okay, I have been really rolling up my sleeves and digging into the distributed computing world again. I took a break from the subject briefly as I was hemmed up being a company’s code whore, rabble rabble rabble, chomp chomp pachewy chomp. “But that’s alright, cause now I’m back so kill all the rumors and straighten the facts”. I broke out of that place and joined an upstart little startup search engine that could‽ They had the hubris to go head to head with the big G. I was enamored with the place, the idea and the people and, as it turned out, I learned a lot about search and building ‘internet scale’ systems. I was turned on to map-reduce and learned what it took, from A to Z, to make a big distributed system that serves petabytes of data, and how to make it fast and scalable. It was not without its trials and tribulations, but that’s where most of the good learning happens, right‽ I have since parted ways with said company, but the fire that they rekindled in me as a big systems developer is burning bright. I have been immersing myself in the state of distributed computing again, reading paper after paper and blog after blog, and loving it.
I have decided to start my own company. I have a service/product that I want to build that I think will be HOT (read: fun and lucrative). I have been thinking about it for years now. The question at hand is: okay smarty… how do you build it right? Or at least, how do I make the rightest first steps I can? So, here are the questions that go through my mind when starting from tabula rasa.
Questions:
What platform? (Linux)
What source code management system to use? (GIT)
What language(s) do I use? (Java and Python)
What web server should I use? (Apache or Lighttpd – not decided)
What build system should I use? (ANT)
How am I going to store the data? (Schema-less, dunno which one yet)
How am I going to process the data? (Hadoop MapReduce)
What will I use to build the front end? (HTML/CSS/JavaScript, with JQuery)
What will be the engine for the front end? (Web.py)
How can I keep learning and having fun!? (Don’t work for anyone anymore!)
So, I started with the easy things first. Pretty much 99% of my development over the past 14 years has been on a *NIX platform, primarily LINUX, so… I’ll stick with what I know, the good stuff – Platform = LINUX. That was easy. Okay, so as for what SCM to use: I was using Mercurial (Hg) at the last gig, which was a refreshing change from the CVS that I had been using up until that point. I remember seeing a Google presentation about GIT; it sounded great, so I tried it… and fell madly in love! Regarding what programming language to use, well, that was an interesting one. I have been coding in Java since June of 1995; it is my home language and I think it is pretty freakin’ awesome, so Java is my primary choice of language. But as great as Java is at most things, I have always found it ill-suited to the quick-changing, desultory world of web development (IMHO). I was introduced to Python in this context over the past year and a half and have seen its merits as a front-end / rendering engine language and platform. So I decided that the best of both worlds would be Java in the middle and back-end and Python on the front-end, using Thrift as the over-the-wire glue. Thrift essentially hedges my front-end language bet, since I can wholesale replace the rendering engine language if necessary. Okay, so cool. Next: Hmmm… as for web servers, the jury is still out on that. Apache and Tomcat are great and tried and true, but I have been hearing lots of good stuff about Lighttpd (lighty), especially in the context of Python. As the front-end stuff is my weakest skill set, I will wait to see how that shakes out. Moving on: the build system. Well, as the child of Java that I am, I am going to have to go with ANT. IMHO it is the only way to go. So okay, so much for the easy questions… Now the harder ones.
As I have been observing, the word on the net is that schema-less databases are in, and the concept and implementations are hitting their stride. Well, I am not one to go with fashion, so I took a really good look at the issues and arguments to be made both for and against schema-less solutions. It turns out that there are some strong, salient points to be made for using schema-less databases (e.g. a great post from FriendFeed, etc.; I can get into that in other posts, but for now, trust me). For me, I have never been a fan of elaborate data schemas nor the copious SQL that went along with them. I always felt that SQL conflated data storage with data manipulation and forced you to forecast data decisions and relationships too early in the software life cycle, instead of simply capturing and persisting the data and allowing it to be manipulated as the evolution of the program dictates; it leaves you going through code and mental gymnastics to fit a round peg (once square) into a square hole. Besides, I feel that schema-less is the better way to go for scalability and indeed simplicity. So that answers how I am going to store my data, but then what would I use? I came across a really nice blog post, “Anti-RDBMS”, that did a breakdown of a few front-running candidate systems… it got me spinning off and digging into these offerings. At the moment the front runner is Project Voldemort, but I have not ruled out HBase or Cassandra. As I have been researching these tools I read the Dynamo paper and fell in love with it. It is one of the best papers that I’ve read in a long time. It led me down a rabbit hole of papers (will list in a future post) that have kept me fascinated, wide-eyed and eagerly learning. And pretty much that is where I am… reading and learning and excited!
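To make the "just capture and persist the data" idea a bit more concrete, here is a minimal sketch of a schema-less store. It is purely illustrative: it is not the API of Project Voldemort, HBase or Cassandra, and the class name `SchemalessStore`, the `entities` table and the method names are all made up for this example. It assumes only Python's standard `sqlite3` and `json` modules, and keeps every document as an opaque JSON blob under a key, so a document can grow a new field without any migration.

```python
import json
import sqlite3
import uuid


class SchemalessStore:
    """Tiny key -> JSON document store (hypothetical, for illustration only)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        # One opaque table: the database never learns the shape of a document.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS entities "
            "(id TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def put(self, doc, key=None):
        """Store a dict under key (a random key is generated if none is given)."""
        key = key or uuid.uuid4().hex
        self.db.execute(
            "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
            (key, json.dumps(doc)),
        )
        self.db.commit()
        return key

    def get(self, key):
        """Return the stored dict, or None if the key is unknown."""
        row = self.db.execute(
            "SELECT body FROM entities WHERE id = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def scan(self, predicate):
        """Yield (key, doc) pairs whose document satisfies predicate."""
        rows = self.db.execute("SELECT id, body FROM entities").fetchall()
        for key, body in rows:
            doc = json.loads(body)
            if predicate(doc):
                yield key, doc


store = SchemalessStore()
first = store.put({"name": "ada"})
# A later document carries an extra field; nothing has to be migrated.
second = store.put({"name": "bob", "tags": ["beta"]})
tagged = list(store.scan(lambda doc: "tags" in doc))
```

The trade-off is visible even at this size: writes never need a schema change, but any query beyond a key lookup is a full scan unless you maintain your own indexes on the side.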
Thus far, I have proven the path with respect to my nascent project. I can go from the front-end, in Python served up by web.py via lighttpd – talking Thrift – to the middle “business logic” layer – to… a faux back-end yet to be determined. I can build and deploy it from ANT, it’s in GIT, and all is well. Now I just need to settle on a back end and then get back to coding. Oh, by the way, I have had to teach myself JavaScript, as I have surrendered to the webbies. They win, I guess (I’ll rant on that in another post). These days you can’t do a damn project without them! So I am having to become a bit of a webbie myself. After much investigation I decided to use JQuery as my AJAX toolkit. There is a blog post from Mike Miles that was good reading and has some good links fleshing out this issue.
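For the curious, the front-end to middle layer to faux back-end path can be sketched roughly like this. Big caveats: this is a single-process Python stand-in, not my actual code. In the real setup the hop between the front end and the middle layer is a Thrift call and the middle layer is Java; here a plain method call plays that role. The names `InMemoryBackend`, `GreetingService` and `render_page` are hypothetical, invented just for the sketch. The point is only that the middle layer sees nothing but `save` and `load`, so the faux back end can be swapped for a real store later.

```python
import html


class InMemoryBackend:
    """Faux back end: a dict standing in for the still-undecided data store."""

    def __init__(self):
        self._data = {}

    def save(self, key, value):
        self._data[key] = value

    def load(self, key):
        return self._data.get(key)


class GreetingService:
    """Middle 'business logic' layer; it only knows the save/load interface."""

    def __init__(self, backend):
        self.backend = backend

    def register(self, user_id, name):
        # Normalise input here so every front end gets the same behaviour.
        self.backend.save(user_id, {"name": name.strip()})

    def greeting(self, user_id):
        user = self.backend.load(user_id)
        if user is None:
            return "Hello, stranger!"
        return "Hello, %s!" % user["name"]


def render_page(service, user_id):
    """Front-end stand-in: turns the service's answer into escaped HTML."""
    return "<h1>%s</h1>" % html.escape(service.greeting(user_id))


service = GreetingService(InMemoryBackend())
service.register("u1", "  Ada ")
page = render_page(service, "u1")
```

Swapping the back end then means writing another class with the same two methods and handing it to `GreetingService`; neither the business logic nor the rendering code changes.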
Okay, more on things distributed in future posts. I have lots to add and things to share, but I’ll stop here for now. I’ll also try to be more informational and objective so that I don’t sound so opinionated… I’ll try.