Yield Thought

it's not as hard as you think
formerly coderoom.wordpress.com

Every few days on Hacker News there’s a popular “Ask HN: Review my weekend project” post. Without fail the apps look polished and complete and are almost always interesting. I used to feel humbled by the awesomeness of someone who turned out an entire project in a weekend.

I shouldn’t have - and neither should you. 48 hours is an eternity. What could you do with 2 hours an evening, every evening for almost a month? Today I’m writing about what I did with that time, spread over the last few weeks. If you want to see where it’s up to now, take a look:

It all started with a simple idea, which steadily grew into an interesting, usable website despite at least ten horribly embarrassing flaws - or perhaps even because of them

Day 1 - The Idea

In Germany there’s an amazing board game culture that I’ve taken to since I moved here. One Monday, a couple of weeks ago, I found myself browsing the reviews on the excellent boardgamegeek.com trying to find a new game that’d be fun to play with my friends and family.

They’ve got this huge database of games and ratings people have given over time and I wanted to do some simple data mining on this to suggest games similar to those I enjoy that I might also want to play. 

I didn’t have any plans to make a product or website at this stage. I was just curious as to how easy and effective it would be. So began a fortnight of messing around in my favourite language - python - of horrible hacks and embarrassing mistakes, yet at the end of it a fun board game recommendation site was live and people were using it.

Here’s how it happened.

Day 2 - Embarrassing Flaw #1: Parsing HTML with string.split

In a spare hour in the evening I pull up a python terminal and start trying to actually get some of the ratings data. I start by scraping the HTML pages with urllib2 (don’t forget to use a browser’s User-Agent) and parsing them with, well, this is embarrassing. With string.split. In my defence I spent maybe 15 mins playing around with BeautifulSoup, but all I needed was a couple of chars and hey, string.split and a regexp worked first time.

I know. I’m a terrible person. I should know and love BeautifulSoup already. But, hey, it turns out I don’t. If I’d played around with every piece of technology I should be using whether I felt like it or not, I’d never have got anywhere. Maybe I’d have ended up writing half of an AbstractHTMLScrapingWrapperFactory and then lost my will to live.

Sometimes YAGNI (you ain’t gonna need it) can be applied to your own professional development in the name of doing something cool.
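For the curious, a rough sketch of the split-plus-regexp approach, written in Python 3 rather than the urllib2-era Python 2 of the time. The sample HTML, the link pattern, and the names `fetch` and `extract_game_ids` are all invented for illustration; the real pages may look quite different.

```python
import re
import urllib.request

def fetch(url):
    # Send a browser-like User-Agent (not called in this demo, so no network needed).
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")

# Invented sample markup standing in for a scraped page.
SAMPLE_HTML = """
<table id="games">
<tr><td><a href="/boardgame/13/catan">Catan</a></td></tr>
<tr><td><a href="/boardgame/822/carcassonne">Carcassonne</a></td></tr>
</table>
<div id="footer"><a href="/boardgame/999/ignored">Ad</a></div>
"""

def extract_game_ids(html):
    # Crude step 1: keep only the chunk between the table tags.
    table = html.split('<table id="games">')[1].split("</table>")[0]
    # Crude step 2: a regexp pulls the numeric id out of each link.
    return [int(m) for m in re.findall(r'href="/boardgame/(\d+)/', table)]

print(extract_game_ids(SAMPLE_HTML))
```

It breaks the moment the markup changes, which is exactly the trade-off being confessed to here.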

Day 3 - Embarrassing Flaw #2: Using the python prompt as my IDE

The next day I discover there’s a well-hidden XML API for Board Game Geek; just what I need for grabbing some ratings data. This time I’m not just trying to scrape a handful of game ids from static pages, but to parse large amounts of varying data. This is the point of diminishing returns for string.split - ugly hacks are fine, but like all good drugs you have to know when to stop. I pick up the excellent xml.dom.minidom module and use it to sample a few thousand game ratings to play around with.
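A minimal sketch of what parsing ratings with xml.dom.minidom can look like. The XML below is made up for illustration and is not the real Board Game Geek response format; `parse_ratings` and the attribute names are likewise hypothetical.

```python
from xml.dom.minidom import parseString

# Invented XML; NOT the real API format.
SAMPLE_XML = """<ratings>
  <rating user="alice" game="chess" value="9"/>
  <rating user="alice" game="go" value="8.5"/>
  <rating user="bob" game="chess" value="4"/>
</ratings>"""

def parse_ratings(xml_text):
    """Return {user: {game: rating}} from the hypothetical format above."""
    ratings = {}
    dom = parseString(xml_text)
    for node in dom.getElementsByTagName("rating"):
        user = node.getAttribute("user")
        game = node.getAttribute("game")
        ratings.setdefault(user, {})[game] = float(node.getAttribute("value"))
    return ratings

print(parse_ratings(SAMPLE_XML))
```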

Now I’ve got some data, I want to see if there are any correlations in the ratings between games - I expect that someone who rates chess highly probably rates games similar to chess highly. I love python because there’s a library for everything, including numpy. My entire correlation function was:

import numpy

def correlation(game1, game2, ratings):
    # rs is a list of (game1 rating, game2 rating) pairs
    rs = ratings_for(game1, game2, ratings)
    x = numpy.array([a for a,_ in rs])
    y = numpy.array([b for _,b in rs])
    # corrcoef returns a 2x2 matrix; the off-diagonal entry is r
    return numpy.corrcoef(x, y)[0][1]

Actually, the def line was added later. You see, I was coding directly into the REPL, the python prompt. If a bunch of commands yielded something useful, a building block I wanted to use again, I stripped them out of the terminal history and pasted them into my (single) source file for future use.

I guess sometimes YAGNI even applies to your editor. An editor forces you to write functions backwards - name and parameters first, contents later. Writing in the REPL was the right way around - do something to get at interesting data and then codify it into a function. I never had to wonder what a function should be called or which arguments it should take.

You wouldn’t catch me doing this in my day job, though. Code completion, find references, these are things I can’t live without when maintaining a big codebase. But writing a little tool from scratch, that I’m exploring as I go? Absolutely.

I run correlations between all the games in my small sample set and see correlations of around r=0.6 for some games - this isn’t huge, but it’s interesting. There’s only so much you can do with a massive matrix of correlations, though. I realize I need a way to visualize this stuff.
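To make that concrete, a self-contained sketch of running the correlation over every pair of games in a small sample. The toy ratings are invented, and this `ratings_for` is a hypothetical stand-in for the helper the correlation function relies on, which is not shown in the post.

```python
from itertools import combinations
import numpy

# Toy data, invented for illustration: {user: {game: rating}}.
RATINGS = {
    "alice": {"chess": 9, "go": 8, "snap": 3},
    "bob":   {"chess": 4, "go": 5, "snap": 8},
    "carol": {"chess": 7, "go": 8, "snap": 4},
    "dave":  {"chess": 2, "go": 3, "snap": 9},
}

def ratings_for(game1, game2, ratings):
    # Hypothetical helper: one (r1, r2) pair per user who rated both games.
    return [(r[game1], r[game2]) for r in ratings.values()
            if game1 in r and game2 in r]

def correlation(game1, game2, ratings):
    rs = ratings_for(game1, game2, ratings)
    x = numpy.array([a for a, _ in rs])
    y = numpy.array([b for _, b in rs])
    return numpy.corrcoef(x, y)[0][1]

def all_correlations(ratings):
    # Every unordered pair of games that appears anywhere in the sample.
    games = sorted({g for r in ratings.values() for g in r})
    return {(g1, g2): correlation(g1, g2, ratings)
            for g1, g2 in combinations(games, 2)}

for pair, r in all_correlations(RATINGS).items():
    print(pair, round(r, 2))
```

With real data the number of pairs grows quadratically with the number of games, which is one reason a big matrix of numbers gets hard to read quickly.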

Day 4 - Awesome Decision #1: Use Someone Else’s Code

I begin the evening hunting around for a good graphing library that I can actually get running on my MacBook. I spend at least two hours just doing this, going down dead ends, fighting with MacPorts, debating whether graphviz will be good enough or not and so on, until I finally stumble upon the divine NodeBox.

NodeBox has a dot-like graph layout library, which is pretty close to what I need, while integrating nicely with python. The documentation is good enough to get me going and to produce a graph layout of the 30 most popular games in my sample, with edges between those with reasonable correlations, shorter for higher correlations. You can see the trivial python code used to generate this on the right:

This is why writing in a popular scripting language (read: python or ruby) is great: there's an awesome tool for everything
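The data-preparation half of a graph like this needs no graph library at all. The sketch below is not NodeBox code: the `min_r` threshold and the `1 - r` edge length are assumptions chosen to match "edges between reasonable correlations, shorter for higher correlations", and the sample numbers are invented.

```python
def graph_edges(correlations, min_r=0.4):
    """Turn {(game1, game2): r} into (game1, game2, length) edges."""
    edges = []
    for (g1, g2), r in sorted(correlations.items()):
        if r >= min_r:
            # Higher correlation -> shorter edge.
            edges.append((g1, g2, round(1.0 - r, 2)))
    return edges

# Invented correlations for illustration.
CORRELATIONS = {("chess", "go"): 0.6, ("chess", "snap"): 0.1, ("go", "shogi"): 0.5}
print(graph_edges(CORRELATIONS))
```

The resulting edge list is what you would then hand to whichever layout library you manage to get running.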

I thought this was pretty great - the games (vaguely) fall into categories that make intuitive sense! I immediately stopped work and showed this to everyone within reach.

Doing this reminded me how much better visualization is than assumption. It’s prettier, too.

Day 5 - Embarrassing Flaw #3: Making a stupid business plan, then ignoring it anyway

While thinking about this on a plane to England, my mind makes the natural leap from my motley collection of python functions to turning down acquisition offers from Amazon and being invited to talk at Davos. While musing on how to get from here to there, I take a liking to a simple little opportunity: write a site that recommends board games to people, then rake in the money through referrals.

Later that night I check out the Amazon referrals scheme and make a quick back-of-the-envelope calculation:

Referral rate for mere mortals: 4%
Average price of a game: $25
Referral income per game: $1
Adwords cost to drive traffic: $0.25
Conversion rate required: 25%

Boom! Shot out of the water. I see no way this could ever become a money maker.
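The same envelope, spelled out. The only assumption beyond the numbers above is that the Adwords figure is the cost of bringing one visitor to the site.

```python
referral_rate = 0.04    # referral rate from the list above
average_price = 25.00   # average price of a game, in dollars
cost_per_visitor = 0.25 # Adwords cost, assumed to be per visitor

income_per_sale = referral_rate * average_price

# Break-even when: visitors * conversion * income_per_sale == visitors * cost_per_visitor
required_conversion = cost_per_visitor / income_per_sale

print(f"income per sale: ${income_per_sale:.2f}")
print(f"break-even conversion rate: {required_conversion:.0%}")
```

One visitor in four would have to buy a game just to cover the cost of the click.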

But, I still want to do it. It’d be useful, for me if for nobody else. And hey, I can throw it up on AppEngine for nothing and my friends and I can use it. I gently pack away my dreams of grandeur and decide to make this into something fun.

If I’d spent more time on a business analysis, I’d doubtless have turned up things like alternative traffic acquisition models, viral this and facebook that. If I did it for long enough maybe I could have convinced myself this was a profitable opportunity.

Instead, I accidentally discovered something more important: I wasn’t doing it for the money. If I had been, I’d probably have gone down a different route or given up by now.

Days 6 and 7 - Embarrassing Flaw #4: Being worse than a one-line algorithm

I spend a couple of evenings lounging on a comfortable sofa in front of the fire in cold and rainy England, writing code to calculate polynomial fits between the ratings for pairs of games. I use these fits and their correlations to predict a score for one game given a set of ratings for other games. With each function I’m building myself closer to being able to give my favourite games and get a list of recommendations out.
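A rough sketch of the fit-then-predict idea using numpy.polyfit. The post doesn't say which polynomial degree was used or how predictions from several games were combined, so the degree of 1, the 1-10 rating scale and the correlation-weighted average are all guesses, and the data is invented.

```python
import numpy

# Invented ratings from users who rated both games: (rating of A, rating of B).
PAIRS = [(2, 3), (4, 5), (5, 5), (7, 8), (9, 9)]

def fit_pair(pairs, degree=1):
    # Least-squares polynomial mapping a rating of game A to a rating of game B.
    x = numpy.array([a for a, _ in pairs], dtype=float)
    y = numpy.array([b for _, b in pairs], dtype=float)
    return numpy.polyfit(x, y, degree)

def predict(coeffs, rating_a):
    # Evaluate the fit and clamp to an assumed 1-10 rating scale.
    return float(min(10.0, max(1.0, numpy.polyval(coeffs, rating_a))))

def combine(predictions):
    # predictions: (predicted rating, correlation) pairs from several known games.
    # Weighting by |correlation| is one plausible choice, not the original method.
    total_weight = sum(abs(r) for _, r in predictions)
    return sum(p * abs(r) for p, r in predictions) / total_weight

coeffs = fit_pair(PAIRS)
print(predict(coeffs, 8))
print(combine([(8.0, 0.6), (4.0, 0.2)]))
```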

One dark and stormy night, I reach that point. The final goal. Does it work? I test it against another sample from the DB. It predicts 84% of the games correctly! It works! I’ve done it!

Suddenly, doubt strikes. Lightning didn’t, but it would have been more dramatic if it had. I suddenly ask myself: what’s the baseline here? Anxiously, I write a one-liner that just recommends the most popular game all the time. It scores 88%. Oh.
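The baseline check is worth having before anything else. Here is a sketch, assuming a recommendation counts as correct when the user rated that game at or above some threshold; the post doesn't define its scoring rule, so `LIKED`, `accuracy` and the sample data are all invented.

```python
from collections import Counter

LIKED = 7  # assumed threshold for "rates highly"

# Invented held-out sample: {user: {game: rating}}.
SAMPLE = {
    "alice": {"catan": 9, "chess": 8},
    "bob":   {"catan": 8, "snap": 3},
    "carol": {"catan": 6, "chess": 9},
    "dave":  {"catan": 7, "snap": 8},
}

def most_popular(ratings):
    # The game rated by the most users.
    counts = Counter(g for r in ratings.values() for g in r)
    return counts.most_common(1)[0][0]

def accuracy(recommend, ratings):
    # Fraction of users who rated the recommended game at LIKED or above.
    # Users who never rated the recommended game are skipped.
    hits = total = 0
    for user, rated in ratings.items():
        game = recommend(user)
        if game in rated:
            total += 1
            hits += rated[game] >= LIKED
    return hits / total

popular = most_popular(SAMPLE)
baseline = accuracy(lambda user: popular, SAMPLE)
print(popular, baseline)
```

Any cleverer `recommend` function can be passed to the same `accuracy` call, which makes the comparison with the baseline hard to avoid.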

Days 8 and 9 - Awesome Decision #2: Do nothing

I spend a couple of days not working on the problem, but I do think about it every now and again. Despite the setback, the correlations seen in the graph make me certain there’s some merit to this idea. On the flight back home inspiration strikes and I jot down the basics for an alternative algorithm. It seems less scientific, but it might just work.

In hill-climbing terms, this was a clever move. I didn’t realize it was a clever move at the time; I just didn’t feel like writing any more code for a while. If I’d put myself under the pressure of creating a sure-fire money maker, I might have tried optimizing the algorithm I had, tweaking this and that, finally climbing to the top of a very little hill in the solution space.

So, +1 from me for giving up when the going gets tough (as long as you come back later when the tough’s not looking).

Day 10 - Embarrassing Flaw #5: Brute force is my algorithm of choice

Back at home again, I code up the new algorithm. It’s not very complicated and I can re-use most of the supporting functions built up already. I try it against the old one and against the baseline.

90% accuracy. Better, but percentage correct isn’t very useful at these levels. It’s predicting one game in ten wrong, which seems like too many to base a purchasing decision on to me.

There are a lot of ‘magic’ parameters I can tweak, but where to start? I briefly consider using a GA or NN (both pet interests) to optimize the algorithm, but frankly I don’t want to learn about some guy’s NN framework - I just want to make my algorithm a bit better. So, slightly curious as to the result, I write a few loops to brute force it. What do you know? It completes before the end of the universe. In fact, before the end of the day.

I’d have loved to have had some cool, cutting edge technology like NN or GA optimizing my recommendation algorithm. It’s so much sexier. But, you know, just trying all the combinations actually worked out great. I guess brute force is its own cool.
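"A few loops" can be as little as this. The parameter names and the toy `score` function are hypothetical; in practice `score` would run the recommender against the test sample and return its accuracy.

```python
from itertools import product

def score(min_correlation, weight_power, top_n):
    # Hypothetical stand-in for "evaluate the algorithm with these settings".
    # This toy surface peaks at (0.3, 2, 10).
    return (1.0 - abs(min_correlation - 0.3)
            - 0.05 * abs(weight_power - 2)
            - 0.01 * abs(top_n - 10))

# Every value worth trying for each 'magic' parameter.
GRID = {
    "min_correlation": [0.1, 0.2, 0.3, 0.4, 0.5],
    "weight_power": [1, 2, 3],
    "top_n": [5, 10, 20],
}

def brute_force(score_fn, grid):
    # Try every combination and remember the best one seen so far.
    names = list(grid)
    best_score, best_params = float("-inf"), None
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        s = score_fn(**params)
        if s > best_score:
            best_score, best_params = s, params
    return best_params, best_score

best_params, best_score = brute_force(score, GRID)
print(best_params, best_score)
```

This grid has only 45 combinations; the cost multiplies with every parameter added, so it only stays practical while the list of magic numbers is short.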

Now I have 94% accuracy - the failure rate has almost been halved. But how does it stack up if we start removing people’s favourite games? After all, it’s easy to recommend a game everybody loves, but I want to recommend games people don’t already have! I code it up and try it out.

94%, 95%, 92%, 94%… it’s looking good even when it starts recommending oddball games way down the top 200 list, although it does drop off eventually. I try it on myself. It recommends games I already wanted to try out. I try it on a few people I know. It seems reasonable. I’m excited! I want to show it to my friends and family and let them try it, but I can’t because it’s just a python script.

Time to change that.

Day 11 - Embarrassing Flaw #6: My website is a popular free CSS template with some badly-written JS on top

Two things I’ve learned about writing little web sites:

  1. I have no graphic design skills
  2. A site with a cute template is a lot more fun and rewarding to work on than a blank page with input boxes

With those in mind, the first thing I did was hit up the free CSS template sites to find something not too hideous that I could use as a basic template. I knew I wanted a logo, a couple of input boxes and space for a nice list of game boxes. Almost immediately I found one:

I tweaked it a little to suit my needs, yet I was still ashamed, knowing that anyone who’s seen this template before (and it’s one of the more popular ones, apparently) will instantly know that I have no CSS skills and am too lazy to subcontract to a proper designer.

I remind myself that pride is another thing I Ain’t Gonna Need and get on with it.

As an aside, I’d recommend Google AppEngine in a heartbeat to anyone who already knows some python and wants to host a little side project:

  1. Great basic API - you can go from zero to working page serving DB requests in half an hour
  2. It just works - creating a new app is easy, deployment is single-click
  3. Generous free limits - I’ve never run past the quota yet
  4. No performance worries under load - unless you do something stupid to your DB, I guess

Day 12 - Embarrassing Flaw #7: It takes me all evening to get autocomplete working, and it still doesn’t work properly

Now I have a site that looks ‘real’, but I want it to feel right. I’ve two text boxes for game entry and I really need auto-complete on the game names. I mix in jQuery and jQuery UI for the autocomplete and instant search - neither of which I know much about. In fact, I’m such a n00b that after spending all evening reading the jQuery UI docs and messing around with select vs close events my autocomplete popup and form event handler still don’t work properly - pressing Enter too soon will result in your game being silently ignored. But, hey, it’s close enough.

In the end, spending the time on the behaviour and the looks was worth it. Behaviour is fundamental - do you choose single items from a list, or auto-complete? Is it easy? Is it fun? These choices are somehow the core of a user experience.

Even if I change the visuals for a proper design someday, the user experience will stay pretty much the same - and I wouldn’t have figured that out without my free CSS template and hard-won-yet-buggy autocomplete implementation.

Day 13 - Embarrassing Flaw #8: My NoDB implementation

The rest of the site was really simple. Too simple, in fact. See, my python script didn’t have a DB, it just used pickle to load the data from a file. So my Google AppEngine project didn’t have a DB either. Take that, NoSQL guys! I’m NoEverything!

This is really shameful. It would be measurably better to use the DB. There’s a delay of several seconds whenever someone goes to the page for the first time since Google decided to kill off my instance - luckily the instances live a long time when the site’s under load. It stops me using a larger sample set - even though this would give better results - because doing so forces the process to exceed the soft memory limit, meaning it gets killed at the end of each request.
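The pattern behind that first-request delay looks roughly like the sketch below. It is a guess at the general shape, not the site's actual code: `load_ratings`, the file name and the toy data are invented.

```python
import os
import pickle
import tempfile

_cache = None  # lives only as long as the server process (instance) does

def load_ratings(path):
    """Unpickle the data once per process; later calls reuse the cached copy."""
    global _cache
    if _cache is None:
        with open(path, "rb") as f:
            _cache = pickle.load(f)  # the slow part, paid by the first request only
    return _cache

# Demo with a throwaway file standing in for the real data.
path = os.path.join(tempfile.mkdtemp(), "ratings.pickle")
with open(path, "wb") as f:
    pickle.dump({"alice": {"chess": 9}}, f)

first = load_ratings(path)
second = load_ratings(path)
print(first is second)
```

Everything lives in the process's memory, so a bigger sample means a slower cold start and a larger footprint, which is where the memory limit starts to bite.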

But, you know, if I’m going to put the DB in, I should get the GAE app to fetch the data too, rather than doing it in batches by hand. And I should get the data in better formats to solve X and Y and Z. And at this point, I could easily spend several evenings making it work right. Call it a week.

Yet even without all that, it’s actually pretty good.

I’ll change it soon, sure, but in that moment of analysis paralysis I decide to leave it as it is and hope that nobody ever finds out.

Day 14 - Awesome Decision #3: Begging for feedback

At this point I have a single web page that:

  1. Invites you to enter one loved and one hated game into auto-completing boxes
  2. Instantly updates to show you a list of recommended titles, along with game box images

It doesn’t do anything else. I have so many big plans - the classic social media buttons, loving / hating multiple games, saving the state in the URL, creating sharable links, showing mini reviews of each game…

I don’t do any of those. Hundreds of blog posts telling me “If you’re not embarrassed by version 1, you didn’t release early enough” have made their mark on my soul. Instead of making it better today, I post this to the Board Game Geek recommendations forum: 

Hi all,

I always come to BGG to find a cool new board game to play, 
but I end up spending hours reading reviews in forums and 
trying to guess at what would suit me, my wife and my friends 
best. So when I saw the XML API, I just had to write this:

http://findanewgame.appspot.com/

Tell it which games you love and one you hate, and it shows 
you a selection of games you'll probably love. It certainly 
works for me, and after testing it on a ~2000 user sample of
the BGG database the algorithm proved to be 94% accurate at
predicting which games a user would rate highly, which I
found pretty stunning.

It's just a bit of fun at the moment, but I'd love to hear
how useful it is / isn't and which other features you'd like
to see.

Enjoy!
Mark

Immediately I begin refreshing the page compulsively, waiting for someone to reply.

They do.

It turns out that getting in touch with a community for my app was easy. This makes me think that if it’s not easy, then maybe you’re developing an app for an imaginary group of people who don’t exist.

Days 14-19 - Embarrassing Flaw #9: Attention addiction

That week the main page was hit 1500 times and the post racked up over 100 replies containing bug reports, suggestions and feature requests. The game was on! Every evening and lunch break I added a little feature here, tweaked the javascript there. The hardest bit was not working on it the rest of the time!

Every piece of feedback, every happy post fed a warm glow of pride and smug self-satisfaction. A little too much, frankly. Still, it made being responsive to people very easy, and that seemed to make people even happier. By the end of the week I completed the last major feature addition people had been asking for. Now the site was - as far as I could tell - pretty useful.

Next time, I won’t worry at all about launching a buggy website. People tell you about bugs and this gives you all the motivation you need to get in there and fix them! And if a site has any value at all they’ll try it a second time, so you can’t actually lose. Unless you’re launching a bank, I guess. Or a nuclear reactor. I promise not to do that.

Days 20-27 - Embarrassing Flaw #10: I burned out

By this point I’d been spending most of my spare time working on or thinking about this little site for the last two weeks, and suddenly I found I didn’t want to do that any more.

The forum post slipped off the main page, the stream of visitors slowed to a trickle (mostly through email referrals and direct hits - presumably some word of mouth).

I decided I’d had enough. I took a break. I went for walks and spent time with my friends and family. I met interesting people and did interesting things.

I guess if I’d thought I was in a race to release, if endless riches were waiting just out of my grasp, I’d have forced myself through this phase. Or if I’d been burning VC money, my runway rapidly disappearing, the date of failure already marked on the calendar…

But I wasn’t in either of those situations. I just let it rest, knowing I’d come back when it was time. Still, the thought of just leaving it there without somehow pushing more, doing more, that was embarrassing. But necessary.

Here I Am

After around a couple of weeks completely ignoring the site, it’s still getting hits from somewhere - mostly email referrals as far as I can see in the logs:

My goal is to put a basic, yet fully-functional app out there and see at least *one* end-to-end conversion. That plan looks like this:

  • Add a short paragraph at the top of the page explaining why you would use the site and how to get started - it’s not obvious enough as it is
  • Show more information about each game on click, including a ‘buy now’ link to amazon and a description / mini review
  • Add conversion tracking

There’s also a ton of things I really, really should do, but won’t (yet), like:

  • Add a ‘share this list’ section, offering tweeting a link, sending to your facebook wall or emailing it to a friend
  • Internationalization - especially the German market (board games are huge in Germany). This needs translations of the game titles and redirection to amazon.de instead of amazon.com. I’m sure you can do this with their API.

Once I’ve got a minimal yet complete site up, I’ll find out whether it’s something worth spending more time on. Either way, it’s been fun! And without really realizing it I too have done a ‘weekend’ project I could submit to HN, just like all the cool guys do ;-)

Update: this article is front-paging both Hacker News and Proggit, so I’ve added some extra text at the top and linked some (but not all) items to an amazon.com search page. I guess I’ll get back to plan A and proper referral links when the dust settles a bit!

Two Things I Learned On The Way

So far, I’ve learned two things from this experience. The first is that although I created something consisting entirely of embarrassing flaws, without taking those shortcuts and without making those mistakes I would never have come as far as I did.

Just because your code is horrible doesn’t mean it’s too early to show people the end result - they won’t care what your code looks like.

The second is how easy this sort of thing has become. I did this over a few weeks without any special skills and with a full-time job, a social life and a family. Most of the time was just exploring a fun side-project; putting it online was the least time-consuming part.

I Did This; You Should Too

How many half-finished projects litter your hard drive, never to be seen by another living person? What a waste! These days it’s so quick to get a half-finished project online so that other people can tell you whether it’s useful or not that it’s almost a crime not to give them the chance.

Take your next project (or even better: one of your old ones) and just drag it, kicking and screaming if necessary, onto a server. Send me a link when you get something horribly embarrassing live for the first time; we can laugh about it together and maybe - just maybe - it’ll become something amazing.

HNer? Check out the Hacker News discussion of this post. Or do you prefer Reddit? Either way, follow me on twitter and tell me about your next project…

Postscript: You can try the site for yourself here: http://www.findanewgame.com - have fun!

A serious point underlies the flippancy in Single- vs Co-Founder: It’s Like Star Wars - we use the word ‘startup’ to refer to a wide range of different businesses, yet treat them as if they were basically the same thing. Advice for one doesn’t necessarily apply to the other, so we should ask: what do we mean when we talk about a startup?

Not so long ago, the definition of a startup was very literal:

A company that is in the first stage of its operations

During the dot-com boom, easily-accessible VC funding made the term virtually synonymous with the Silicon Valley, high-investment model. Until recently I’d always heard of it in this context, but I’ve never seen it clearly defined in these terms. Today there is a wide range of businesses with completely different operating models being referred to as startups; just take a look at these:

  1. Bingo Card Creator - Patrick is the poster-boy for single-founder, organically-grown freedom-based software businesses. He started BCC as a side-project and has grown it to a small business empire that he runs as sole founder and owner. It’s not clear how it could have benefitted from VC funding or a co-founder. Is BCC a startup?
  2. DuckDuckGo - Gabriel Weinberg is himself an angel investor, so does the plucky alternative search engine count as having taken investment already? As far as I’m aware, Gabriel works on DDG alone, but I suspect he wouldn’t rule out taking VC funding should its growth explode. Is DDG a startup?
  3. Apple - Woz and Jobs didn’t take any external pre-IPO funding, yet their garage-born disruptive technology business is one of the classic startup stories. Would Apple be a startup in today’s terms?
  4. Google - The archetypical Silicon Valley story. Two people with great ideas and technology. $100k angel investment before they even incorporated. $25m from Sequoia and Kleiner Perkins. $1.67b IPO. Most famous startup of our time.

How can we talk about what a ‘startup’ needs without differentiating between these kinds of companies? Should we talk about ‘funded startups’ and ‘organic startups’? Are there two (or more) discrete categories, or just points along a continuum?

Clearly we can imagine a continuum along the scale of funding / growth. Personally, I suspect profitability is greatest at the ends with a big dip of death in the middle. Have you ever heard of a company that said:

Now we’ve secured our series A round we’re planning to grow a bit faster than we can afford to, but not exponentially. Just somewhere in the middle.

Maybe I’m wrong and there are successful strategies like this, but it’s non-obvious. If there are discrete categories, what defines them?

  • Growth vs. Revenue - are they using external funding to grow faster than their natural rate?
  • Potential for Disruption - are they trying to redefine a market, or just find a profitable niche?
  • Exit Strategy - IPO or profitability with dividends?

It’s interesting to see how historical cases are divided by these classifications; as a startup of whichever kind, it’s clearly going to be more helpful to follow the example of companies who match your profile, but which are the classifications that count?

I don’t have any answers, I just have questions for you and everyone else in the community:

  1. What sort of company do you mean when you say ‘startup’?
  2. Is there a continuous scale between Bingo Card Creator and Google, or are they discrete types of startup?
  3. How would you classify startups into groups, such that advice for one startup is likely to apply to others in its group? Is this even possible?

Answering these, even just trying to, will make future discussions more transparent and better-defined. How do you define and categorize startups?

Does a startup benefit from two or three co-founders? Can single founders compete? Every so often this topic bubbles up into my RSS feed. Lots of single founders say “It works fine for me” and lots of co-founders say “We probably wouldn’t have made it alone”.

These discussions are often had completely at cross-purposes: people persist in talking about very different kinds of companies. Look, it’s like Star Wars:

It’s Like Star Wars

Senator Palpatine has a dream. He’s seen this neat hack he can use to disrupt the established order and build a new empire, in which all the money (and power) goes to him. He needs to move fast, grow exponentially, work continuously. He needs a co-founder to help him in this big, high-stakes gamble against the world, so he goes out and gets one.

Han Solo also has a dream; the dream of personal freedom, of being his own man. He couldn’t be less interested in changing the world. He doesn’t want any part of the rebel alliance’s plans. He’s small fry, but he’s got what he wants - the ship he’s so proud of, his crew and the freedom to go and do what he chooses, beholden to no-one [1] - a lifestyle business.

See What I Did There?

The classic venture-funded startup is a high-pressure, all-or-nothing gamble against the world; in this environment it’s easy to see why the advantages of co-founders matter so much - doubly so from PG’s perspective as an investor in such companies. Anyone investing in Palpatine [2] without Anakin would have lost their money (as it is, all the smart investors cashed out long before Return of the Jedi, right?)

Most single-founders I’ve spoken to personally or read regularly are actually building lifestyle businesses. They work for their freedom, their independence, the ability to choose their own destinies. Sure, they’d love to make it big, but they’re happy to take it slow. How would you find a good co-founder for a lifestyle business? You’ve got two different lives to lead and the whole point of the business is getting to do it your way. In these circumstances, a single founder with employees or contractors makes a lot of sense. They’re Han Solo, in their very own Millennium Falcon, and they like it.

So next time you read, write or comment on an article comparing the merits of single and co-founders; next time you feel your personal choice undermined by someone else’s argument, stop and ask yourself: is this about forging an empire, or being master of your own destiny? They’re not the same thing. [3]



[1] Well, except his debtors, although that’s nothing you can’t cure by shooting first. And yes, yes, Han Solo had his long-time friend as first-mate, but he was already on his entrepreneurial path before they got together and he’s clearly calling the shots. Don’t go all Chewbacca Defense on me here.

[2] I don’t want this negative image to distract - I’m not trying to compare Silicon Valley to the Sith Lords here, it’s just for illustration. Although now I think about it, they do have my data by the throat.

[3] This is really true. Palpatine has power and resources Han Solo can only dream of, but Han Solo has freedoms that the emperor can never have. Great responsibility and all that. Also, Han Solo doesn’t get stabbed in the back by an ungrateful co-founder after they make it big, but this is getting off-point real fast…

A while ago Jason Sage suggested the wisdom and beauty of this poem apply quite naturally to programming. He was very, very right - so with apologies to the unknown original author:

3ccl35145735: 3

1 There is a time for all things; 
    a season for every activity under heaven:

2 a time to hack it together and a time to refactor the mess;
    a time to be clever and a time to be featured on thedailywtf,

3 a time to throw descriptive exceptions and a time to return false;
    a time to release early and a time to refrain from releasing at all,

4 a time to think about the problem and a time to ask stackoverflow
    a time to copy from a random forum post and a time to wish you hadn’t,

5 a time for getting on with work and a time for reddit
    and for hacker news and dilbert and xkcd and penny arcade,

6 a time to GET and a time to POST; 
    a time to sanitize the database inputs and a time to sanitize the database inputs properly,

7 a time to make the method public and a time to make it private again; 
    a time for dynamic typing and a time for runtime errors on the production server,

8 a time for the strategy pattern and a time for switch {}; 
    a time to argue about it and a time to regret not arguing enough,

9 What does the programmer gain from his toil? I have seen the burden God has laid on men.

10 He has made everything beautiful in its time,

10a Except C++

10b And PHP

10c Don’t get me started on actionscript, OK?

10d Fucking actionscript

11 He has set eternity in the hearts of men, yet they cannot fathom what He has done from beginning to end,

11a nor indeed what they themselves were doing yesterday when they tried to implement memoized tail-recursion using templates.

12 Everything God does will endure forever, but with any luck my code will be deprecated in the next release.

12a Or at the very latest the one after that.

12b Oh please God don’t make me maintain this for the next twenty years.