
OKCupid’s MyBestFace Would Be Awesome … If It Worked


OKCupid recently unveiled MyBestFace, a feature that helps its members determine their most attractive profile pictures. You have to earn the report by voting on other members’ photos, and others do the same for yours. Each photo you post requires you to vote on 20 pairs of photos. You have to choose which person you’d rather go on a date with (and skip is not an option). According to OKCupid, this is how it works: “A group of real humans compared your photos with others’, and each time your photo was selected – or not – the information we gleaned was a complex function of how well the opposing photo did in its own report. In other words, we weren’t simply counting votes. We considered all the other votes, too, and converged rapidly on your best face.” Sounds a lot like Mark Zuckerberg’s Facemash to me.
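OKCupid has not published its formula, so the following is only a guess at the general idea. The quoted description, where each vote is weighted by how well the opposing photo did, is reminiscent of an Elo-style rating. The sketch below shows a standard Elo update; the function names, the starting rating of 1500, and the constant `k=32` are assumptions for illustration, not anything OKCupid has confirmed.

```python
def expected_win(rating_a, rating_b):
    """Elo estimate of the probability that photo A is picked over photo B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def update(rating_a, rating_b, a_won, k=32):
    """Return the new (rating_a, rating_b) after one head-to-head vote.

    k controls how far a single vote can move a rating (assumed value).
    """
    exp_a = expected_win(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating stays constant.
    return rating_a + delta, rating_b - delta


# A win against an equally rated photo moves both ratings by k / 2.
even_match = update(1500, 1500, a_won=True)

# A win against a higher-rated photo is worth more than that.
upset = update(1500, 1700, a_won=True)
```

In a scheme like this, beating a strong photo raises your rating more than beating a weak one, which matches the "not simply counting votes" claim in spirit.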

As I voted, I realized I was unfairly discriminating against users who were not my type. I wished I could specify the demographics (at least the age range) of the people I voted on and the people who voted on me. For example, an older woman may choose to go on a date with an older man when he is pitted against me, just because he’s older, which in turn reduces my score unfairly.

I was very surprised by the results of my report after running my favorite 8 pictures through it, so I decided to process more of them, and then more again. Still surprised, I decided to run them all through a second time to check the consistency of the results. After all, to trust a result saying one photo is better than another, that photo should outscore the other in both round 1 and round 2.

I ran 44 pictures through MyBestFace twice and analyzed the numbers on a spreadsheet (update 6/22/2010: more accurate Excel version). Combining numbers from both rounds, I found that the standard deviation from one photo to another is lower than the average discrepancy between round 1 and round 2. If this is always true, it means you cannot tell which photo is better than another after only one round of comparison. Keep in mind that you have to vote on 20 pairs of photos per round per photo submitted, so submitting 44 photos twice required voting on 1,760 pairs of photos (44 × 2 × 20), which of course took a lot of time.

My average picture rating across both rounds was 67.36. On average, a picture’s score differed from 67.36 by 4.3 points, and the standard deviation over both rounds is 5.38. The average difference between the same photo’s score in round 1 and in round 2 is 5.46, which I treat as the system’s margin of error.
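For anyone who wants to repeat this on their own reports, here is a minimal sketch of the four figures above. The scores are made up (four hypothetical photos, not my data), and the choice of the population standard deviation (`pstdev`) rather than the sample version is an assumption.

```python
from statistics import mean, pstdev

# Hypothetical scores for four photos; NOT the actual spreadsheet data.
round1 = [70, 62, 68, 75]
round2 = [66, 65, 71, 69]


def summarize(round1, round2):
    """Compute the summary figures discussed in the article."""
    all_scores = round1 + round2
    overall_mean = mean(all_scores)
    return {
        # Average rating over both rounds.
        "overall_mean": overall_mean,
        # Average distance of a score from the overall mean.
        "mean_abs_dev": mean(abs(s - overall_mean) for s in all_scores),
        # Spread of scores over both rounds (population standard deviation).
        "spread": pstdev(all_scores),
        # Average change in the same photo's score between rounds.
        "round_gap": mean(abs(a - b) for a, b in zip(round1, round2)),
    }


stats = summarize(round1, round2)
# The results are hard to trust when round_gap exceeds spread.
noisy = stats["round_gap"] > stats["spread"]
```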

Since the margin of error is greater than the standard deviation between my good and bad photos, I consider the round 1 results very inaccurate. One could argue that if I compared two photos in round 1 and the difference between them was greater than 5 (MyBestFace rounds to whole numbers), the higher-scoring picture is indeed more attractive than the lower-scoring one. However, in the worst case I had one picture jump 13 points from round 1 to round 2! And only 13.6% of my photos (6 of 44) earned the same score in round 1 and round 2.
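One way to test the "difference greater than 5" argument directly is to count how often a pair of photos keeps the same order in both rounds. The sketch below does that for made-up scores (again, not my data); the function name and the `min_gap` parameter are my own inventions for illustration.

```python
from itertools import combinations


def order_agreement(round1, round2, min_gap=0):
    """Fraction of photo pairs ranked the same way in both rounds.

    Only pairs whose round 1 scores differ by more than min_gap are counted.
    Returns None when no pair qualifies.
    """
    kept = agree = 0
    for i, j in combinations(range(len(round1)), 2):
        d1 = round1[i] - round1[j]
        d2 = round2[i] - round2[j]
        if abs(d1) <= min_gap:
            continue  # too close in round 1 to call a winner
        kept += 1
        if d1 * d2 > 0:  # same sign: same winner in both rounds
            agree += 1
    return agree / kept if kept else None


# Hypothetical scores for four photos.
round1 = [70, 62, 68, 75]
round2 = [66, 65, 71, 69]

all_pairs = order_agreement(round1, round2)
clear_pairs = order_agreement(round1, round2, min_gap=5)
```

If the agreement for pairs more than 5 points apart is close to 1.0, the argument holds up; if it is not, even a large gap in one round does not settle which photo is better.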

It would take rating another 1,760 pairs of photos to determine the reduction in the system’s margin of error after doubling the number of experiments, but I assume it would still be greater than the standard deviation from photo to photo. If that is the case, then even after 4 rounds of experimentation the system still fails to prove which photo is more desirable than another.
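Short of doing all that voting, a rough estimate is possible under a strong assumption: that each score is the photo's true score plus independent random noise with the same standard deviation σ in every round. The symbols below are my own notation, and the 5.46 figure is the round-to-round discrepancy reported above.

```latex
\begin{align*}
s_{i,r} &= t_i + \varepsilon_{i,r}, \qquad \operatorname{sd}(\varepsilon_{i,r}) = \sigma \\
\operatorname{sd}(s_{i,1} - s_{i,2}) &= \sqrt{2}\,\sigma \\
\operatorname{sd}\!\left(\tfrac{s_{i,1}+s_{i,2}}{2} - \tfrac{s_{i,3}+s_{i,4}}{2}\right)
  &= \sqrt{2}\cdot\frac{\sigma}{\sqrt{2}} = \sigma \\
\text{expected discrepancy after doubling} &\approx \frac{5.46}{\sqrt{2}} \approx 3.86
\end{align*}
```

This is only a back-of-the-envelope figure. If the noise is not independent from round to round (for example, because of who happens to be voting or how the scores are normalized), the real reduction could be smaller, and only the extra rounds would show whether the discrepancy actually drops below the photo-to-photo spread.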

MyBestFace is fun to try, and it would be a very useful tool if its results were accurate, but after running this experiment I think I’m better off just asking a few friends which of my pictures are most attractive.

  • Ghee99

    Well, it no longer works as a reliable feature anyway. I mean, I submitted pics and I’ve been waiting 11 days now. In the past it took 1 to 2 days. Clearly this feature is broken: it works for some, not others. I feel cheated. I did my part (rated others) and now I get no results. I would advise all not to use this feature, unless you just want to subject yourself to the very real possibility of being tricked into rating others with no results given to you in return (as promised). It’s too bad, as when it used to work (although hardly accurate, I’d get huge differences in the same pictures from report to report) at least it was fun.

  • http://www.russia-wife.com/ Russian wives

     I have your blog bookmarked to keep up with any fresh posts.

  • Anonymous

    Thank you very much. Noble article. I will post this into my twitter account.

  • RMW

    Ignoring the somewhat mysterious “normalized” scores, a Student's t test, Mann-Whitney U test, or Wilcoxon rank-sum test will show you that your results between rounds were not significantly different.

    • Syrrys

      Yeah, the normalization that OKCupid applies would seem to obscure the usefulness of the posted numeric scores for this kind of analysis, wouldn’t it? Isn’t that what normalizing is all about: shifting the available scores to fit a predefined distribution? So really the only relevant takeaway from the user’s perspective would be the ranking.

      I’ve not explained that well, but the fact that the scores are normalized does seem important.

  • Anon2

    I don't think you should be analyzing the exact scores that the pictures gave. What's much more interesting is the *order* in which the photos come out on the different runs.

    After all, who cares if pic A gets 65.3 and pic B gets 64.1 on the first run, and then both score 5 points lower on the second run? What matters is that they come out in the same order, so you get the consistent answer that pic A is better. After all, that's the intended use of mybestface.

    And I don't think your first objection to mybestface holds (you said that, for example, older women might favor an older guy over you). They show your picture to a balanced sample of women exactly to avoid this effect. In other words, it is highly unlikely that your pictures got lower scores than others because your picture was shown to more older women.

  • Anon

    The numbers on your spreadsheet are incorrect.

    Deviation from average Round1 Deviation from average Round2
    are actually
    Deviation from average Round2 Deviation from average Round1

    So, you are comparing each datapoint in Round2 with Round1's average. This is a serious bug you should fix. I'd be interested to see your corrected results!

    • http://jesse.la Jesse Wilson

      Thanks for the heads up, you're right! Sometimes I mess things up switching from Excel to Google Docs.

      Here's the Excel version, which should be more accurate in a couple places. Feel free to play with it and repost it, just link back to my article:

      http://jesse.la/files/MyBestFace-test-jesse.la-…

  • Yup

    The proposed service offered isn't an absolute rating, of course. They are offering to determine which of your photos is the best one to use as a profile photo. I'd be interested to learn how robust those conclusions were from your experiment.

    That being said, if two of the photos you used are ranked similarly much of the time… you would expect them to switch places every now and then, in which case an average of multiple trials might be necessary. In general though, the application is likely to be able to generate an order of “goodness” of your photographs – which is why I think they have their numbers “normalized.” They give you a ranking of your photos within a single “experiment” relative to each other. You probably have the data necessary to find out if the service they're offering is being realized, but your analysis is beating a straw man.

© Jesse Wilson 2010
