OKCupid’s MyBestFace Would Be Awesome … If It Worked

OKCupid recently unveiled MyBestFace, a feature that helps members determine their most attractive profile pictures. You earn the report by voting on other members’ photos while others do the same for yours. Each photo you post requires you to vote on 20 pairs of photos, choosing which person you’d rather go on a date with (skipping is not an option). According to OKCupid, this is how it works: “A group of real humans compared your photos with others’, and each time your photo was selected – or not – the information we gleaned was a complex function of how well the opposing photo did in its own report. In other words, we weren’t simply counting votes. We considered all the other votes, too, and converged rapidly on your best face.” Sounds a lot like Mark Zuckerberg’s Facemash to me.
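OKCupid doesn’t publish its formula. However, the behavior it describes, where each vote’s weight depends on how well the opposing photo is doing, is what an Elo-style rating does. The sketch below is a generic Elo update, not MyBestFace’s actual algorithm. The function names, the starting rating of 1500, and the K-factor of 32 are all my assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that photo A is picked over photo B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Return the new (rating_a, rating_b) after one head-to-head vote."""
    score_a = 1.0 if a_won else 0.0
    # Beating a strong opponent (low expected score) moves the ratings more
    # than beating a weak one, so a vote's value depends on the opponent.
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


print(elo_update(1500, 1500, a_won=True))  # (1516.0, 1484.0)
```

Each vote moves the two ratings by equal and opposite amounts, so under a scheme like this a photo’s rating depends partly on which opponents it happened to draw.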

As I voted, I realized I was unfairly discriminating against certain users who were not my type. I wished I could specify the demographics (at least the age range) of the people I voted on and the people who voted on me. For example, an older woman might choose to go on a date with an older man over me just because he’s older, which in turn unfairly lowers my score.

I was very surprised by the results of my report after running my eight favorite pictures through it, so I decided to process more of them, and then more again. Still surprised, I decided to run them all through a second time to check the consistency of the results. After all, to trust a result saying one photo is better than another, that photo should score higher than the other in both round 1 and round 2.
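The consistency rule above can be written as a small check. This is a minimal sketch, assuming each round’s scores are stored in a dictionary keyed by a photo label. The labels and the function names `ranking_holds` and `agreeing_pair_share` are hypothetical, and ties count as disagreement.

```python
from itertools import combinations


def ranking_holds(round1: dict[str, int], round2: dict[str, int], a: str, b: str) -> bool:
    """True only if photo `a` outscores photo `b` in both rounds."""
    return round1[a] > round1[b] and round2[a] > round2[b]


def agreeing_pair_share(round1: dict[str, int], round2: dict[str, int]) -> float:
    """Fraction of photo pairs whose order is the same in both rounds (ties count as disagreement)."""
    pairs = list(combinations(round1, 2))
    if not pairs:
        raise ValueError("need at least two photos")
    agree = sum(
        1
        for a, b in pairs
        if ranking_holds(round1, round2, a, b) or ranking_holds(round1, round2, b, a)
    )
    return agree / len(pairs)


# Made-up scores for three hypothetical photos, not my real data.
r1 = {"beach": 70, "hike": 65, "suit": 60}
r2 = {"beach": 66, "hike": 68, "suit": 61}
print(ranking_holds(r1, r2, "beach", "suit"))  # True
print(agreeing_pair_share(r1, r2))  # 0.6666666666666666
```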

I ran 44 pictures through MyBestFace twice and analyzed the numbers in a spreadsheet (updated 6/22/2010 with a more accurate Excel version). Combining the numbers from both rounds, I found that the standard deviation from one photo to another is lower than the average discrepancy for the same photo between round 1 and round 2. If this is always true, you cannot tell which photo is better than another after only one round of comparison. Keep in mind that you have to vote on 20 pairs of photos per round for each photo submitted, so submitting 44 photos twice required voting on 1,760 pairs, which of course took a lot of time.
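The voting cost is simple multiplication. Here is a tiny sketch; the constant and function names are mine, and the only figure taken from MyBestFace is the 20 votes per photo per round.

```python
VOTES_PER_PHOTO_PER_ROUND = 20  # pairs you must judge for each photo submitted


def votes_required(photos: int, rounds: int, per_photo: int = VOTES_PER_PHOTO_PER_ROUND) -> int:
    """Number of head-to-head votes needed to score `photos` pictures `rounds` times."""
    return photos * rounds * per_photo


print(votes_required(44, 2))  # 1760
print(votes_required(44, 4) - votes_required(44, 2))  # 1760 more to reach four rounds
```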

My average picture rating across both rounds was 67.36. The average absolute difference between a photo’s score and 67.36 is 4.3, and the standard deviation over both rounds is 5.38. The average difference between the same photo’s scores in round 1 and round 2 is 5.46, which I treat as the system’s margin of error.
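The four numbers above can be computed from two lists of scores. This is a minimal sketch, assuming photo i sits at position i in both lists. The function name is hypothetical, and I assume the sample standard deviation (what Excel’s STDEV computes); my spreadsheet’s exact formulas aren’t shown here, so treat this as an approximation of them.

```python
from statistics import mean, stdev


def consistency_report(round1: list[float], round2: list[float]) -> dict[str, float]:
    """Summary statistics for photos scored in two rounds (round1[i] and round2[i] are the same photo)."""
    if len(round1) != len(round2) or not round1:
        raise ValueError("need one score per photo in each round")
    pooled = round1 + round2  # combine the numbers from both rounds
    avg = mean(pooled)
    return {
        "average": avg,
        # Average absolute distance of a score from the overall average.
        "mean_abs_dev": mean(abs(s - avg) for s in pooled),
        # Sample standard deviation, the same formula as Excel's STDEV.
        "std_dev": stdev(pooled),
        # Average |round 1 - round 2| for the same photo: the "margin of error".
        "margin_of_error": mean(abs(a - b) for a, b in zip(round1, round2)),
    }


# Made-up scores for two hypothetical photos, not my real data.
report = consistency_report([60, 70], [64, 66])
print(report["margin_of_error"] > report["std_dev"])  # False
```

In this toy data the spread between photos still exceeds the round-to-round noise. For my real numbers the same comparison is 5.46 > 5.38, which is true.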

Since the margin of error is greater than the standard deviation between my good and bad photos, I consider the results of round 1 alone very inaccurate. One could argue that if I compared two photos in round 1 and the difference between them was greater than 5 (MyBestFace rounds to whole numbers), the higher-scoring picture is indeed more attractive than the lower-scoring one. However, in the worst case, one picture jumped 13 points from round 1 to round 2! And only 13.6% of my photos (6 of 44) earned the same score in both rounds.
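The two stability figures above (the share of unchanged scores and the biggest jump) are easy to compute from the per-round scores. This is a minimal sketch, assuming whole-number scores in two lists ordered the same way; the function name is hypothetical.

```python
def round_to_round_changes(round1: list[int], round2: list[int]) -> tuple[float, int]:
    """Return (share of photos with identical scores, largest single-photo jump)."""
    if len(round1) != len(round2) or not round1:
        raise ValueError("need one score per photo in each round")
    jumps = [abs(a - b) for a, b in zip(round1, round2)]
    same_share = sum(1 for j in jumps if j == 0) / len(jumps)
    return same_share, max(jumps)


# Made-up scores for four hypothetical photos, not my real data.
print(round_to_round_changes([60, 70, 65, 72], [60, 57, 66, 72]))  # (0.5, 13)
```

For my 44 photos, 6 unchanged scores gives 6/44 ≈ 0.136, the 13.6% above.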

Measuring how much the margin of error shrinks after doubling the number of rounds would take rating another 1,760 pairs of photos, but I assume it would still be greater than the standard deviation from photo to photo. If that is the case, then even after four rounds the system still fails to show which photo is more desirable than another.

MyBestFace is fun to try, and it would be a very useful tool if its results were accurate, but after running this experiment I think I’m better off just asking a few friends which of my pictures are most attractive.