The Rise of User-generated Data Labeling

A cheetah uses supervised learning techniques to catch its prey. That’s a bizarre, out-of-the-blue statement, you may say. But think about it. A cheetah has developed a very refined approach to hunting by honing its skills through practice, observation, experience, and computation.

It’s much like training on datasets to create a spectacular AI model: the model is trained and taught continuously until it can operate on its own. The marvelous cheetah goes through a similar process until it can anticipate the escape tactics of various prey and modulate its speed for rapid turns, rather than relying on agility and speed alone. Cognition is achieved through immense training, and at the core of this process is Data Labeling.

Data labeling is an essential prerequisite that helps your machine learning algorithms “learn” from labeled input. There are several ways to do it: self-managed human labor, outsourcing to individuals or companies, third-party managed labeling providers, and more.

But let’s say your project is humongous and needs data labeling to be done continuously, while you’re on the go, sleeping, or eating. That’s when you need to get it done for free. Of course, it can be outsourced, but once you weigh the cost, the range of cases covered, and the accuracy achieved, I’m sure you’d appreciate user-generated Data Labeling.

I’ve got 6 interesting examples to help you understand this. Let’s dive right in!

1. Netflix annotates thumbnail images, did you know?

A simple application of data science on platforms like Netflix is, of course, how their recommendation engines work with implicit data. Let’s say user “A” binge-watched a show, say, “Jane the Virgin” (all seasons in 4 days). The implicit data is that user A liked the show, because they obviously sacrificed a lot of sleep to watch it. Behavioral data like this, combined with thousands of other data points, is the basis on which the machine learning algorithm at Netflix works.
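To make the idea of implicit data concrete, here is a minimal Python sketch of how a viewing log could be turned into a “liked / not liked” label without asking the user anything. This is purely illustrative and not Netflix’s actual method: the `ViewingRecord` fields, the thresholds, and the episode counts are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class ViewingRecord:
    """One user's viewing summary for one show (hypothetical schema)."""
    user: str
    show: str
    episodes_watched: int
    episodes_total: int
    days_taken: int


def implicit_label(record, completion_threshold=0.8, max_days=7):
    """Return 1 ("liked") if the user finished most of the show quickly, else 0.

    Both thresholds are arbitrary illustrative values.
    """
    completion = record.episodes_watched / record.episodes_total
    binged = completion >= completion_threshold and record.days_taken <= max_days
    return int(binged)


# Made-up records: user A binged everything in 4 days, user B dropped off early.
records = [
    ViewingRecord("A", "Jane the Virgin", 100, 100, 4),
    ViewingRecord("B", "Jane the Virgin", 3, 100, 30),
]

# The user's own behavior produces the label; nobody annotated anything by hand.
labels = {(r.user, r.show): implicit_label(r) for r in records}
print(labels)
```

The point of the sketch is that the label falls out of behavior the user was going to exhibit anyway, which is what makes this kind of labeling effectively free.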

Todd Yellin, Netflix’s vice president of product innovation, describes the data they consider: “What we see from those profiles is the following kinds of data – what people watch, what they watch after, what they watch before, what they watched a year ago, what they’ve watched recently and what time of day.”

So, if you watched Jane the Virgin, Netflix’s ML algorithm is likely to consider what other people who watched this show watched afterwards and to analyze trends in community preferences: whether viewers like strong female leads, or whether they appreciate comedy, corrupt cops, mysterious murders, and more.
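The “what they watch after” signal can be sketched as a simple count of which show follows which in each user’s history. Again, this is a toy illustration under my own assumptions, not a description of Netflix’s system: the function name `watched_next_counts`, the placeholder show names, and the histories are all hypothetical.

```python
from collections import Counter, defaultdict


def watched_next_counts(histories):
    """Count, for every show, which show users watched immediately afterwards."""
    counts = defaultdict(Counter)
    for history in histories:
        # Pair each show with the one that follows it in the same history.
        for current, following in zip(history, history[1:]):
            counts[current][following] += 1
    return counts


# Made-up viewing histories, each in chronological order.
histories = [
    ["Jane the Virgin", "Show X", "Show Y"],
    ["Jane the Virgin", "Show X"],
    ["Show Z", "Jane the Virgin", "Show Y"],
]

counts = watched_next_counts(histories)

# The most common follow-up to "Jane the Virgin" becomes a candidate recommendation.
top = counts["Jane the Virgin"].most_common(1)
print(top)  # [('Show X', 2)]
```

A real recommender would combine many more signals, but even this count shows how the community’s viewing order acts as labeled data.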

Beyond classifying and recommending shows and movies based on similar interests, Netflix goes a step further (to improve click-through rates) with a concept called personalization of thumbnails. These are basically images that Netflix selects from the video frames of a movie or show and annotates.

Image Source Credit: Becominghuman.ai

So where is user-generated data labeling here? Exactly where Netflix collects different thumbnail images and annotates them based on the user’s past behavior, preference for a particular genre, filters and lighting, favorite stars, and more. The resulting recommendation is unique to every user and draws on thousands of similar interests, which has helped improve click-through rates. It’s brilliant how Netflix quietly collects data from its users and uses it effectively to improve the experience.
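One way to picture how clicks end up labeling thumbnails: every time a thumbnail is shown, the user’s click (or lack of one) is a free label for that image. The sketch below picks the thumbnail with the highest click-through rate for each group of users. It is a simplified illustration and an assumption on my part, not Netflix’s published approach; the segment names, thumbnail names, and impression data are invented.

```python
from collections import defaultdict


def best_thumbnail_per_segment(impressions):
    """Pick, for each user segment, the thumbnail with the highest click-through rate.

    `impressions` is an iterable of (segment, thumbnail, was_clicked) tuples.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for segment, thumbnail, was_clicked in impressions:
        shown[(segment, thumbnail)] += 1
        clicked[(segment, thumbnail)] += int(was_clicked)

    best = {}  # segment -> (thumbnail, click-through rate)
    for (segment, thumbnail), times_shown in shown.items():
        ctr = clicked[(segment, thumbnail)] / times_shown
        if segment not in best or ctr > best[segment][1]:
            best[segment] = (thumbnail, ctr)
    return {segment: thumbnail for segment, (thumbnail, _) in best.items()}


# Made-up impressions: each click or non-click is a label supplied by the user.
impressions = [
    ("comedy_fan", "funny_scene", True),
    ("comedy_fan", "funny_scene", True),
    ("comedy_fan", "romantic_scene", False),
    ("romance_fan", "romantic_scene", True),
    ("romance_fan", "funny_scene", False),
]

choice = best_thumbnail_per_segment(impressions)
print(choice)
```

With so few impressions the rates are obviously noisy; a production system would need far more data and some way to keep testing new thumbnails.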

Bravo, Netflix!