Voice recognition, face recognition and text recognition are only some of the possibilities enabled by Machine Learning. It is a technology that gives our devices a better understanding of our world, our actions and our intentions when using them, which in turn lets us design and build better context-aware applications.
Computers can do all sorts of incredible things. They are really fast at performing billions of calculations or going through huge amounts of data in mere seconds. One thing they are not really good at yet, though, is understanding the world we live in. They are not that good at figuring out what occasion we are trying to capture with our smartphone camera, why we are scrolling through an infinite list of posts on some social media website, or what the text we are typing in a text editor is about. To a computer, every picture we take, every video we watch and every song we listen to is just raw data that means nothing. Computers do not really help us achieve our goals. Instead, they wait for us to tell them explicitly what to do to complete a task.
Machine Learning is the kind of technology that allows machines to make sense of all this data and extract useful information out of it.
Machine Learning has been used across many industries for many purposes. We see Machine Learning enabled applications in medicine, robotics and even art projects. Smart cars use Machine Learning to stay on the road, recognise road signs and pedestrians, and get an understanding of their surroundings.
Machine Learning is also more relevant than ever for the mobile industry, as you no longer need a PhD to get into it. Companies such as IBM, Microsoft and Google have built their own Machine Learning solutions, such as Watson, Machine Learning Studio and TensorFlow, that make applying it to your own problems much more approachable.
"This is a picture of a dog"
The way Machine Learning works is similar to how toddlers learn their first language. After a lot of repetition and many examples, the toddler is able to learn what words like "Mom" or "Dad" sound like and to tell different objects, such as a chair and a bus, apart.
Similarly, if you wanted to teach a computer to recognise a dog in any kind of image, it would not be able to learn that by looking at a single image of a dog. Instead, we would have to show it thousands and thousands of different images of dogs for it to finally get it.
Eventually, the computer will start picking up patterns in all these pictures and form an idea of what a dog looks like. It might have a fluffy tail, short or long legs, or a black nose. As a result, for any image of a dog that we show it, the computer will be able to tell that the image contains a dog.
The most important thing to note here is that we would not need to create any set of rules or write any code describing what a dog looks like; we only provide the data. The richer and more diverse the selection of data we provide, the more accurate the computer becomes at identifying dogs in pictures. 🐶
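To make the idea of learning from examples a bit more concrete, here is a minimal sketch in Python. It is not how a real image recogniser works: the three features (tail fluffiness, leg length, nose darkness) and all the numbers are made up for illustration, and a real system would learn from raw pixels. The point is only that the "knowledge" comes from the labelled examples, not from hand-written dog rules.

```python
from collections import defaultdict
from math import dist


def train(examples):
    """Average the feature vectors of each label into one 'centroid' per label."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical, hand-made features scaled 0..1:
# (tail fluffiness, leg length, nose darkness)
examples = [
    ((0.9, 0.4, 0.9), "dog"),
    ((0.8, 0.7, 0.8), "dog"),
    ((0.7, 0.5, 1.0), "dog"),
    ((0.1, 0.2, 0.1), "not dog"),
    ((0.2, 0.1, 0.3), "not dog"),
    ((0.0, 0.3, 0.2), "not dog"),
]

centroids = train(examples)
print(predict(centroids, (0.85, 0.6, 0.9)))  # prints "dog"
```

Adding more and more varied examples changes the centroids, and therefore the predictions, without touching the code.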
Learning from various sources
Images are not the only source that a computer can learn from. Images, audio and video might be the obvious ones to go for, but think about all the different types of information a computer can hold. Any data captured by a sensor, such as the microphone, light sensor, accelerometer or gyroscope, can be used as well. This can unlock a lot of interesting interactions that do not depend on touching a screen, such as waving or other hand gestures.
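As a rough illustration of how sensor data could become something a computer can learn from, the sketch below summarises a short window of accelerometer readings as two numbers. The readings are invented for the example, and the choice of features (average magnitude and its variation) is an assumption, not a recommendation from any particular platform.

```python
from math import sqrt
from statistics import mean, pstdev


def window_features(samples):
    """Summarise (x, y, z) accelerometer readings as two numbers:
    the average magnitude and how much the magnitude varies."""
    magnitudes = [sqrt(x * x + y * y + z * z) for x, y, z in samples]
    return mean(magnitudes), pstdev(magnitudes)


# Hypothetical readings in m/s^2: a phone lying still, and one being waved.
still = [(0.0, 0.0, 9.8), (0.1, 0.0, 9.8), (0.0, 0.1, 9.8), (0.0, 0.0, 9.8)]
waving = [(6.0, 1.0, 9.8), (-5.0, 0.5, 9.8), (7.0, -1.0, 9.8), (-6.0, 0.0, 9.8)]

print(window_features(still))
print(window_features(waving))
```

Feature pairs like these, labelled "still" or "waving", could then serve as training examples in the same way pictures of dogs do.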
Machine Learning in the wild
A lot of applications and services already use Machine Learning to enhance their user experience. Google Photos allows you to search for all your 'ice-cream' pictures without you having to mark any picture as ice-cream. Gmail is 'smart' enough to tell spam emails from regular ones. Recommendation services such as Spotify's Discover Weekly or YouTube's recommendations are powered by Machine Learning in order to understand each user’s unique entertainment preferences and provide richer recommendations. Apple's recently introduced Animoji uses the technology too. The phone is able to understand facial expressions, such as raising an eyebrow or smiling, and uses that information to animate the emoji character of your choice. Machine Learning can also be used to group members of a population according to their preferences and interests. Snapchat uses this to understand which of its users are most likely to interact with a specific type of ad.
Apple's Animoji understands the user's facial expressions in order to animate the emoji character
Tools you can use
Even if your team does not specialise in dealing with data, you can still use Machine Learning in your applications. Various companies already offer some standard Machine Learning use cases as ready-made services, one of them being Google with its Google Cloud Platform:
What's in this picture?
The Vision API can be used to understand what is visible in a picture. You can get labels for the items within it, extract any text it contains, or recognise logos, celebrity faces or items from movies. This is ideal for grouping pictures according to their content, or for getting information from receipts, credit cards or memos.
The Speech API helps you convert speech (audio) to text. Such an API makes hands-free interactions much easier to build than before. Think about applications or use cases where the user might have their hands busy, such as cooking or driving.
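To give a feel for what using such a service looks like, here is a small Python sketch that only builds the JSON body for a label and text detection request, without sending it. The endpoint and field names reflect my understanding of the Vision REST API and should be checked against the current documentation; the helper name `build_annotate_request` and the placeholder image bytes are made up for this example, and authentication is left out entirely.

```python
import base64
import json

# Assumed endpoint of the Cloud Vision REST API; verify against the docs.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(image_bytes, max_results=5):
    """Build a request body asking for labels and text on a single image."""
    return {
        "requests": [
            {
                # The image travels as base64-encoded text inside the JSON.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    {"type": "LABEL_DETECTION", "maxResults": max_results},
                    {"type": "TEXT_DETECTION"},
                ],
            }
        ]
    }


# Placeholder bytes; in practice this would be the contents of a photo.
body = build_annotate_request(b"placeholder image bytes")
print(json.dumps(body, indent=2))
```

Sending this body to the endpoint with valid credentials is what would return the labels and any detected text.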
What's in this video?
Similar to the Vision API, the Cloud Video Intelligence API can identify objects in each separate shot of a video.
You can find text anywhere you look. Articles on the web have text (proof: you are reading this), and so do boards, magazines, street signs and labels, receipts, credit cards, notes-to-self and so on.
The Cloud Natural Language API helps you analyse all this text, understand what it talks about and gauge its sentiment. This is particularly useful when you want to go through user reviews and quickly understand whether they are positive or negative, or when you want to moderate online conversations.
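As a sketch of the user review use case, the Python below turns a sentiment score into a simple label. It assumes the response contains a `documentSentiment` object with a `score` between -1 and 1, which matches my understanding of the API's sentiment analysis response. The sample response, the `bucket` helper and the 0.25 threshold are all invented for illustration.

```python
def build_sentiment_request(text):
    """Build a request body for analysing the sentiment of plain text."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }


def bucket(score, threshold=0.25):
    """Map a sentiment score in [-1, 1] to a coarse label.
    The threshold is an arbitrary choice for this sketch."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical response for a happy review; not real API output.
sample_response = {"documentSentiment": {"score": 0.8, "magnitude": 0.8}}

print(bucket(sample_response["documentSentiment"]["score"]))  # prints "positive"
```

Running every review through a helper like this would give a quick overview of how users feel without reading each review by hand.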
What language is that?
The Cloud Translation API can be used to translate dynamic text on the spot. This is useful where people from different backgrounds might want to interact, such as on a social media platform, in emails or in chat applications.
Enabling new interactions and applications
Machine Learning is a unique enabler that can lead to better experiences and context-aware applications. Think about how you would assist the user if you knew what they were trying to accomplish via your app's features. Does the information the user is browsing give away any hint of their intention? Could you somehow provide alternative ways of managing content if you knew what it was about?
Follow Alex on Twitter