You’ve all heard about Artificial Intelligence (AI), but only a few of us know exactly what it means and how it impacts our everyday lives.
When thinking about AI, many Baby Boomers and Gen Xers picture old sci-fi films and scenes where machines come alive and take over the world. But that’s just a quaint representation of how humans used to perceive the unknown.
If you remember the old TV show ‘Beyond 2000’, you may recall that their ideas and inventions were outstanding at the time, which only shows the potential of the technology.
So what exactly is AI, and what examples of it can we see on mobile?
Artificial Intelligence (AI) has been present in mobile phones for some time now, but in the prior generation of phones it was cloud-based and required an Internet connection. The difference with mobile AI today is that the new generation of smartphones integrates cloud-based AI with AI built into the hardware – an innovation announced by tech giants such as Google, Apple and Huawei.
The rate at which AI is expanding is accelerating. According to a McKinsey Global Institute study, AI attracted nearly $40 billion in investment back in 2016 – sectors like healthcare, education and finance are all investing in AI, but mobile is the most promising area.
Built-in AI hardware
AI has been dominant in app development for several years and has the potential to grow much more in the coming years. Devices now offer a number of features to boost AI performance – combining AI with these built-in elements makes apps more relevant and personalized.
Some examples are Apple’s iPhone XS (pronounced Ten-S), XR and iPhone XS Max, which power various advanced features including Face ID, Animoji and augmented reality apps. Close behind is Google’s Pixel 3 XL, which TechRadar calls the best camera phone. It can blur the background using a single camera thanks to a technique called dense dual-pixel autofocus – using the resulting depth map, the Portrait Mode software replaces each background pixel with the enticing blur known as bokeh. The result is a high-quality image that matches professional quality with just a quick tap.
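The core idea of depth-based background blur can be sketched in a few lines of plain Python – a toy illustration of the principle, not Google’s actual pipeline: pixels whose depth-map value marks them as background get replaced by a local average, while close-up pixels stay sharp.

```python
def box_blur(img, x, y):
    """Average of the 3x3 neighbourhood around (x, y), clamped at image edges."""
    h, w = len(img), len(img[0])
    vals = [img[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]
    return sum(vals) / len(vals)

def portrait_mode(img, depth, threshold):
    """Blur pixels whose depth exceeds `threshold` (background), keep the rest sharp."""
    return [[box_blur(img, x, y) if depth[y][x] > threshold else img[y][x]
             for x in range(len(img[0]))]
            for y in range(len(img))]

# Toy 3x3 grayscale image: a bright foreground pixel in the centre,
# uniform background everywhere else.
image = [[100, 100, 100],
         [100, 255, 100],
         [100, 100, 100]]
depth = [[9, 9, 9],
         [9, 1, 9],   # the centre pixel is close to the camera
         [9, 9, 9]]

result = portrait_mode(image, depth, threshold=5)
```

Here the centre pixel survives untouched while the background pixels are smoothed; a real bokeh pass would use a much larger, depth-weighted kernel.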
The third big player, Huawei, released the Huawei Mate 20, Mate 20 Pro and Mate 20 X. The Mate 20 and Mate 20 Pro are both powered by Huawei’s newest in-house processor, the Kirin 980 chipset, and have triple rear cameras – the phones’ AI chip offers a number of features, including ‘4D predictive focus’ (tracking the main subject of the photo to keep it in focus) and more. The Mate 20 X, by contrast, is aimed mostly at a gaming audience: its large screen can display more information, reducing the amount of scrolling.
All three brands also paid attention to better battery performance on the new generation of phones, which is partly due to on-device AI.
Use of AI in Mobile Software
TensorFlow was created to be a reliable deep learning (DL) solution for mobile platforms. There are two solutions for deploying machine learning (ML) applications on mobile and embedded devices: TensorFlow for Mobile and TensorFlow Lite.
TensorFlow for Mobile has a fuller set of supported functionality and is the one to use for production cases, while TensorFlow Lite allows targeting hardware accelerators through the Android Neural Networks API.
Some common use cases for on-device deep learning:
Speech Recognition (small neural network running on-device listening out for a particular keyword and transmitting the conversation to the server for further processing);
Image Recognition (helps the camera to apply appropriate filters, label photos to be easily findable, uses image sensors to detect all sorts of interesting conditions);
Object Localization (augmented reality use cases, TensorFlow offers pre-trained model along with tracking code – the tracking is important for apps where you’re trying to count how many objects are present over time – it gives you a good idea when a new object enters or leaves the scene);
Gesture Recognition (effective way of deploying apps with hand or other gestures, either recognized from images or through analyzing accelerometer sensor data);
Optical Character Recognition, or OCR (Google Translate’s live camera view is a great example – the simplest approach is to segment the line of text into individual letters, then apply a small neural network to the bounding box of each);
Translation (these are often sequence-to-sequence recurrent models where you’re able to run a single graph to do the whole translation, without needing to run separate parsing stages);
Text Classification (if you want to suggest relevant prompts to users based on their previous reading, you need to understand the meaning of the text, and this is where text classification comes in. Text classification is an umbrella term covering everything from sentiment analysis to topic discovery, with examples like Skip-Thoughts);
Voice Synthesis (a synthesized voice can be a great way of giving users feedback or helping accessibility, and recent advances such as WaveNet show that deep learning can offer very natural-sounding speech).
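Of the use cases above, text classification is the easiest to sketch without any ML framework. Below is a minimal naive Bayes sentiment classifier in plain Python – a teaching toy, nothing like production models such as Skip-Thoughts, and the training sentences are invented for illustration:

```python
import math
from collections import Counter

class NaiveBayes:
    """Minimal multinomial naive Bayes text classifier."""

    def fit(self, docs, labels):
        self.classes = set(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        self.class_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}

    def predict(self, doc):
        scores = {}
        for c in self.classes:
            # log prior + log likelihood with add-one (Laplace) smoothing
            score = math.log(self.class_counts[c] / sum(self.class_counts.values()))
            total = sum(self.word_counts[c].values())
            for w in doc.lower().split():
                score += math.log((self.word_counts[c][w] + 1) /
                                  (total + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)

clf = NaiveBayes()
clf.fit(["great app love it", "terrible crash bug",
         "love the new camera", "awful battery terrible"],
        ["pos", "neg", "pos", "neg"])
```

With that tiny training set, `clf.predict("love this great camera")` comes out positive – the same mechanics, scaled up, sit behind on-device prompt suggestion.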
- Image Recognition Features
The technology of facial recognition is nothing new, but it’s expected to see new growth opportunities in the coming years.
Mobile app creators have noticed this growing interest and are trying out new ways to apply the technology unconventionally, since camera phones have become a focal point for communication. The set of techniques that serves as groundwork for such applications includes ego-motion estimation, enhancement, feature extraction, perspective correction, object detection and document retrieval.
Since retail giants such as Amazon, Target and Macy’s offer image recognition in their mobile apps, the technology will likely become a must-have. Scan-to-buy options, which enable customers to shop directly from a retailer’s catalogue and in-store signage, have grown in demand and are a standard offering today.
Some retailers are employing image recognition that allows consumers to point their phone at any object and receive suggestions for similar products. A direct example of this is the IKEA Place app, developed for iOS – users can place IKEA furniture into their homes with the help of AR and rotate it as if it were really there.
Mobile visual search has great potential to create new profit opportunities – brands are trying to utilize the smartphone camera’s increasing sophistication to activate consumers and drive sales. In some cases, visual search is faster and more accurate than text or voice, and the smartphone is the perfect launchpad for the technology.
Leading Internet search companies such as Google and Baidu are racing to capture the mobile visual search market as it begins to replace traditional forms of search.
Let’s say you saw something you really liked but you don’t know how to find it or what it’s called – visual search lets you find all those things you don’t have the words to describe. Google Lens is a perfect example – in 2017 it was introduced in Google Photos and the Assistant. In 2018, Google announced three major updates. The first is smart text selection, which connects the words you see with the answers and actions you need – you can copy and paste text from the real world (recipes, etc.) to your phone.
The second update is style match: if you like a specific outfit, you can open Lens and see items in a similar style that fit the look.
The third update is that Lens now works in real time – you can browse the world around you just by pointing your camera.
With a snap of the camera, companies can use the technology to identify items in their inventory, publishers can use it to source quality visual content from their photo libraries, and Digital Asset Management (DAM) software can include visual search to organize and curate customers’ content – visually.
Visual search can help e-commerce businesses increase catalogue discovery, customer engagement and conversion rates.
Image recognition APIs train computers to analyze, classify and alter different types of pictures.
Let’s list some of them:
Clarifai’s independent team built a system that accurately recognizes most entities. Unlike the other APIs on this list, it offers scene recognition with the bonus of video analysis. For images, Clarifai can perform sentiment analysis, text recognition, logo and face detection, as well as a more robust version of Resemble’s image attribute detection: brightness, colour and dominant colour.
Cloud Vision by Google enables developers to understand the content of an image by applying ML models – it includes many of Clarifai’s key features plus add-ons like landmark detection and a simple REST API. You can’t train your own models to test against, but you have access to an API backed by Google that is constantly improving. Furthermore, you can build metadata on your image catalogue, easily detect broad sets of objects in your images and moderate offensive content in crowd-sourced images, powered by Google SafeSearch. Optical Character Recognition (OCR) lets you detect text within your images, with automatic language identification.
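To make the REST API concrete, here is roughly what a Cloud Vision `images:annotate` request body looks like, built with nothing but the Python standard library. The endpoint and feature names follow Google’s public REST reference; actually sending the request would require a real API key and image, which are not shown here.

```python
import base64
import json

def build_vision_request(image_bytes, features=("LABEL_DETECTION", "TEXT_DETECTION")):
    """Build the JSON body for a Cloud Vision images:annotate call."""
    return json.dumps({
        "requests": [{
            # Images are sent inline as base64-encoded bytes.
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": f, "maxResults": 10} for f in features],
        }]
    })

# The body would be POSTed to:
#   https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY
body = build_vision_request(b"\x89PNG...stand-in for real image bytes")
```

The response comes back as JSON with one `labelAnnotations` / `textAnnotations` entry per requested feature, which your mobile backend can relay to the app.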
Amazon Rekognition, on the other hand, prides itself on a more robust suite of facial analysis tools, including facial recognition across images (not offered by Google or Clarifai), detailed information like beard detection (yes/no), and facial comparison (how likely is it that two faces belong to the same person?). It also offers integration with AWS services such as S3 and Lambda.
In short, Clarifai has the strongest concept modeling, Google the best scene detection and sentiment analysis, and Amazon the best facial analysis.
There is also the IBM Watson™ Visual Recognition service, which uses DL algorithms to analyze images for scenes, objects, faces and other content. You can build and train custom image classifiers using your own image collections – use cases include manufacturing, visual auditing, insurance, social listening, social commerce, retail and e-commerce. Because visual recognition understands visual data, it can turn piles of images into organized information, and building mobile apps that accurately detect and analyze objects in images is easier than ever.
Let’s stop here for now – there are more features to talk about in the second part of this article, where we will see more examples of how AI redefines mobile software and the mobile experience altogether.
AI and the powerful impact on mobile technology (part 2)
Last time I wrote about the usage of AI in mobile software, covering TensorFlow services, image recognition features and APIs, and visual search on mobile. But there are many more ways AI improves the mobile experience.
Let’s move on to other features you can benefit from once AI starts working its magic on mobile.
- Natural Language Processing Features and Understanding
Your first contact with Natural Language Processing (NLP) might have involved a GPS navigation app that lets you verbally request directions to a destination – such apps are far more sophisticated than they used to be.
The best-known mobile app with NLP is Siri, a virtual assistant (VA) technology followed by other VAs including Alexa, Cortana and Google Assistant.
NLP has also become more common in the medical and healthcare sector, where its uses are wide-ranging. This is especially true for wearable health apps that accept verbal input, as this field has an increased need for hands-free communication.
Another area where NLP can be extremely useful is detecting spam messages. Spam filtering algorithms ‘read’ the content of blog comments, social media posts, email messages and so on, then compare it to known spam messages and text patterns to identify spam.
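A crude version of such a filter fits in a few lines. The patterns and weights below are entirely made up for illustration – real filters like Gmail’s use learned models, not hand-written rules:

```python
import re

# Hypothetical spam signals: suspicious phrases and patterns, each with a weight.
SPAM_PATTERNS = [
    (re.compile(r"\bfree money\b", re.I), 3),
    (re.compile(r"\bclick here\b", re.I), 2),
    (re.compile(r"\bwinner\b", re.I), 2),
    (re.compile(r"https?://\S+", re.I), 1),  # links are mildly suspicious
    (re.compile(r"!{3,}"), 1),               # runs of exclamation marks
]

def spam_score(message):
    """Sum the weights of every spam pattern found in the message."""
    return sum(weight for pattern, weight in SPAM_PATTERNS
               if pattern.search(message))

def is_spam(message, threshold=3):
    """Flag a message whose combined score reaches the threshold."""
    return spam_score(message) >= threshold
```

For example, `is_spam("WINNER!!! Click here for free money")` trips several rules at once, while an ordinary message scores zero.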
Also, there’s huge potential in creating and pulling data from information stores – a user can give verbal input to search a plethora of ebooks, websites, videos, footage and more.
In addition, NLP shows remarkable promise in prediction, particularly of political and social events.
There will also be improvements in language translation apps and mobile apps with talk-to-type functionality.
Natural Language Understanding (NLU) handles machine ‘reading comprehension’.
It converts pieces of text into more formal representations, such as first-order logic structures, that are easier for a computer program to manipulate.
NLU identifies the intended meaning among the multiple possible meanings that can be extracted from a natural-language expression, usually in the form of organized notations of NL concepts.
Regardless of the approach used, most NLU systems share certain common elements – the system needs a lexicon of the language, a parser and grammar rules to break sentences into an internal representation. The umbrella term ‘NLU’ applies to a diverse set of computer applications, from simple tasks like short commands issued to robots to highly complex ones such as full comprehension of newspaper articles.
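To make the lexicon-plus-rules idea concrete, here is a toy NLU pipeline for short device commands. The lexicon entries, intent names and grammar are invented for this sketch; a real system would use a trained parser rather than substring matching.

```python
# Toy NLU: a lexicon maps surface forms to concepts, and a minimal "grammar"
# expects one action concept plus one device concept somewhere in the utterance.
LEXICON = {
    "turn_on":  {"switch on", "turn on", "enable"},
    "turn_off": {"switch off", "turn off", "disable"},
    "light":    {"light", "lamp", "lights"},
    "music":    {"music", "song", "playlist"},
}

def understand(utterance):
    """Map a natural-language command to a structured intent dict."""
    text = utterance.lower()
    action = device = None
    for concept, surface_forms in LEXICON.items():
        if any(form in text for form in surface_forms):
            if concept.startswith("turn_"):
                action = concept
            else:
                device = concept
    if action and device:
        return {"intent": action, "object": device}
    return {"intent": "unknown"}
```

So `understand("Please turn on the kitchen lamp")` yields `{"intent": "turn_on", "object": "light"}` – the kind of internal representation that downstream code can act on directly.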
- Text-To-Speech (TTS) Systems
TTS is high-fidelity speech synthesis that gives a better user experience to specific groups: people with learning disabilities or literacy difficulties, people who speak a language but cannot read it, people with visual impairments or different learning styles, and people who multitask or access content from mobile phones.
Making your digital content audible helps the online population understand text better, and as people increasingly go mobile, TTS can turn any digital content into a multimedia experience, letting people listen to blogs, articles or news.
Some of the best text-to-speech software of 2018 includes:
- Amazon Polly – Besides Alexa, Amazon created an intelligent TTS system called Polly, which turns text into lifelike speech. It offers an API that lets you easily incorporate speech synthesis capabilities into ebooks, articles and other media. It is easy to use – you just send the text through the API and it sends an audio stream straight back to your app.
- Voice Reader Home 15 – Linguatec’s Voice Reader can quickly convert text (Word docs, emails, EPUBs and PDFs) into audio files, which you can listen to on a PC or a mobile device.
- Capti Voice – Speech synthesis apps are popular in the education world as they improve comprehension, among other things. Capti Voice lets you listen to anything you want to read, customize learning or teaching, and overcome language barriers.
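To show what calling such a service looks like, here is a sketch of an Amazon Polly request built as a plain parameter dict. The parameter names (`Text`, `VoiceId`, `OutputFormat`) follow Polly’s public `SynthesizeSpeech` API; the boto3 call itself is shown only as a comment since it needs AWS credentials.

```python
def build_polly_request(text, voice="Joanna", fmt="mp3"):
    """Parameters for Amazon Polly's SynthesizeSpeech operation."""
    return {
        "Text": text,
        "VoiceId": voice,       # one of Polly's built-in voices
        "OutputFormat": fmt,    # "mp3", "ogg_vorbis" or "pcm"
    }

params = build_polly_request("Welcome back! You have three new articles.")

# With credentials configured, the actual call would look like:
#   import boto3
#   polly = boto3.client("polly")
#   audio_stream = polly.synthesize_speech(**params)["AudioStream"]
```

The returned audio stream can be saved or played directly in the app, which is what makes it practical to read articles and ebooks aloud on demand.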
The new cutting-edge TTS service launched by Google is Cloud Text-to-Speech, powered by WaveNet, software created by DeepMind.
It analyzes waveforms from a vast database of human speech and re-creates them at a rate of 24,000 samples per second. The result includes subtleties such as lip smacks and accents. Google says the new service provides 32 different voices across 12 languages, and users can customize factors like pitch and speed.
- Speech-To-Text (STT) Systems
If you’re at a conference or a lecture, it can be quite hard to write down every word the speaker says, and this is where speech recognition comes in.
Relying on computational linguistics, it identifies spoken language and turns it into text.
These systems differ in capability: simple ones can recognize only a limited selection of words, while the most advanced can understand natural speech.
Some of the best STT apps are:
- Evernote for Android – Evernote allows you to record audio notes and turn them into text. Unlike Dragon Dictation (see below), Evernote saves the audio and the text file together, so you can record what’s on your mind and sort the data later. The app is free, but since it uses Google’s Android text transcription service, it requires an Internet connection.
- Dragon Dictation – this app has only one button: tap it, start talking, and Dragon Dictation takes care of the rest. The text appears once you’re done dictating, and after the app has finished transcribing your speech, you can send it out via email or copy and paste it into another app. You can also post directly to Facebook or Twitter, or just save your text for later. The app is free for iPhone and iPad but requires an Internet connection.
- Voice Assistant – this redesigned app has a fast-access feature that makes it easy to post to Twitter, Facebook or email. With Voice Assistant, you can use the auto-copy feature to send your recordings to other apps such as Google Search or YouTube, or straight to a wireless printer. It also has grammar correction and on-screen editing with suggested corrections.
- Transcribe – this popular dictation app is powered by AI and lets you import files from Dropbox. It transcribes any video or voice memo automatically, supporting 80 languages from across the world. Once a file is transcribed, you can export the raw text to a word processor for editing. The app is free to download, but you’ll have to make an in-app purchase to unlock most of these features.
- Speechnotes – Speechnotes doesn’t require you to create an account – just open the app, press the microphone icon and you’re ready to go. When recording a note, you can easily dictate punctuation marks through voice commands, and quickly add names, signatures, greetings and so on using custom keys on the built-in keyboard. Speechnotes offers plenty of fonts and text sizes – the app is free to download from the Google Play Store, but you have to make an in-app purchase to access all features.
- Chat Bots
Chat bots for mobile apps are often described as a ‘recent’ sensation, but their development actually started in 1966 with ELIZA – a chatbot that simulated a psychotherapist and can be considered the mother of all chatbots. Chatbots are great for specific tasks, from simple ones, such as rule-based bots that answer basic customer questions, to complex ones like handling full customer service conversations.
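A rule-based bot of the kind ELIZA pioneered can be sketched in a handful of lines – a toy customer-service bot whose rules and replies are made up for illustration, not a production system:

```python
import re

# Ordered (pattern, reply) rules: the first matching rule wins,
# with a fallback for anything unrecognized.
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.I),
     "Hello! How can I help you today?"),
    (re.compile(r"\b(opening hours|open)\b", re.I),
     "We are open Monday to Friday, 9am to 6pm."),
    (re.compile(r"\b(refund|return)\b", re.I),
     "You can request a refund within 30 days of purchase."),
]
FALLBACK = "Sorry, I didn't catch that. Could you rephrase?"

def reply(message):
    """Return the reply of the first rule whose pattern matches the message."""
    for pattern, answer in RULES:
        if pattern.search(message):
            return answer
    return FALLBACK
```

The jump from this to a ‘complex’ chatbot is replacing the regex table with an NLU model that extracts intents, while keeping the same request-in, reply-out loop.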
Chatbots won’t replace websites or apps, but they work great when integrated with those apps and websites to boost interaction with customers.
For companies, it is essential to engage with their customers on a regular basis, and mobile apps are the best platform for this engagement. Today, many people would rather communicate with a company through its app than through email.
Chatbots can also help with privacy concerns – that is why many banks are building their own native chatbot platforms, like Erica from Bank of America, rather than relying on third parties.
Some great examples of chatbots are Duolingo, Erica by Bank of America (still in beta), Lemonade’s Maya (replacing brokers and bureaucracy) and Operator by Intercom (a customer service chatbot that handles simple tasks). I would also mention Facebook’s Messenger platform for chatbots, which currently dominates the Web.
The benefits of bots on mobile are massive: customer interaction becomes livelier and more engaging, you won’t need to download an app for every task, chatbots will be your calculator, booking agent and more, they will recommend new things, help you with repetitive tasks and save a lot of space on your phone by acting as a number of apps in one.
Developers will see benefits through seamless deployment of chatbots to messaging and other channels, and through integration of chatbots with other apps – with an intelligent chatbot, you can add easy-to-see features and extra functionality to your mobile app.
Chatbots add quality to your mobile app, especially with the intelligence support they get from AI, since they help you increase conversions.
Chatbots are the future of mobile technology, so are you ready?
How Will Mobile AI Impact Businesses?
The major tech companies are incorporating AI algorithms into various devices to strategically retain users – this helps businesses deeply engage users and provide more incentives to use their services.
Many devices and apps will be written with algorithms that adapt based on learned behavior – the algorithms will be able to filter data, find trends and adjust the apps themselves to create more meaningful opportunities for engaging users. Forward-thinking enterprises will prosper from the advantages AI provides as it continues to connect users to brands.
The continued advancement of AI and ML is causing a revolution in the way that developers, businesses and users think about intelligent interactions within mobile applications.
What will happen next…
The most obvious changes AI will bring are processing speed and efficiency – doing things faster without needing multiple charges of your phone per day. In the end, the whole point of AI is to create a more personalized and user-friendly relationship with our smartphones.
Google’s $400 million acquisition of DeepMind is a prime example of mainstream AI adoption. A study conducted by the McKinsey Global Institute revealed that tech giants such as Baidu and Google spent between $20 billion and $30 billion on AI in 2017, with 90% of this going to R&D and deployment, and only 10% to AI acquisitions.
Based on the progress in technology and the growing demand for smart applications, AI and mobile are the PERFECT match.