With the popularity of personal assistants, such as Siri, Cortana and Google Assistant, and new startups leveraging AI and analytics to build personal companions, consumers are moving towards a new voice-controlled relationship with technology. It is all but a given that voice-activation systems will eventually make it into the enterprise environment, where they have the potential to simplify and automate activities. Craig Walker, Director of Cloud Services at Alcatel-Lucent Enterprise, explains what needs to happen before the technology is trusted with mission-critical applications
Think how much easier it would be for a physician to say ‘System: update Mary Smith’s chart with the following: Patient experiencing abdominal pain, issue pharmacy order for 200mg of SuperAntiGas, signed Dr. FeelBetter’. Or, in a conference room, instead of struggling with a remote control, simply to say ‘System: turn on projector, turn on TV and dim lights’.
Voice analytics firm VoiceLabs has identified the various layers needed to support a voice-first approach in the consumer world. But, to make the move from simple, consumer-based use cases to establishing a voice-first environment in the enterprise world, a few more things need to happen.
Security will be critical if enterprise systems are to start relying on voice commands – should anyone be able to command critical equipment or systems just by speaking? The answer, clearly, is no. Privacy, too, is a top concern. And, while the physician example above seems simple enough, we need to think about this in the context of regulations: are a patient’s rights violated if verbal commands expose their medical information to third parties?
We are already seeing voice recognition technology used for secure access, with banks, for example, introducing voice authentication to telephone banking systems.
This may leave some customers concerned about the security of their accounts. But my feeling is that voice-based authentication will follow the adoption cycle we saw in e-commerce, where initial concerns over credit card fraud had to be overcome before the meteoric rise in online purchasing. Continued innovation and improvement in voice recognition systems will make voice security viable in an enterprise environment, so that only authorised users with the right privileges can perform the associated actions.
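Authentication alone is not enough, though: once a speaker has been identified, the system still has to check what that speaker is allowed to do. A minimal sketch in Python, with the roles and command names invented purely for illustration:

```python
# Illustrative sketch of privilege checking after voice authentication.
# The roles and command names here are hypothetical, not from any real product.
PRIVILEGES = {
    "physician": {"update_chart", "issue_prescription"},
    "facilities": {"dim_lights", "control_projector"},
}

def authorise(speaker_role: str, command: str) -> bool:
    """Allow a voice command only if the authenticated speaker's role permits it."""
    return command in PRIVILEGES.get(speaker_role, set())

print(authorise("physician", "issue_prescription"))   # True: physicians may prescribe
print(authorise("facilities", "issue_prescription"))  # False: facilities staff may not
```

The point is the separation of concerns: voice biometrics establish who is speaking, while an ordinary access-control layer decides whether the command should run at all.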
That said, whereas your microwave might not be spying on you, some devices will be always-on, always listening and potentially recording, and a few well publicised cases of privacy invasion, commercial espionage or legal jeopardy could still stall adoption. This suggests that a big On/Off switch or function needs to be included in voice-first products, giving users the benefits and not the downsides of constant monitoring. Secure software access will also need to be in place in these products to prevent and detect hacking efforts.
Effective voice recognition
The first use cases of voice recognition have mainly been in voice response systems, whether in a call centre or in our cars and smartphones. As many of us know from first-hand experience, these work marginally at best; recognition and contextualisation will need to be refined before we can realistically think about enterprise-wide adoption.
Research programmes, such as Carnegie Mellon University’s Sphinx project, continue to enhance language recognition capabilities. An Internet Trends report by Mary Meeker indicated that in 2016, Google’s voice recognition system could recognise over five million words with around 90% accuracy, but that’s still not extensive or accurate enough. Is 90% accuracy good enough to interact with a utility provider’s network or a life support system in a hospital?
It’s not just about recognition of words; it is also about what to do with those words, which is where cognitive engines and AI come into play.
Solutions from some of the biggest players in the industry – e.g. Microsoft, with its open source cognitive recognition engine – can be leveraged to understand the context of words. ‘How do I get to Green Park?’ may sound simple enough, but it needs to be put into context. Location awareness could indicate you mean Green Park in London and help make assumptions about transportation mode. If you were sitting at Piccadilly Circus, the answer could be ‘Take one stop, Westbound, on the Piccadilly line’. But what if you meant Green Park in Manchester or Birmingham?
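One simple way a system could make that location-aware assumption is to rank candidate matches by distance from the speaker. A sketch of the idea, assuming a hypothetical candidate list with approximate coordinates:

```python
import math

# Hypothetical candidate list; the coordinates are approximate and
# included only to illustrate location-aware disambiguation.
CANDIDATES = {
    "Green Park, London": (51.5067, -0.1428),
    "Green Park, Manchester": (53.44, -2.21),
    "Green Park, Birmingham": (52.45, -1.93),
}

def disambiguate(candidates, user_lat, user_lon):
    """Pick the candidate place closest to the speaker's current location."""
    def distance_km(coords):
        lat, lon = coords
        # Equirectangular approximation -- accurate enough at city scale.
        x = math.radians(lon - user_lon) * math.cos(math.radians((lat + user_lat) / 2))
        y = math.radians(lat - user_lat)
        return math.hypot(x, y) * 6371
    return min(candidates, key=lambda name: distance_km(candidates[name]))

# A speaker at Piccadilly Circus (51.5101, -0.1340) most likely means London:
print(disambiguate(CANDIDATES, 51.5101, -0.1340))  # Green Park, London
```

A real assistant would combine many more signals than raw distance (search history, calendar entries, the transport mode implied by the question), but proximity is the obvious first-pass filter.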
The search for a deeper meaning
The real challenge lies in what’s behind the voice recognition system – from the integration of IoT devices to the system itself – and in ensuring that requested commands make sense. To achieve this, we need to use cognitive engines as a check and validation system.
Think of someone accidentally giving a command to ‘Turn off cooling system to reactor 4’, instead of reactor 3, or of a doctor using the system to prescribe a harmful dose of medication because he accidentally said 400 grams instead of 400 milligrams.
There will need to be a holistic view of actions being automated to prevent human error and broader intelligence to understand the actions related to voice-controlled requests. For example, even if ‘Turn off cooling system to reactor 4’ was correct, the system would need to understand a set of operational procedures to implement the command.
Creating an API platform for true voice integrated solutions
An interesting element that could tie in strategically with the development of true voice-controlled enterprise environments comes from the traditional voice communication world, where we are seeing an explosion of CPaaS (Communications Platform as a Service) offerings, which use APIs to integrate voice into other applications.
Some major voice communication vendors are now entering this market, providing CPaaS infrastructures with a standardised set of APIs that enable companies to integrate communications into their business processes.
While we traditionally look at integration in terms of incorporating voice and video services into existing applications – think of a banking application that allows you to move from an online application to a voice call with a banking advisor – I believe these will play a big part in a ‘voice-first’ environment by leveraging the rich API infrastructure of CPaaS to communicate with applications and things.
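To make the banking example concrete, the application would call the CPaaS provider's API to escalate the online session to a live call. The sketch below only assembles such a request; the endpoint path and payload fields are hypothetical, since (as discussed next) each provider currently defines its own schema:

```python
import json

# Hedged sketch of a voice-escalation request to a CPaaS gateway.
# The URL, endpoint and field names are invented for illustration;
# real CPaaS providers each define their own, not-yet-standardised API.
def build_call_request(api_base, from_user, to_advisor, context):
    """Assemble the JSON request a CPaaS gateway might expect in order
    to escalate an in-app session to a voice call with an advisor."""
    return {
        "url": f"{api_base}/v1/calls",
        "method": "POST",
        "body": json.dumps({
            "from": from_user,
            "to": to_advisor,
            "metadata": context,  # e.g. which screen the customer was on
        }),
    }

req = build_call_request(
    "https://cpaas.example.com", "customer-42", "advisor-queue",
    {"app": "banking", "screen": "mortgage-offer"},
)
print(req["url"])  # https://cpaas.example.com/v1/calls
```

Passing the application context along with the call request is what makes the hand-off feel seamless: the advisor answers already knowing what the customer was doing.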
How CPaaS and other platforms communicate with devices really needs to be standardised before we see rapid development of voice technology. All today’s consumer-based voice-controlled systems have their own interfaces, their own API integrations and, as with the Beta vs. VHS battle from decades ago, the potential for product obsolescence. Just as a consumer doesn’t want to invest in the latest smart coffee maker only to see the platform that controls it be discontinued, so an enterprise wants to make sure that the investment it makes in new technologies won’t become obsolete before it is able to realise a return.
The good news is that there is a set of technologies in the works to help minimise potential obsolescence, with frameworks like IoTivity being developed to build a standardised platform.
The best is yet to come
We are already seeing the value, benefits and rapid expansion of new voice applications for consumers, and in the near term we will see basic use cases move into the enterprise. Longer term, as advances continue to be made in voice recognition, voice security and simplification/standardisation in device connectivity, we will see more and more voice-first activities in both the consumer and enterprise world, helping to reduce complexity and improve our productivity.
Craig Walker is Director of Cloud Services at Alcatel-Lucent Enterprise (ALE). He has more than 25 years’ experience in publicly held telecommunication companies, start-up ventures and within the partner environment. He has been with ALE since the acquisition of Xylan Corporation in 2000, where he was Technical Director EMEA.