Speech recognition is the transcription of human speech or audio into text: the system recognizes the spoken language of the speaker and converts it to text. It is also known as "Computer Speech Recognition" or "Automatic Speech Recognition (ASR)". In iOS 10, Apple introduced the Speech framework, a new API that allows apps to support continuous speech recognition from either live or prerecorded audio and transcribe it into text. Using the Speech framework, apps can tap into Apple's speech recognition engine and extend this capability into their own features.
Why use the Speech Recognition API
Prior to iOS 10, Apple allowed users to interact with the device through speech only via Siri (Apple's voice-controlled personal assistant) and keyboard dictation, enabled by tapping the microphone button to the left of the space bar on the keyboard.
Keyboard dictation was therefore the only way for developers to let users interact with an application by speech, through the default iOS keyboard. However, this feature has several limitations:
- It is only available through user interface elements that support TextKit
- Limited to live audio
- Supports only system’s default keyboard language
- Most importantly, it lacks additional information such as confidence levels, timing, and alternative interpretations.
The Speech framework provides a more powerful way to integrate Apple's speech recognition capabilities and gives fast, accurate results in real time. In addition to the transcription itself, it returns rich information about the results. Some of the benefits include (a short sketch of reading this information from a result follows the list):
- Supports both prerecorded audio and live speech
- Multiple interpretations of the speech
- Confidence levels
- Timing information
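As a rough illustration of the last three points, the sketch below reads the alternative transcriptions, per-segment confidence values, and timing information from an SFSpeechRecognitionResult. It assumes you already have a result object delivered by a recognition task (shown later in this article); the function and variable names are only for illustration.

#import <Speech/Speech.h>

// Assumes `result` is an SFSpeechRecognitionResult delivered by a recognition task.
void inspectResult(SFSpeechRecognitionResult *result) {
    // The most likely transcription, plus any alternative interpretations.
    NSLog(@"Best: %@", result.bestTranscription.formattedString);
    for (SFTranscription *transcription in result.transcriptions) {
        NSLog(@"Alternative: %@", transcription.formattedString);
    }

    // Each segment carries its own confidence level and timing information.
    for (SFTranscriptionSegment *segment in result.bestTranscription.segments) {
        NSLog(@"'%@' confidence=%.2f start=%.2fs duration=%.2fs",
              segment.substring, segment.confidence,
              segment.timestamp, segment.duration);
    }
}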
The entire process of translating speech into text is handled by Apple's servers, which requires the device to have an active internet connection.
Features of iOS Speech Recognition API
- Uses the same technology as Siri and keyboard dictation.
- Highly accurate
- Adapts to the user (individual preferences)
- Supports over 50 languages and dialects.
- Protects user privacy
How to configure your app to support Speech Recognition
First and foremost, the developer has to make sure that speech recognition is available for a given language at the current time, by adopting the SFSpeechRecognizerDelegate protocol and checking the recognizer's availability. Since speech recognition requires user data to be sent to Apple's servers and stored there, it is important to respect the user's privacy and obtain explicit permission from the user.
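A minimal sketch of such an availability check is shown below. Apart from the Speech framework types, the class name and the en-US locale are only placeholders for illustration.

#import <Speech/Speech.h>

// Hypothetical helper class that adopts SFSpeechRecognizerDelegate
// to track whether recognition is currently available.
@interface SpeechController : NSObject <SFSpeechRecognizerDelegate>
@property (nonatomic, strong) SFSpeechRecognizer *recognizer;
@end

@implementation SpeechController

- (instancetype)init {
    if (self = [super init]) {
        // Create a recognizer for a specific language (en-US as an example).
        _recognizer = [[SFSpeechRecognizer alloc]
                       initWithLocale:[NSLocale localeWithLocaleIdentifier:@"en-US"]];
        _recognizer.delegate = self;
        NSLog(@"Speech recognition available now: %d", _recognizer.isAvailable);
    }
    return self;
}

// Called by the framework when availability changes, for example when
// the network connection is lost or restored.
- (void)speechRecognizer:(SFSpeechRecognizer *)speechRecognizer
    availabilityDidChange:(BOOL)available {
    NSLog(@"Speech recognition available: %d", available);
}

@end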
The app must request the user's permission to access the device microphone and to perform speech recognition. Provide a string for the NSSpeechRecognitionUsageDescription key in the app's Info.plist, which explains to the user why the app uses speech recognition. Also include a usage description string for the NSMicrophoneUsageDescription key to access the device microphone.
It should be noted that failure to provide these required keys will result in the system terminating the app. When the app uses speech recognition for the first time, the aforementioned string is shown to the user in an alert. If the user grants permission, the app is ready to process requests.
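The corresponding Info.plist entries might look like the following; the description strings are only examples and should explain your app's actual usage.

<key>NSSpeechRecognitionUsageDescription</key>
<string>Your speech is sent to Apple's servers so the app can transcribe what you say into text.</string>
<key>NSMicrophoneUsageDescription</key>
<string>The microphone is used to capture your speech for transcription.</string>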
The sample code for user authorization request is as follows:
[SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status) {
    // The handler may be called on a background queue; hop to the main
    // queue before touching any UI.
    dispatch_async(dispatch_get_main_queue(), ^{
        switch (status) {
            case SFSpeechRecognizerAuthorizationStatusAuthorized:
                NSLog(@"Speech recognition request accepted");
                break;
            case SFSpeechRecognizerAuthorizationStatusDenied:
                NSLog(@"Speech recognition request denied");
                break;
            case SFSpeechRecognizerAuthorizationStatusRestricted:
                NSLog(@"Speech recognition restricted on this device");
                break;
            case SFSpeechRecognizerAuthorizationStatusNotDetermined:
                NSLog(@"Speech recognition not yet authorized");
                break;
        }
    });
}];
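Once authorization is granted, a recognition request can be created and handed to the recognizer. The sketch below shows one way to transcribe a prerecorded audio file with SFSpeechURLRecognitionRequest; the file name and variable names are placeholders.

#import <Speech/Speech.h>

// Assumes `sample.m4a` is an audio file bundled with the app (placeholder name).
NSURL *fileURL = [[NSBundle mainBundle] URLForResource:@"sample" withExtension:@"m4a"];
SFSpeechRecognizer *recognizer = [[SFSpeechRecognizer alloc] init];
SFSpeechURLRecognitionRequest *request =
    [[SFSpeechURLRecognitionRequest alloc] initWithURL:fileURL];

[recognizer recognitionTaskWithRequest:request
                         resultHandler:^(SFSpeechRecognitionResult *result, NSError *error) {
    if (error) {
        // Handle connectivity problems, usage limits, or unavailable recognizers here.
        NSLog(@"Recognition failed: %@", error.localizedDescription);
        return;
    }
    if (result.isFinal) {
        NSLog(@"Transcription: %@", result.bestTranscription.formattedString);
    }
}];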
Best Practices
Here are some best practices for using Speech Recognition API in your app.
- The app should clearly indicate to the user that speech recognition or recording is in progress, using a visual or audio indicator in the user interface.
- Users' sensitive data, such as passwords and financial information, should not be used for speech recognition.
- Displaying speech as it is recognized helps the user understand what the app is doing (see the sketch after this list).
- Connectivity and unavailability issues should be handled gracefully.
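As a rough sketch of displaying speech as it is recognized, the example below streams microphone audio into an SFSpeechAudioBufferRecognitionRequest with shouldReportPartialResults enabled and updates a label with each partial result. The audioEngine, recognizer, recognitionTask, and transcriptLabel properties are assumed to exist on the surrounding class; this is not a complete implementation.

SFSpeechAudioBufferRecognitionRequest *request =
    [[SFSpeechAudioBufferRecognitionRequest alloc] init];
request.shouldReportPartialResults = YES;   // deliver intermediate transcriptions

// Feed microphone buffers into the request as they arrive.
AVAudioInputNode *inputNode = self.audioEngine.inputNode;
[inputNode installTapOnBus:0
                bufferSize:1024
                    format:[inputNode outputFormatForBus:0]
                     block:^(AVAudioPCMBuffer *buffer, AVAudioTime *when) {
    [request appendAudioPCMBuffer:buffer];
}];
[self.audioEngine prepare];
[self.audioEngine startAndReturnError:nil];

self.recognitionTask =
    [self.recognizer recognitionTaskWithRequest:request
                                  resultHandler:^(SFSpeechRecognitionResult *result, NSError *error) {
    if (result) {
        // Show the running transcription so the user can see what was understood.
        self.transcriptLabel.text = result.bestTranscription.formattedString;
    }
    if (error || result.isFinal) {
        [self.audioEngine stop];
        [inputNode removeTapOnBus:0];
    }
}];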
Limitations
- Individual iOS devices and apps are restricted to a limited number of recognition requests per day; in other words, usage is limited.
- Usage-limit failures must be handled by the app itself.
- If a particular app constantly hits the usage limits or anticipates heavier usage, its developers can contact Apple for support.
- The maximum duration of an audio input is one minute, due to the impact on network traffic and battery life.
Introduction of NLP API with iOS 11
iOS 11 introduces NLP APIs that enable developers to provide an improved user experience with voice commands and voice-enabled functions. The new NLP APIs can be integrated into any iOS app. With the power of machine learning, speech recognition becomes more accurate by extracting text from the user's previously browsed media.
The NLP APIs consist of a wide range of tools that can be used according to your needs. Once natural-language text is fed into the NLP pipeline, it can accomplish any of the following tasks, or tagging schemes (a short sketch follows the list):
- Identifying the language of the user's text
- Splitting the text into words, sentences, paragraphs, etc.
- Identifying the parts of speech of words (nouns, verbs, and adjectives)
- Identifying the base forms of words, known as lemmatization (e.g. drove and driven map to drive)
- Identifying the names of people, places, organizations, etc.
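A minimal sketch of these tasks using NSLinguisticTagger, the NLP API surfaced in iOS 11, is shown below. The sample sentence is arbitrary, and the chosen tag schemes are only a subset of what the framework offers.

#import <Foundation/Foundation.h>

NSString *text = @"Tim Cook introduced new iPhones at Apple Park in Cupertino.";

NSLinguisticTagger *tagger =
    [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeLemma,
                                                     NSLinguisticTagSchemeNameType]
                                           options:0];
tagger.string = text;

// Language identification.
NSLog(@"Dominant language: %@", tagger.dominantLanguage);

NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitPunctuation |
                                    NSLinguisticTaggerOmitWhitespace |
                                    NSLinguisticTaggerJoinNames;

// Lemmatization: print the base form of each word.
[tagger enumerateTagsInRange:NSMakeRange(0, text.length)
                        unit:NSLinguisticTaggerUnitWord
                      scheme:NSLinguisticTagSchemeLemma
                     options:options
                  usingBlock:^(NSLinguisticTag tag, NSRange tokenRange, BOOL *stop) {
    if (tag) {
        NSLog(@"%@ -> %@", [text substringWithRange:tokenRange], tag);
    }
}];

// Named entity recognition: people, places, and organizations.
[tagger enumerateTagsInRange:NSMakeRange(0, text.length)
                        unit:NSLinguisticTaggerUnitWord
                      scheme:NSLinguisticTagSchemeNameType
                     options:options
                  usingBlock:^(NSLinguisticTag tag, NSRange tokenRange, BOOL *stop) {
    if ([tag isEqualToString:NSLinguisticTagPersonalName] ||
        [tag isEqualToString:NSLinguisticTagPlaceName] ||
        [tag isEqualToString:NSLinguisticTagOrganizationName]) {
        NSLog(@"%@ is a %@", [text substringWithRange:tokenRange], tag);
    }
}];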
Download the API from here, and implement it in your apps to see how it works.