
The newly released Angel Site Builder provides the voice site developer with a set of powerful tools that take the crafting of effective voice applications to a whole new level. Up to this point, all voice sites shared a fixed set of platform properties that left little room for tuning applications. For instance, prior to Site Builder, all voice applications running on Angel, regardless of context, waited the same amount of time before declaring that a speaker had finished speaking (see completetimeout and incompletetimeout below). A single fixed value is not sufficient for every context: in some contexts we may want to wait longer before reaching that conclusion than in others (e.g. a person carefully reading off a series of digits).
In this edition of VUI View, we discuss six new voice site properties that will empower developers to tune the speech performance of their voice applications. The properties described below are available both at the properties page of a voice site and at the voice page level. Access to these properties is available upon request to professional plan subscribers.
Confidence Level
The speech recognition engine tags every utterance it processes with a "confidence level" - a floating point number between 0.00 and 1.00. The confidencelevel property sets the threshold against which this number is compared and has a default setting of 0.45. Utterances that are tagged with a confidence level at or above the value specified for confidencelevel are accepted by the speech engine as valid input, while those tagged with values below the value of confidencelevel are rejected as a no-match. For instance, if we set the confidencelevel threshold at 0.80, then whenever the speech engine tags an utterance with a confidence level less than 0.80, it will return a no-match. Raising the confidence level toward the maximum value will yield greater accuracy among the items accepted by the speech engine. However, the downside of setting a high confidence level is that in some instances correct matches will be wrongly rejected as a no-match.
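The acceptance rule described above can be sketched in a few lines of Python. This is only an illustration of the comparison, not Angel's implementation; the function name classify_utterance is hypothetical, and the only values taken from the article are the 0.00 to 1.00 range and the 0.45 default.

```python
# Illustrative sketch only: the function name is hypothetical.
DEFAULT_CONFIDENCE_LEVEL = 0.45  # default value stated in the article


def classify_utterance(confidence, confidencelevel=DEFAULT_CONFIDENCE_LEVEL):
    """Return "match" when the engine's score meets the threshold, else "no-match"."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.00 and 1.00")
    # Scores at or above the threshold are accepted; lower scores are rejected.
    return "match" if confidence >= confidencelevel else "no-match"
```

With the threshold raised to 0.80, an utterance scored at 0.79 is rejected even if the recognizer heard it correctly, which is the trade-off the paragraph above warns about.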
Sensitivity
This property enables you to take into account the prevailing noise conditions within which your voice application will be used. The value of sensitivity is a floating point number between 0.00 and 1.00 and defaults to 0.50. If your application is called from noisy environments (e.g. busy streets or a factory floor), then set the value of sensitivity below the default setting of 0.50. If you expect it to be used from a quiet setting or expect the callers to be speaking softly, then set it above the default setting.
Speed vs. Accuracy
This property enables you to tip the balance between the time the speech recognition engine takes to process speech input and the accuracy of the results returned. The value of speedvsaccuracy is a floating point number between 0.00 and 1.00 and defaults to 0.50. Increasing the value will improve the accuracy of the speech engine, but will also increase the time it takes to return a result. If your testing shows that the speech engine is yielding highly accurate results but is taking too long for your liking, try setting speedvsaccuracy below the default of 0.50. You may discover that the accuracy of the speech engine is still high enough while the time it takes to return a match is shorter.
Timeout
This property enables you to specify the length of time (in seconds) you want the application to wait for the speaker to begin talking after the end of the last prompt played by the system. The value of timeout defaults to 5 seconds. If the speaker takes longer than the value specified for timeout, then the system will return with a no-input. Adjust the value of this property to a higher value if you expect callers to take a long time to return with an answer, and to a lower value if you expect them to respond quickly.
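As a rough sketch of the rule just described, the following Python function decides which event follows a prompt. The function name and the "speech-started" label are hypothetical; only the 5 second default and the no-input outcome come from the text.

```python
# Illustrative sketch only: names other than the timeout property are hypothetical.
DEFAULT_TIMEOUT = 5.0  # seconds, default stated in the article


def event_after_prompt(seconds_until_speech, timeout=DEFAULT_TIMEOUT):
    """Return "no-input" if the caller did not start talking within the timeout.

    seconds_until_speech is the silence after the prompt ends, or None if the
    caller never spoke at all.
    """
    if seconds_until_speech is None or seconds_until_speech > timeout:
        return "no-input"
    return "speech-started"
```

A caller who needs 6 seconds to look up an account number gets a no-input with the default, but not if timeout is raised to 8 seconds.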
Complete Timeout
This is the amount of time the system waits for additional input after the speech-recognition engine has accepted the first piece of input. The default value is 0.25 seconds. You should increase the value of this property if the items you list as options for callers contain expressions that are substrings of other valid expressions. For instance, if the caller can say "1 2 3" and "1 2 3 4," you will want to make sure that the recognizer does not cut off processing too early just because it heard "1 2 3."
Incomplete Timeout
This is the amount of time the system waits for additional speech input from the caller when the caller has not yet said one of the items the system is listening for, or has said something that could still be extended into a longer string accepted by the recognizer. The default value is 0.75 seconds. If the silence from the caller exceeds the incompletetimeout value, a no-match event is thrown. Because incompletetimeout applies to both valid and invalid input, the value of incompletetimeout is not allowed to be less than completetimeout. To see why, suppose that both "1 2 3" and "1 2 3 4" are valid inputs, and that incompletetimeout were set to 1 second and completetimeout to 2 seconds: the input "1 2 3" would not be accepted if the caller paused for more than 1 second, because the no-match would be thrown before completetimeout had elapsed.
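The six properties and their stated defaults can be collected into one settings object that is checked before use. The sketch below is hypothetical: the SpeechProperties class and its problems method are not part of Site Builder. It reads the timeout constraint as "incompletetimeout must not be less than completetimeout", and the rule that timeouts cannot be negative is an added assumption.

```python
from dataclasses import dataclass


@dataclass
class SpeechProperties:
    """Hypothetical container; field names and defaults follow the article."""
    confidencelevel: float = 0.45
    sensitivity: float = 0.50
    speedvsaccuracy: float = 0.50
    timeout: float = 5.0             # seconds
    completetimeout: float = 0.25    # seconds
    incompletetimeout: float = 0.75  # seconds

    def problems(self):
        """Return a list of human-readable problems; empty when settings look sane."""
        issues = []
        for name in ("confidencelevel", "sensitivity", "speedvsaccuracy"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                issues.append(f"{name} must be between 0.00 and 1.00")
        # Assumption (not stated in the article): timeouts cannot be negative.
        for name in ("timeout", "completetimeout", "incompletetimeout"):
            if getattr(self, name) < 0:
                issues.append(f"{name} must not be negative")
        if self.incompletetimeout < self.completetimeout:
            issues.append("incompletetimeout must not be less than completetimeout")
        return issues
```

Calling SpeechProperties(incompletetimeout=1.0, completetimeout=2.0).problems() flags the configuration used in the "1 2 3" example, while the defaults pass without complaint.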
It is important to be extremely careful in your manipulation of the properties described above. First, make sure that your manipulations are driven by both a solid understanding of your application and empirical knowledge of the context within which your application will be used. For example, factor in the complexity of the tasks undertaken during the application, who will be calling it, the noise level of the caller's environment, etc. (see last month's newsletter for a discussion touching on these issues). Second, be aware that the properties affect one another. For instance, decreasing the value of speedvsaccuracy will yield less accurate results; if at the same time you increase the value of confidencelevel, the consequence may be a sharp increase in the rate of no-matches returned.
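One way to ground such tuning in empirical data is to measure how the no-match rate moves as confidencelevel is raised over a sample of logged scores. The sketch below is a minimal illustration: the function name is hypothetical and the scores are invented for the example, not measurements from any real application.

```python
# Illustrative sketch only: function name and sample scores are hypothetical.
def no_match_rate(scores, confidencelevel):
    """Fraction of utterances whose confidence falls below the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    rejected = sum(1 for score in scores if score < confidencelevel)
    return rejected / len(scores)


# Invented confidence scores standing in for a sample of logged utterances.
sample_scores = [0.30, 0.50, 0.62, 0.71, 0.83, 0.91, 0.40, 0.77, 0.85, 0.58]

for threshold in (0.45, 0.60, 0.80):
    print(threshold, no_match_rate(sample_scores, threshold))
```

On this invented sample the rejected share rises from 2 of 10 utterances at the 0.45 default to 4 of 10 at 0.60 and 7 of 10 at 0.80. If lowering speedvsaccuracy also pushes scores down, the increase would be steeper still.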
