Mitigating and Managing Risk

Susan Terry, Global Lead for Speech Analytics, Avaya

Financial services companies around the world are getting into the recording business in a big way. No, they’re not going into the studio with the latest pop music sensations. Instead, they’re recording just about everything being said in and around their offices. Client trades made over the phone. Voicemails left for financial advisors. Conference calls. Audio from video conferences. Employee calls from mobile devices, whether the company’s or their own.

A variety of federal, state, and international regulations are driving institutions to blanket their operations with voice recordings. New laws specify how long recordings need to be retained by financial firms and how quickly records must be delivered to government agencies or other requestors. Furthermore, compliance can involve granular searches. For example, complying with e-discovery requests may include isolating a specific recording of a certain employee or client conversation.

Growing Mandates and Scrutiny

In April 2012, the U.S. Commodity Futures Trading Commission (CFTC) finalized regulations recommended in the Dodd-Frank Wall Street Reform and Consumer Protection Act regarding reporting, record keeping, and daily trading records for brokers and traders in a recently popular category of financial investments called swaps. Effective July 2012, all swap dealers, buyers, and sellers must keep “all oral and written communications … that lead to the execution of a swap, whether communicated by telephone, voicemail, facsimile, instant messaging, chat rooms, electronic mail, mobile device, or other digital or electronic media” (Commodity Futures Trading Commission).

In March 2012, the U.S. District Court in New York granted a Federal Trade Commission (FTC) motion for summary judgment against businessman Paul Navestad for violating the FTC Act. Navestad was found to have made material, false, and deceptive claims to deceive consumers and to have violated the Telemarketing Sales Rule (TSR). In addition to calling consumers on the national Do Not Call Registry, Navestad’s violations included:

  • Not letting consumers opt out of receiving calls.

  • Not letting consumers speak to a live operator.

  • Making false and deceptive statements to induce consumers to pay for services that would allegedly enable them to easily and quickly receive public or private grants (Federal Trade Commission v. Paul Navestad, 2012).

Although the TSR does not apply directly to financial institutions, individuals, or companies, it reaches them indirectly when they contract with a telemarketing firm that must comply with it. Two other laws, however, do apply directly to financial services companies in the U.S.: the Telephone Consumer Protection Act (TCPA) and the Gramm-Leach-Bliley Act. Both acts were recently amended, and failing to comply with them can be costly: financial firms are liable for up to $100,000 per violation, and their officers and directors are personally liable for up to $10,000 per violation.

These are just a few of the regulatory requirements impacting U.S. financial institutions. Multinational firms have many more rules to heed. For instance, the U.K.’s Financial Services Authority (FSA) required all participants in the country’s capital markets to begin recording the mobile communications (including voice calls, SMS/text messages, and instant messages) of all their employees involved in trading starting in November 2011 (Financial Services Authority, 2011).

The Phenomenal Power of Phoneme Analysis

To keep up with these new rules, financial institutions are increasingly turning to phonetic search to find and analyze spoken conversations and content. Phonetic search is built on phonemes, the basic units of sound that serve as the building blocks of human speech. It can capture calls in real time, regardless of the source. Institutions can then use advanced analytical tools to mine the phonetic records from those calls to identify specific topics, people, and calls.

Until recently, the science of phonetics was confined to university research laboratories. However, the breadth of potential applications in the commercial world is accelerating its development and use. For any user wishing to access information from an audio stream in real time, the phonetic search approach is the only practical option. Its modest processing requirements mean that phonetic search scales easily to whatever level is needed to cover an entire organization.

The benefits of this approach are wide-ranging. Perhaps the most valuable is that decisions can be made quickly with accurate, up-to-date information. Real-time phonetic search enables insights discovered in speech to be populated into business intelligence (BI) platforms, allowing financial institutions to consume aggregated data, measure the scale of a problem, and compare its criticality to other issues, all within a very short time from occurrence to discovery. This low latency lets companies deploy proactive notification systems so that the technology works without a human observer, saving time and resources.

Another benefit of phoneme-based searches is that they rely on a small, fixed set of phonemes rather than a large predefined vocabulary of words. For example, there are just 40 phonemes in U.S. English and 44 in U.K. English. Bottom line: Phoneme-based searching is faster and more efficient than speech-to-text conversion and search.

The first step in a phonetic search is to build a language- and dialect-dependent index of the audio content represented as phoneme strings. Future searches then leverage this index to yield hits or results. Words and phrases a user searches on are converted into phoneme strings, and searches or matches are then obtained by walking through the index. Each hit enriches the context for future searches.

The results are presented in such a way that a user can see which portion or region of the selected audio content contained phrases or utterances deemed to be similar.
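
The indexing and matching steps described above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not how any commercial product works: the pronunciation table, the 100 ms spacing between phonemes, the position-by-position scoring, and every function name are hypothetical. A real system would derive phonemes from audio with an acoustic model and would use far more tolerant matching.

```python
# Hypothetical pronunciations in ARPAbet-style symbols; a real system would
# derive these with a letter-to-sound model rather than a fixed table.
TOY_PRONUNCIATIONS = {
    "rate": ["R", "EY", "T"],
    "swap": ["S", "W", "AA", "P"],
    "trade": ["T", "R", "EY", "D"],
}


def to_phonemes(phrase):
    """Convert a search phrase into a flat list of phoneme symbols."""
    phonemes = []
    for word in phrase.lower().split():
        phonemes.extend(TOY_PRONUNCIATIONS[word])
    return phonemes


def build_index(phonemes, step_ms=100):
    """Pair each phoneme with an assumed start time in milliseconds."""
    return [(symbol, i * step_ms) for i, symbol in enumerate(phonemes)]


def search(index, phrase, threshold=0.75):
    """Slide the query over the index; return (start_ms, end_ms, score) hits."""
    query = to_phonemes(phrase)
    hits = []
    for i in range(len(index) - len(query) + 1):
        window = index[i:i + len(query)]
        # Score is the fraction of positions where the phonemes agree.
        matches = sum(1 for (symbol, _), q in zip(window, query) if symbol == q)
        score = matches / len(query)
        if score >= threshold:
            hits.append((window[0][1], window[-1][1], score))
    return hits


# Stand-in for the phoneme stream an acoustic front end would produce.
audio_index = build_index(to_phonemes("rate swap trade"))
swap_hits = search(audio_index, "swap")
```

In this toy run, `swap_hits` holds a single region running from 300 ms to 600 ms with a score of 1.0, which is the kind of "where in the audio" result the article describes.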

Taking Phonetic Search to the Next Level

Two technology advances are expanding the capabilities of phonetic search. One is the development of high-performance desktop clients for indexing and searching in real time. Searches can be issued through such a client, and as part of the process the user can set a relevance threshold below which results are ignored. There is no right or wrong threshold level. Varying it trades false positives against false negatives: a higher threshold returns fewer spurious hits but misses more genuine ones, and a lower threshold does the reverse.
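
The threshold trade-off can be made concrete with a small, hedged example. The scores and the true/false labels below are invented purely for illustration, and `count_errors` is a hypothetical helper, not part of any product.

```python
# Hypothetical candidate hits: (relevance score, whether it is a genuine match).
CANDIDATES = [
    (0.95, True),
    (0.85, True),
    (0.80, False),
    (0.70, True),
    (0.60, False),
    (0.40, False),
]


def count_errors(candidates, threshold):
    """Return (false_positives, false_negatives) at the given threshold."""
    # A false positive is a returned hit that is not a genuine match.
    false_positives = sum(
        1 for score, genuine in candidates if score >= threshold and not genuine
    )
    # A false negative is a genuine match that falls below the threshold.
    false_negatives = sum(
        1 for score, genuine in candidates if score < threshold and genuine
    )
    return false_positives, false_negatives


errors_by_threshold = {t: count_errors(CANDIDATES, t) for t in (0.5, 0.75, 0.9)}
```

With this made-up data, a threshold of 0.5 yields two false positives and no false negatives, 0.75 yields one of each, and 0.9 yields no false positives but two false negatives.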

The second noteworthy development in phonetic search is the emergence of cloud-based BI solutions. Scalable, secure cloud BI can provide advanced analytics and reporting with low organizational risk, impact, and cost. For example, phone calls can be tagged by criteria, such as the work shifts during which they occurred or the top reasons clients are calling the institution. Cloud solutions also offer automated upload of search and discovery results to an analytics and reporting engine. They can also include out-of-the-box coverage for industry-standard key performance indicators such as first call resolution and average hold times.
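
As a rough sketch of the tagging and reporting described above, the following Python groups calls by work shift and computes the two indicators mentioned. The call records, the shift boundaries, and the field names are all assumptions made for illustration; an actual BI platform would define its own schema.

```python
from collections import Counter
from datetime import datetime


def shift_for(start):
    """Map a call start time to a shift (boundaries are assumed)."""
    if 6 <= start.hour < 14:
        return "day"
    if 14 <= start.hour < 22:
        return "evening"
    return "night"


# Hypothetical call records, as search and discovery results might be uploaded.
CALLS = [
    {"start": datetime(2012, 5, 1, 9, 30), "reason": "trade confirmation",
     "hold_seconds": 30, "resolved_first_call": True},
    {"start": datetime(2012, 5, 1, 15, 0), "reason": "account balance",
     "hold_seconds": 60, "resolved_first_call": False},
    {"start": datetime(2012, 5, 1, 10, 15), "reason": "trade confirmation",
     "hold_seconds": 90, "resolved_first_call": True},
    {"start": datetime(2012, 5, 1, 23, 5), "reason": "fee dispute",
     "hold_seconds": 20, "resolved_first_call": True},
]


def summarize(calls):
    """Aggregate calls by shift and reason and compute two common KPIs."""
    return {
        "calls_by_shift": dict(Counter(shift_for(c["start"]) for c in calls)),
        "top_reason": Counter(c["reason"] for c in calls).most_common(1)[0],
        "first_call_resolution": sum(c["resolved_first_call"] for c in calls) / len(calls),
        "average_hold_seconds": sum(c["hold_seconds"] for c in calls) / len(calls),
    }


summary = summarize(CALLS)
```

For these four sample calls, the summary reports two day-shift calls, "trade confirmation" as the top reason, a first call resolution rate of 0.75, and an average hold time of 50 seconds.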

These and other advantages of phonetic search not only can offer faster (and potentially more accurate) analysis, but also can significantly lower expenses across an organization. What once took an entire speech-to-text server can now be accomplished on a single CPU core with phonetic search. For example, speech-to-text analysis of 200 hours of data in 24 hours may require the purchase of a $2,000 server. Based on the experience of Avaya customers, a phonetic search system can analyze 500 hours of data in only two hours, requiring only a laptop computer costing less than $1,000 (not including software, maintenance, and other costs).
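
The figures in this comparison can be reduced to throughput and hardware cost per unit of throughput. The short calculation below uses only the numbers quoted above. It treats the laptop price as exactly $1,000, which is an upper bound, and the function name is illustrative.

```python
def throughput(audio_hours, processing_hours):
    """Hours of audio analyzed per hour of processing."""
    return audio_hours / processing_hours


# Figures quoted in the article.
stt_rate = throughput(200, 24)      # speech-to-text: about 8.3 audio hours per hour
phonetic_rate = throughput(500, 2)  # phonetic search: 250 audio hours per hour

# How many times faster phonetic search is in this example (about 30x).
speedup = phonetic_rate / stt_rate

# Hardware dollars per unit of throughput (software and maintenance excluded).
stt_cost_per_rate = 2000 / stt_rate            # about $240
phonetic_cost_per_rate = 1000 / phonetic_rate  # $4 at most, given the price cap
```

On these numbers, the phonetic system processes audio roughly 30 times faster while needing about one-sixtieth of the hardware spend per unit of throughput.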

Also, total cost of ownership may be lower due to the reduced maintenance effort required to operate a phonetic search system. A phonetic search system is not dependent on a dictionary to perform recognition, which means that it natively supports product names, jargon, and other nondictionary phrases. This translates into fewer ongoing costs relating to system operation and support of a changing business environment.

A Powerful Compliance Capability

Financial services companies are under great pressure to record spoken and written transactions of every kind and to be able to retrieve those recordings on demand. Phonetic search, coupled with powerful desktop clients and cloud-based deployment, can help financial firms respond rapidly to regulators and other entities. This can help to ensure that they remain in compliance with the complicated web of laws around the world under which they must operate.

Susan Terry is currently the global lead for Avaya Speech Analytics, leading a team of subject matter experts dedicated to the development of the speech analytics business in Avaya. To this role she brings extensive experience in helping organizations develop solutions and business strategies for the use of innovative technologies.