Extract interaction analytics from a media file


Interaction analytics helps you understand a conversation between two or more people, such as a meeting, and extract meaningful insights from it at scale. This API is comprehensive: in addition to its unique capabilities, it bundles functionality found in our other APIs. In processing a media file, it provides multiple levels of insight.

Suppose we want to analyze a twenty-minute meeting between a sales representative and a customer. Here are some of the insights we can extract using this API (the sketch after this list shows how such metrics can be derived):

  • Speaker talking time, e.g. the sales representative spoke for ten minutes and the customer spoke for eight minutes.
  • Speaker pace, measured as the average number of words spoken per minute.
  • Speaker emotions, i.e. the tone or emotional context of each utterance.
  • An auto-generated meeting summary.
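
To make these measurements concrete, here is a minimal sketch (Python, not part of the API) that derives talking time and pace from the utteranceInsights array shown in the example response later on this page. In practice the API computes these for you via the TalkToListenRatio and Pace insights.

```python
from collections import defaultdict

def speaker_talk_stats(utterances):
    """Derive per-speaker talking time and pace from utteranceInsights
    (each item carries start, end, text, and speakerId)."""
    seconds = defaultdict(float)
    words = defaultdict(int)
    for u in utterances:
        seconds[u["speakerId"]] += u["end"] - u["start"]
        words[u["speakerId"]] += len(u["text"].split())
    return {
        spk: {"seconds": secs, "wpm": words[spk] / (secs / 60) if secs else 0.0}
        for spk, secs in seconds.items()
    }
```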

Extracting interaction analytics

For the best results, we recommend following the guidelines below.

  • The audioType parameter gives the system a hint about the nature of the meeting, which helps improve accuracy. We recommend setting this parameter to CallCenter when two to three speakers are expected and to Meeting when four or more speakers are expected.

  • Set the enableVoiceActivityDetection parameter to True if you want silence and noise segments removed from the diarization output. We suggest setting it to True in most circumstances.

  • Setting the source parameter helps optimize the diarization process by allowing the service to apply a specialized acoustic model built for the corresponding audio source.

  • If you specify the speakerIds parameter, make sure that every speaker id in the array exists; otherwise, the API call will fail. As a good practice, read the speaker ids from your account and pass only the ids of speakers you expect to hear in the audio file. A request body following these guidelines is sketched below.
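
For example, a request body for a two-person support call that follows these guidelines might look like the sketch below; the contentUri and speaker ids are placeholders you must replace with real values from your account.

```python
# Hypothetical request body following the guidelines above.
body_params = {
    "contentUri": "https://example.com/recordings/support-call.mp3",  # placeholder public URL
    "encoding": "Mpeg",
    "languageCode": "en-US",
    "audioType": "CallCenter",                 # two to three speakers expected
    "source": "RingCentral",                   # lets the service pick a matching acoustic model
    "enableVoiceActivityDetection": True,      # drop silence and noise from the diarization output
    "speakerIds": ["speaker-id-1", "speaker-id-2"],  # placeholders; every id must exist in your account
    "insights": ["All"],
}
```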

Request body parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| encoding | String | Encoding of the audio file, e.g. MP3 or WAV. |
| sampleRate | Number | Sample rate of the audio file. Optional. |
| languageCode | String | Language spoken in the audio file. Default: "en-US". |
| separateSpeakerPerChannel | Boolean | Set to True if the input audio is multi-channel and each channel has a separate speaker. Optional. Default: False. |
| speakerCount | Number | Number of speakers in the file. Optional. |
| audioType | String | Type of the audio based on the number of speakers. Optional. Permitted values: CallCenter, Meeting, EarningsCalls, Interview, PressConference, Voicemail. |
| speakerIds | List[String] | Set of speakers to be identified from the audio. Optional. |
| enableVoiceActivityDetection | Boolean | Apply voice activity detection. Optional. Default: False. |
| contentUri | String | Publicly accessible URL of the media file. |
| source | String | Source of the audio file, e.g. Phone, RingCentral, GoogleMeet, Zoom. Optional. |
| insights | List[String] | List of insights to be returned. Specify ['All'] to extract all insights. Permitted values: All, KeyPhrases, Emotion, AbstractiveSummaryLong, AbstractiveSummaryShort, ExtractiveSummary, TalkToListenRatio, Energy, Pace, QuestionsAsked, Topics. |
| speechContexts | List[Phrase Object] | Words and phrases used to boost the transcript. This can help improve accuracy for person names, company names, etc. |

Sample code to extract insights of a conversation

The following code sample shows how to extract insights from a conversation in a call recording.

Follow the instructions in the quick start section to set up and run your server code before running the sample code below.

Running the code

  • Edit the variables in ALL CAPS with your app and user credentials before running the code.
  • You can run this sample only against your production account, which means you must use app credentials for production.
  • Also make sure that you have made several recordings of your own voice.

JavaScript

```javascript
const fs = require('fs')
const RC = require('@ringcentral/sdk').SDK

// Instantiate the SDK and get the platform instance
var rcsdk = new RC({
    server: 'https://platform.ringcentral.com',
    clientId: 'RC_APP_CLIENT_ID',
    clientSecret: 'RC_APP_CLIENT_SECRET'
});
var platform = rcsdk.platform();

/* Authenticate a user using a personal JWT token */
platform.login({ jwt: 'RC_USER_JWT' })

var NGROK_ADDRESS = "NGROK-TUNNEL-ADDRESS"
var WEBHOOK_URL = NGROK_ADDRESS + "/webhook"
var CONTENT_URI = "PUBLICLY-ACCESSIBLE-CONTENT-URI"

platform.on(platform.events.loginSuccess, () => {
    analyze_interaction()
})

platform.on(platform.events.loginError, function(e){
    console.log("Unable to authenticate to platform. Check credentials.", e.message)
    process.exit(1)
});

/*
* Transcribe a call recording and analyze interaction
*/
async function analyze_interaction() {
    try {
      let bodyParams = {
          contentUri:                   CONTENT_URI,
          encoding:                     "Mpeg",
          languageCode:                 "en-US",
          source:                       "RingCentral",
          audioType:                    "Meeting",
          insights:                     [ "All" ],
          enableVoiceActivityDetection: true,
          separateSpeakerPerChannel:    true
      }
      let endpoint = `/ai/insights/v1/async/analyze-interaction?webhook=${encodeURIComponent(WEBHOOK_URL)}`
      let resp = await platform.post(endpoint, bodyParams);
      let jsonObj = await resp.json();
      if (resp.status == 202) {
        console.log("Job ID: " + jsonObj.jobId);
        console.log("Ready to receive response at: " + WEBHOOK_URL);
      }
    } catch (e) {
        console.log(`Unable to call this API. ${e.message}`);
    }
}
```

Python

```python
from ringcentral import SDK
import os,sys,urllib.parse,json

NGROK_ADDRESS = "NGROK-TUNNEL-ADDRESS"
WEBHOOK_URL = NGROK_ADDRESS + "/webhook"
CONTENT_URI = 'PUBLICLY-ACCESSIBLE-CONTENT-URI'

#
# Transcribe a call recording and analyze interaction
#
def analyze_interaction():
    try:
        bodyParams = {
          'contentUri': CONTENT_URI,
          'encoding': "Mpeg",
          'languageCode': "en-US",
          'source': "RingCentral",
          'audioType': "CallCenter",
          'insights': [ "All" ],
          'enableVoiceActivityDetection': True,
          'separateSpeakerPerChannel': True
        }
        endpoint = f'/ai/insights/v1/async/analyze-interaction?webhook={urllib.parse.quote(WEBHOOK_URL)}'
        resp = platform.post(endpoint, bodyParams)
        jsonObj = resp.json()
        if resp.response().status_code == 202:
            print(f'Job ID: {jsonObj.jobId}')
            print(f'Ready to receive response at: {WEBHOOK_URL}')
    except Exception as e:
      print ("Unable to analyze interaction. " + str(e))



# Authenticate a user using a personal JWT token
def login():
  try:
      platform.login( jwt= "RC_USER_JWT" )
      analyze_interaction()
  except Exception as e:
      print ("Unable to authenticate to platform. Check credentials. " + str(e))

# Instantiate the SDK and get the platform instance
rcsdk = SDK("RC_APP_CLIENT_ID", "RC_APP_CLIENT_SECRET", "https://platform.ringcentral.com")
platform = rcsdk.platform()

login()
```

PHP

```php
<?php
require('vendor/autoload.php');

// Instantiate the SDK and get the platform instance
$rcsdk = new RingCentral\SDK\SDK( 'RC_APP_CLIENT_ID', 'RC_APP_CLIENT_SECRET', 'https://platform.ringcentral.com' );
$platform = $rcsdk->platform();

/* Authenticate a user using a personal JWT token */
$platform->login(["jwt" => 'RC_USER_JWT']);
$NGROK_ADDRESS = "NGROK-TUNNEL-ADDRESS";
$WEBHOOK_URL = $NGROK_ADDRESS . "/webhook";
$CONTENT_URI = "PUBLICLY-ACCESSIBLE-CONTENT-URI";
analyze_interaction();

/*
* Transcribe a call recording and analyze interaction
*/
function analyze_interaction()
{
  global $platform, $WEBHOOK_URL, $CONTENT_URI;
  try {
    $bodyParams = array (
        'contentUri' =>  $CONTENT_URI,
        'encoding' => "Mpeg",
        'languageCode' =>  "en-US",
        'source' => "RingCentral",
        'audioType' =>  "CallCenter",
        'insights' => array ( "All" ),
        'enableVoiceActivityDetection' => True,
        'separateSpeakerPerChannel' =>  True
    );
    $endpoint = "/ai/insights/v1/async/analyze-interaction?webhook=" . urlencode($WEBHOOK_URL);
    $resp = $platform->post($endpoint, $bodyParams);
    $jsonObj = $resp->json();
    if ($resp->response()->getStatusCode() == 202) {
      print_r ("Job ID: " . $jsonObj->jobId . PHP_EOL);
      print_r("Ready to receive response at: " . $WEBHOOK_URL . PHP_EOL);
    }
  } catch (\RingCentral\SDK\Http\ApiException $e) {
    // Get the error message using the PHP native interface
    print_r ('HTTP Error: ' . $e->getMessage() . PHP_EOL);
    // Another way to get the message; note there may be no response if the request failed completely
    print_r ('Unable to analyze interaction. ' . $e->apiResponse->response()->error() . PHP_EOL);
  }
  }
}
?>
```

Ruby

```ruby
require 'ringcentral'

NGROK_ADDRESS = "NGROK-TUNNEL-ADDRESS"
WEBHOOK_URL = NGROK_ADDRESS + "/webhook"
CONTENT_URI = 'PUBLICLY-ACCESSIBLE-CONTENT-URI'

#
# Transcribe a call recording and analyze interaction
#
def analyze_interaction()
    bodyParams = {
        'contentUri': CONTENT_URI,
        'encoding': "Mpeg",
        'languageCode': "en-US",
        'source': "RingCentral",
        'audioType': "CallCenter",
        'insights': [ "All" ],
        'enableVoiceActivityDetection': true,
        'separateSpeakerPerChannel': true
    }
    queryParams = {
      'webhook': WEBHOOK_URL
    }
    endpoint = "/ai/insights/v1/async/analyze-interaction"
    begin
      resp = $platform.post(endpoint, payload: bodyParams, params: queryParams)
      body = resp.body
      if resp.status == 202
        puts('Job ID: ' + body['jobId'])
        puts('Ready to receive response at: ' + WEBHOOK_URL)
      end
    rescue StandardError => e
      puts ("Unable to analyze interaction. " + e.to_s)
    end
end

# Authenticate a user using a personal JWT token
def login()
  begin
    $platform.authorize( jwt: "RC_USER_JWT" )
    analyze_interaction()
  rescue StandardError => e
    puts ("Unable to authenticate to platform. Check credentials. " + e.to_s)
  end
end

# Instantiate the SDK and get the platform instance
$platform = RingCentral.new( "RC_APP_CLIENT_ID", "RC_APP_CLIENT_SECRET", "https://platform.ringcentral.com" )

login()
```

C#

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using System.Collections.Generic;
using RingCentral;
using Newtonsoft.Json;

namespace AnalyzeInteraction {
  class Program {
    static RestClient restClient;
    static string NGROK_ADDRESS = "NGROK-TUNNEL-ADDRESS";
    static string WEBHOOK_URL = NGROK_ADDRESS + "/webhook";
    static string CONTENT_URI = "PUBLICLY-ACCESSIBLE-CONTENT-URI";

    static async Task Main(string[] args){
      try
      {
        // Instantiate the SDK
        restClient = new RestClient("RC_APP_CLIENT_ID", "RC_APP_CLIENT_SECRET", "https://platform.ringcentral.com");

        // Authenticate a user using a personal JWT token
        await restClient.Authorize("RC_USER_JWT");

        await analyze_interaction();
      }
      catch (Exception ex)
      {
        Console.WriteLine("Unable to authenticate to platform. Check credentials. " + ex.Message);
      }
    }
    /*
    * Transcribe a call recording and analyze interaction
    */
    static private async Task analyze_interaction()
    {
      try
      {
        var bodyParams = new InteractionInput()
        {
            contentUri = CONTENT_URI,
            encoding = "Mpeg",
            languageCode = "en-US",
            source = "RingCentral",
            audioType = "CallCenter",
            insights = new String[] { "All" },
            enableVoiceActivityDetection = true,
            separateSpeakerPerChannel = true
        };
        var queryParams = new CaiAnalyzeInteractionParameters() { webhook = WEBHOOK_URL };

        var resp = await restClient.Ai().Insights().V1().Async().AnalyzeInteraction().Post(bodyParams, queryParams);
        Console.WriteLine("Job ID: " + resp.jobId);
        Console.WriteLine("Ready to receive response at: " + WEBHOOK_URL);
      }
      catch (Exception ex)
      {
        Console.WriteLine("Unable to analyze interaction. " + ex.Message);
      }
    }
  }
}
```

Java

```java
package AnalyzeInteraction;

import java.io.IOException;
import com.google.common.reflect.TypeToken;
import com.google.gson.Gson;

import com.ringcentral.*;
import com.ringcentral.definitions.*;

public class AnalyzeInteraction {
    static String NGROK_ADDRESS = "NGROK-TUNNEL-ADDRESS";
    static String WEBHOOK_URL = NGROK_ADDRESS + "/webhook";
    static String CONTENT_URI = "PUBLICLY-ACCESSIBLE-CONTENT-URI";

    static RestClient restClient;

    public static void main(String[] args) {
      var obj = new AnalyzeInteraction();
      try {
        // Instantiate the SDK
        restClient = new RestClient("RC_APP_CLIENT_ID", "RC_APP_CLIENT_SECRET", "https://platform.ringcentral.com");

        // Authenticate a user using a personal JWT token
        restClient.authorize("RC_USER_JWT");

        obj.analyze_interaction();

      } catch (RestException e) {
        System.out.println(e.getMessage());
      } catch (IOException e) {
        e.printStackTrace();
      }
    }
    /*
    * Transcribe a call recording and analyze interaction
    */
    private void analyze_interaction()
    {
      try {
        var bodyParams = new InteractionInput()
            .contentUri(CONTENT_URI)
            .encoding("Mpeg")
            .languageCode("en-US")
            .source("RingCentral")
            .audioType("CallCenter")
            .insights(new String[] {"All"})
            .enableVoiceActivityDetection(true)
            .separateSpeakerPerChannel(true);

        var queryParams = new CaiAnalyzeInteractionParameters().webhook(WEBHOOK_URL);
        var resp = restClient.ai().insights().v1().async().analyzeInteraction().post(bodyParams, queryParams);
        System.out.println("Job ID: " + resp.jobId);
        System.out.println("Ready to receive response at: " + WEBHOOK_URL);
      } catch (Exception ex) {
        System.out.println("Unable to analyze interaction. " + ex.getMessage());
      }
    }
}
```
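
Once the job completes, the service delivers the result to the webhook URL passed in the request. The following is a minimal sketch of a receiver, assuming a Python Flask server sitting behind the ngrok tunnel configured above; see the quick start for any validation handshake your server may need to perform.

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def webhook():
    # Assumption: the completed job (jobId, status, response, ...) arrives as JSON.
    job = request.get_json(force=True)
    print("Job", job.get("jobId"), "finished with status", job.get("status"))
    return "", 200

if __name__ == "__main__":
    app.run(port=5000)  # expose publicly with e.g. `ngrok http 5000`
```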

Example response

```json
{
    "jobId": "80800e1a-a663-11ee-b548-0050568ccd07",
    "api": "/ai/insights/v1/async/analyze-interaction",
    "creationTime": "2023-12-29T16:01:18.558Z",
    "completionTime": "2023-12-29T16:01:29.217Z",
    "expirationTime": "2024-01-05T16:01:18.558Z",
    "status": "Success",
    "response": {
        "utteranceInsights": [
            {
                "start": 3.72,
                "end": 7.56,
                "text": "Good evening, thank you for calling electronics or this is Rachel.",
                "confidence": 0.85,
                "speakerId": "0",
                "insights": [
                    {
                        "name": "Emotion",
                        "value": "Neutral",
                        "confidence": 0.54
                    }
                ]
            },
            {
                "start": 7.56,
                "end": 8.96,
                "text": "How may I assist you?",
                "confidence": 0.85,
                "speakerId": "0",
                "insights": [
                    {
                        "name": "Emotion",
                        "value": "Fear",
                        "confidence": 0.71
                    }
                ]
            },
            {
                "start": 8.96,
                "end": 9.8,
                "text": "Hi, Rachel.",
                "confidence": 0.85,
                "speakerId": "1",
                "insights": [
                    {
                        "name": "Emotion",
                        "value": "Neutral",
                        "confidence": 0.79
                    }
                ]
            },
            {
                "start": 9.8,
                "end": 11.16,
                "text": "I would like to know how to use this car.",
                "confidence": 0.85,
                "speakerId": "1",
                "insights": [
                    {
                        "name": "Emotion",
                        "value": "Neutral",
                        "confidence": 0.4
                    }
                ]
            },
            {
                "start": 11.16,
                "end": 14.28,
                "text": "Bluetooth headset I recently purchased from your store.",
                "confidence": 0.85,
                "speakerId": "1",
                "insights": [
                    {
                        "name": "Emotion",
                        "value": "Neutral",
                        "confidence": 0.46
                    }
                ]
            },
            {
                "start": 14.28,
                "end": 21.36,
                "text": "Sure, ma'am, I can help you out with that, but before anything else, I have your name so that I can address you properly.",
                "confidence": 0.87,
                "speakerId": "0",
                "insights": [
                    {
                        "name": "Emotion",
                        "value": "Neutral",
                        "confidence": 0.91
                    }
                ]
            },
            {
                "start": 21.36,
                "end": 23.58,
                "text": "Yes, this is Meredith Blake.",
                "confidence": 0.87,
                "speakerId": "1",
                "insights": [
                    {
                        "name": "Emotion",
                        "value": "Neutral",
                        "confidence": 0.91
                    }
                ]
            },
            ...
        ],
        "speakerInsights": {
            "speakerCount": 2,
            "insights": [
                {
                    "name": "Energy",
                    "values": [
                        {
                            "speakerId": "0",
                            "value": 93.11
                        },
                        {
                            "speakerId": "1",
                            "value": 93.65
                        }
                    ]
                },
                {
                    "name": "Pace",
                    "values": [
                        {
                            "speakerId": "0",
                            "value": "medium",
                            "wpm": 152.9
                        },
                        {
                            "speakerId": "1",
                            "value": "fast",
                            "wpm": 196.9
                        }
                    ]
                },
                {
                    "name": "TalkToListenRatio",
                    "values": [
                        {
                            "speakerId": "0",
                            "value": "58:42"
                        },
                        {
                            "speakerId": "1",
                            "value": "42:58"
                        }
                    ]
                },
                {
                    "name": "QuestionsAsked",
                    "values": [
                        {
                            "speakerId": "0",
                            "value": 5,
                            "questions": [
                                {
                                    "text": "Good evening, thank you for calling electronics or this is Rachel. How may I assist you?",
                                    "start": 3.72,
                                    "end": 8.96
                                },
                                {
                                    "text": "Okay, thank you for that, Mrs. Plague, what exactly do you want done with your headset?",
                                    "start": 23.9,
                                    "end": 29.72
                                },
                                ...
                            ]
                        },
                        {
                            "speakerId": "1",
                            "value": 3,
                            "questions": [
                                {
                                    "text": "Well, we have already done that. I only ask a simple question. Why can't you seem to get that?",
                                    "start": 102.22,
                                    "end": 107.7
                                },
                                ...
                            ]
                        }
                    ]
                }
            ]
        },
        "conversationalInsights": [
            {
                "name": "KeyPhrases",
                "values": [
                    {
                        "start": 11.55,
                        "end": 11.94,
                        "value": "headset",
                        "confidence": 0.92
                    },
                    {
                        "start": 13.89,
                        "end": 14.28,
                        "value": "store",
                        "confidence": 0.94
                    },
                    {
                        "start": 29.36,
                        "end": 29.72,
                        "value": "headset",
                        "confidence": 0.86
                    },
                    {
                        "start": 34.32,
                        "end": 34.72,
                        "value": "headset",
                        "confidence": 0.91
                    },
                    {
                        "start": 38.68,
                        "end": 39.08,
                        "value": "phone",
                        "confidence": 0.86
                    },
                    {
                        "start": 43.77,
                        "end": 44.24,
                        "value": "iphone",
                        "confidence": 0.89
                    },
                    ...
                ]
            },
            {
                "name": "ExtractiveSummary",
                "values": []
            },
            {
                "name": "Topics",
                "values": [
                    {
                        "value": "car bluetooth headset",
                        "start": 9.8,
                        "end": 114.2,
                        "confidence": 0.92
                    }
                ]
            },
            {
                "name": "AbstractiveSummaryLong",
                "values": [
                    {
                        "value": "First speaker helps second speaker use a car bluetooth headset from the store and asks speaker 1 to switch off the device with speaker's phone.",
                        "start": 3.72,
                        "end": 114.2,
                        "confidence": 0.4,
                        "groupId": "0"
                    }
                ]
            },
            {
                "name": "AbstractiveSummaryShort",
                "values": [
                    {
                        "value": "First speaker helps second speaker use a car bluetooth headset from the store and asks speaker 1 to switch off the device with speaker's phone.",
                        "start": 3.72,
                        "end": 114.2,
                        "confidence": 0.4
                    }
                ]
            }
        ]
    }
}
```

Interaction-Analytics-Object

Interaction analytics are presented as insights grouped under the following category objects:

| Parameter | Type | Description |
|-----------|------|-------------|
| utteranceInsights | List[Utterance-Insights] | List of utterances and the insights computed for each utterance. |
| speakerInsights | Object | The set of insights computed for each speaker separately. |
| conversationalInsights | List[Conversational-Insights-Object] | List of insights computed by analyzing the conversation as a whole. |
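
As a quick orientation, the sketch below (continuing the hypothetical webhook receiver above) walks all three groups; field names follow the example response shown earlier.

```python
result = job["response"]  # `job` is the webhook payload from the receiver sketch above

for u in result["utteranceInsights"]:
    print(f'[{u["start"]:.2f}-{u["end"]:.2f}] speaker {u["speakerId"]}: {u["text"]}')

print("Speakers detected:", result["speakerInsights"]["speakerCount"])
for insight in result["speakerInsights"]["insights"]:
    print(insight["name"], "->", insight["values"])

for insight in result["conversationalInsights"]:
    print(insight["name"], "->", len(insight["values"]), "value(s)")
```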

Utterance-Insights

utteranceInsights is a list of objects, each containing the following key/value pairs:

| Parameter | Type | Description |
|-----------|------|-------------|
| start | Number | Start time of the audio segment in seconds. |
| end | Number | End time of the audio segment in seconds. |
| text | String | The transcription output corresponding to the segment (a.k.a. an utterance). |
| confidence | Number | The confidence score for the transcribed segment. |
| speakerId | String | The speaker id for the corresponding audio segment. |
| insights | List[Utterance-Insights-Unit] | List of insights from the utterance text. |
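
Continuing the sketch above, these utterance fields combine naturally with the per-utterance Emotion insight (see Utterance-Insights-Unit below), e.g. to surface every non-neutral moment in the call:

```python
# Collect every utterance whose Emotion insight is not Neutral.
emotional = [
    (u["speakerId"], i["value"], u["text"])
    for u in result["utteranceInsights"]
    for i in u["insights"]
    if i["name"] == "Emotion" and i["value"] != "Neutral"
]
for speaker, emotion, text in emotional:
    print(f"speaker {speaker} ({emotion}): {text}")
```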

Utterance-Insights-Unit

Currently, only the Emotion insight is supported.

| Parameter | Type | Description |
|-----------|------|-------------|
| name | String Enum | Currently supported insight: [ Emotion ]. |
| value | String | Possible values: Anger, Anticipation, Disgust, Fear, Joy, Sadness, Surprise, Trust, Neutral. |
| confidence | Number | Confidence score. Optional. |

Speaker-Insights-Object

The speakerInsights object contains the number of speakers detected, along with per-speaker insights:

| Parameter | Type | Description |
|-----------|------|-------------|
| speakerCount | Number | Number of speakers detected. If speakerCount is not specified in the request, the number of speakers is estimated algorithmically. |
| insights | List[Speaker-Insights-Unit] | List of overall insights. Each insight is computed separately for each speaker. |

Speaker-Insights-Unit

| Parameter | Type | Description |
|-----------|------|-------------|
| name | String Enum | Name of the insight. Possible values: Energy, Pace, TalkToListenRatio, QuestionsAsked. |
| values | List[Speaker-Insights-Value-Unit] | Values corresponding to the insight. |

Speaker-Insights-Value-Unit
  • Energy

    | Parameter | Type | Description |
    |-----------|------|-------------|
    | speakerId | String | The speaker id for whom insights are computed. |
    | value | Number | The computed value of the insight for this speaker. |

  • Pace

    | Parameter | Type | Description |
    |-----------|------|-------------|
    | speakerId | String | The speaker id for whom insights are computed. |
    | value | String | The label of speech speed: slow, medium, or fast. |
    | wpm | Number | The average number of words per minute spoken by this speaker. |

  • TalkToListenRatio

    | Parameter | Type | Description |
    |-----------|------|-------------|
    | speakerId | String | The speaker id for whom insights are computed. |
    | value | String | The computed ratio of time this speaker spent talking versus listening, e.g. "58:42". |

  • QuestionsAsked

    | Parameter | Type | Description |
    |-----------|------|-------------|
    | speakerId | String | The speaker id for whom insights are computed. |
    | value | Number | The number of questions this speaker asked. |
    | questions | List[Question-Insights-Value-Unit] | List of questions asked by this speaker. |
Question-Insights-Value-Unit

| Parameter | Type | Description |
|-----------|------|-------------|
| text | String | The question a speaker asked. |
| start | Number | The start time of the audio segment in seconds. |
| end | Number | The end time of the audio segment in seconds. |
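
For instance, a sketch that prints every question each speaker asked, reading the QuestionsAsked insight out of speakerInsights (shapes as in the example response earlier):

```python
for insight in result["speakerInsights"]["insights"]:
    if insight["name"] == "QuestionsAsked":
        for v in insight["values"]:
            print(f'Speaker {v["speakerId"]} asked {v["value"]} question(s):')
            for q in v.get("questions", []):
                print(f'  [{q["start"]:.2f}s] {q["text"]}')
```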

Timed-Segment

| Parameter | Type | Description |
|-----------|------|-------------|
| start | Number | Start time of the audio segment in seconds. |
| end | Number | End time of the audio segment in seconds. |

Conversational-Insights-Object

| Parameter | Type | Description |
|-----------|------|-------------|
| name | String Enum | Name of the insight. Possible values: AbstractiveSummaryLong, AbstractiveSummaryShort, ExtractiveSummary, KeyPhrases, Topics. |
| values | List[Conversational-Insights-Value-Unit] | Values corresponding to the insight. |

Conversational-Insights-Value-Unit

  • KeyPhrases

    | Parameter | Type | Description |
    |-----------|------|-------------|
    | start | Number | Start time of the audio segment in seconds. |
    | end | Number | End time of the audio segment in seconds. |
    | value | String | The output corresponding to the insight. |
    | confidence | Number | The confidence score for the computed insight. |

  • Topics

    | Parameter | Type | Description |
    |-----------|------|-------------|
    | start | Number | Start time of the audio segment in seconds. |
    | end | Number | End time of the audio segment in seconds. |
    | value | String | The output corresponding to the insight. |
    | confidence | Number | The confidence score for the computed insight. |

  • ExtractiveSummary

    | Parameter | Type | Description |
    |-----------|------|-------------|
    | start | Number | Start time of the audio segment in seconds. |
    | end | Number | End time of the audio segment in seconds. |
    | sentence | String | The summarized text segment. |

  • AbstractiveSummaryLong

    | Parameter | Type | Description |
    |-----------|------|-------------|
    | value | String | The text of a long abstractive summary. |
    | start | Number | Start time of the audio segment in seconds. |
    | end | Number | End time of the audio segment in seconds. |
    | confidence | Number | The confidence score for the computed insight. |
    | groupId | String | The index of this long abstractive summary. |

  • AbstractiveSummaryShort

    | Parameter | Type | Description |
    |-----------|------|-------------|
    | value | String | The text of a short abstractive summary. |
    | start | Number | Start time of the audio segment in seconds. |
    | end | Number | End time of the audio segment in seconds. |
    | confidence | Number | The confidence score for the computed insight. |

NOTES:

  • For ExtractiveSummary, the start and end times refer to the exact time of the extracted segment.
  • For AbstractiveSummaryLong and AbstractiveSummaryShort, the start and end times refer to the span of the transcript that was summarized.
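
Putting the conversational insights to use, a final sketch that pulls the short abstractive summary out of a completed job (field names as in the example response above):

```python
summary = next(
    (i for i in result["conversationalInsights"]
     if i["name"] == "AbstractiveSummaryShort"),
    None,
)
if summary and summary["values"]:
    print("Summary:", summary["values"][0]["value"])
```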