Challenge & Goal Summary
The aim of this study is to compare automatically detected emotions with manually logged observations.
We want to extract the periods of explicit emotions from the measured data in order to watch those fragments in the video. Additionally, we want to display the stream of measured emotions when the video plays.
Most important, though, is the ability to find contingencies and co-occurrences between visual observations, topics discussed, and the emotional reactions of the two participants.
The biggest challenges we face here are:
- Extracting usable fragments for specific automatically measured emotions. This means we need to turn ranges of frames into behavioral Events whenever a certain value is exceeded.
- Integrating full transcriptions of the natural spoken word during observation.
The goals are:
- Compare automatically measured positive emotions with manually logged sequences, during which the observer interpreted the expression as positive.
- Find sequential relations between the current speaker and the measured emotions.
- Analyze use of words per participant.
Here is how you can do it:
The source we use is a recorded television debate between the politicians Mrs. Merkel (CDU) and Mr. Steinbrück (SPD).
To collect the automatically recognized emotions, the video was processed through the Microsoft Emotion API service*, which analyzed the video frame by frame, resulting in a long list of measured values.
Additionally, Mangold INTERACT is used to score all observations, from camera view and speaker to gestures and observed expressions.
Prepare Data Logging
For the data collection, we prepare some code definitions that are used in different passes through the video:
- A plain, mutually exclusive code definition to identify who is currently talking.
- A lexical coding system to specify the camera angle, focused area and the people in the picture. This requires three linked levels with codes: Camera View – Focus – In Picture.
- A two-code coding system to log all sequences where our participants use some kind of gesture.
During the first pass, we start with the easy-to-observe video picture, focusing on the camera view and specifying the focus and the people that are currently visible. This pass requires the Coding Mode ‘Lexical’, and all changes can be logged with the SPACEBAR. At the end of each cut, the video pauses and you can specify what you have just seen by replaying (F12) the latest event, which gives you enough time to specify the focus and the visible people.
During the second pass, we use the Coding Mode ‘Standard’, which works best with our mutually exclusive ‘Speaker’ codes. Because each ‘Speaker’ is defined as a comment code, a special comment dialog appears at the end of each event, allowing us to do the transcriptions while we observe. Again, we can replay the latest event using F12, so we can enter everything that has been said. During monologues, it is best to manually pause the video at the end of each sentence.
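Once each ‘Speaker’ event carries its transcription, the word use per participant can be counted. The following is only a minimal sketch of that idea, not an INTERACT feature or export format: it assumes the events are already available as (speaker, transcription) pairs, and both the function name words_per_speaker and the sample sentences are made up for illustration.

```python
import re
from collections import Counter, defaultdict


def words_per_speaker(events):
    """Count word frequencies per speaker.

    `events` is a list of (speaker, transcription) pairs. This structure is
    an assumption for the sketch, not the INTERACT data format.
    """
    counts = defaultdict(Counter)
    for speaker, text in events:
        # Lowercase the text and keep runs of letters only (Unicode-aware),
        # so that "Growth." and "growth" are counted as the same word.
        words = re.findall(r"[^\W\d_]+", text.lower())
        counts[speaker].update(words)
    return counts


# Invented sample transcriptions, not quotes from the debate.
events = [
    ("Merkel", "We want stable growth."),
    ("Steinbrück", "Growth needs fair wages."),
    ("Merkel", "Growth and jobs."),
]
counts = words_per_speaker(events)
print(counts["Merkel"].most_common(3))
```

Keeping one Counter per speaker makes it easy to compare the most frequent words of both participants side by side.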
Import Emotion Data
First, we need to import the measured emotion values into Mangold DataView. In this case synchronization is easy, because the emotion values start at the very beginning of the video.
After successfully importing the emotions data and specifying a layout with a chart per person (to bundle the measured emotions for both participants), our setup looks like this:
Mangold DataView and Mangold INTERACT hook up automatically, based on the time information of the video and that of the measured values. Playing the video makes the emotion data run in sync. Replaying any of the manually scored events automatically shows the corresponding emotion values in the charts.
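The idea behind such a time-based hookup can be illustrated with a small lookup: given the current video time, find the most recent measured sample. This is only a sketch of the principle and says nothing about how DataView implements it internally; the function name sample_at, the 25 frames per second spacing, and the values are assumptions.

```python
from bisect import bisect_right


def sample_at(times, values, video_time):
    """Return the most recent measured value at or before video_time.

    `times` must be sorted ascending and use the same unit as video_time
    (seconds in this sketch).
    """
    i = bisect_right(times, video_time) - 1
    if i < 0:
        return None  # video_time lies before the first sample
    return values[i]


# Hypothetical samples at 25 frames per second (one value every 0.04 s).
times = [0.00, 0.04, 0.08, 0.12]
happy = [0.10, 0.35, 0.55, 0.20]

print(sample_at(times, happy, 0.09))
```

Because the emotion values start at the very beginning of the video, no offset is needed here; with a later start, a constant offset would have to be subtracted from video_time first.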
Extract Emotional Events
Based on the measured values per emotion channel, we create INTERACT events from within DataView.
The corresponding scripting routine allows you to select a channel and a threshold value. All consecutive lines without gaps in which the specified threshold is met are then joined into a single event.
This way, we can generate an event for all periods in which the ‘Happy’ channel for Mrs. Merkel rises above 0.4, and the same for Mr. Steinbrück.
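The joining logic described above can be sketched in a few lines. This is not the DataView scripting routine itself, only an illustration under stated assumptions: samples are (time in milliseconds, value) pairs sorted by time, "rises above" is read as strictly greater than the threshold, and max_gap is a hypothetical parameter that defines when two samples still count as gapless neighbours.

```python
def extract_events(samples, threshold, max_gap):
    """Join consecutive above-threshold samples into (onset, offset) events.

    samples:   list of (time_ms, value) pairs, sorted by time (assumed format)
    threshold: a sample qualifies when value > threshold
    max_gap:   largest time step (ms) still treated as gapless
    """
    events = []
    onset = last = None
    for t, v in samples:
        above = v > threshold
        # Close the running event if the value dropped or a gap occurred.
        if onset is not None and (not above or t - last > max_gap):
            events.append((onset, last))
            onset = None
        if above:
            if onset is None:
                onset = t  # start a new event
            last = t       # extend the running event
    if onset is not None:
        events.append((onset, last))  # close an event still open at the end
    return events


# Invented values, sampled every 40 ms with a hole between 160 and 280 ms.
samples = [(0, 0.1), (40, 0.5), (80, 0.6), (120, 0.3),
           (160, 0.45), (280, 0.7), (320, 0.8)]
print(extract_events(samples, threshold=0.4, max_gap=40))
# prints [(40, 80), (160, 160), (280, 320)]
```

In this sketch an event ends at the time of its last qualifying sample, so a single-sample event has zero duration; adding one frame length to each offset would be an equally reasonable convention.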
Also of interest for our study are the sequences where the ‘Anger’ channel of either participant rises above 0.4:
We can specify the Class and behavioral Code for each search separately, and generating all of the sequences above takes only 4 separate runs of the same routine with adjusted settings.
All in all, this takes 5 minutes at most!
Now we can compare the statistics on our manually coded ‘positive’ expressions and the automatically generated events for ‘Happy’ in the Full statistics.
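Beyond comparing the two sets of statistics, one simple way to quantify the agreement is the total time during which a manually coded event and an automatically generated event are active together. The sketch below is not an INTERACT function; it assumes both event lists hold (onset, offset) pairs in milliseconds, sorted by onset and not overlapping within a list, and all numbers are invented.

```python
def overlap_duration(a, b):
    """Total time during which an event from a and an event from b coincide.

    Both lists hold (onset, offset) pairs, sorted by onset and
    non-overlapping within each list (assumed for this sketch).
    """
    total = 0
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if end > start:
            total += end - start
        # Advance the list whose current event ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


# Invented example events in milliseconds.
manual_positive = [(1000, 3000), (8000, 9000)]
auto_happy = [(1500, 2000), (2500, 3500), (9500, 9800)]

shared = overlap_duration(manual_positive, auto_happy)
auto_total = sum(off - on for on, off in auto_happy)
print(shared, auto_total)  # prints 1000 1800
```

Dividing the shared time by the total duration of the automatic events gives the share of measured ‘Happy’ time that the observer also rated as positive.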
We can also verify the expressions by replaying the events that were generated based on the measured values. Sorting by class gives us all events in a row, so we can jump through the video and watch those sequences easily.
The statistics tell us that Mrs. Merkel may be registered as happy quite often, but each occurrence usually lasts less than half a second:
The statistics also tell us that, although Steinbrück seems to be happy more often than Merkel, based on both frequency and duration, Merkel is the one who laughed the longest (max. duration).
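The three measures mentioned here (frequency, total duration and maximum duration) are straightforward to compute from a list of events. The following sketch only illustrates what these numbers mean; it is not the INTERACT statistics routine, and the event lists are invented so that they reproduce the pattern described above, not the actual results.

```python
def duration_stats(events):
    """Frequency, total, mean and maximum duration of (onset, offset) events."""
    durations = [off - on for on, off in events]
    if not durations:
        return {"frequency": 0, "total": 0, "mean": 0.0, "max": 0}
    return {
        "frequency": len(durations),
        "total": sum(durations),
        "mean": sum(durations) / len(durations),
        "max": max(durations),
    }


# Invented events in milliseconds, NOT the measured debate data.
merkel_happy = [(0, 300), (1000, 1400), (5000, 7500)]
steinbrueck_happy = [(0, 900), (2000, 3000), (4000, 4800), (6000, 7200)]

m = duration_stats(merkel_happy)
s = duration_stats(steinbrueck_happy)
print(m)
print(s)
```

With these made-up numbers, the second list has the higher frequency and total duration, while the first list contains the single longest event, which is the same kind of pattern reported for the two politicians.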
Even without the fancy extraction of automatically detected emotions, scoring the times each speaker spoke gives us quite a lot of interesting information. The "Trim on code" filter shows in detail who else spoke during Steinbrück’s or Merkel’s speaking time:
The contingency analysis allows us to identify relations and latency between two separate events. It can be used to identify which moderator talked directly to which politician:
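The basic idea of such a contingency search can be sketched as follows: for every "given" event, look for the first "target" event that starts within a time window after the given event ends, and report the latency. This is an illustration only; how INTERACT defines and computes its contingency analysis may differ, and the function name following_events, the window parameter and the sample data are assumptions.

```python
from bisect import bisect_left


def following_events(given, targets, window):
    """Find target events that start within `window` after a given event ends.

    given, targets: lists of (onset, offset) pairs in milliseconds;
                    targets must be sorted by onset.
    Returns a list of (given_event, target_event, latency) tuples.
    """
    target_onsets = [on for on, _ in targets]
    matches = []
    for on, off in given:
        # Index of the first target starting at or after the given offset.
        k = bisect_left(target_onsets, off)
        if k < len(targets) and target_onsets[k] - off <= window:
            matches.append(((on, off), targets[k], target_onsets[k] - off))
    return matches


# Invented speaking turns in milliseconds.
moderator_turns = [(0, 5000), (20000, 24000)]
merkel_turns = [(5200, 19000), (40000, 50000)]

print(following_events(moderator_turns, merkel_turns, window=1000))
# prints [((0, 5000), (5200, 19000), 200)]
```

Running the same search once per moderator and politician pair gives a simple table of who answered whom, and how quickly.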
Results / Findings
At the end, Mangold INTERACT shows you a list of your codes and their corresponding frequencies.
The biggest advantage of Mangold INTERACT is its ability to re-organize, rename and shuffle collected data, should you notice that your initial structure is not ideal for finding the answers you are looking for.