Waveform Analysis Of Michael Moore’s Oscar Speech: Evidence That Audio Portion Of CNN Soundtrack Was Altered From Original ABC Broadcast?

29 Replies

Thanks to Tristan for creating these waveform comparisons of the original ABC feed and the CNN rebroadcast. Thanks so much for putting this up!
Some data extracted from Michael Moore’s speech, as transmitted on CNN and ABC

The audio files were downloaded from http://www.lisarein.com/michaelmoore/michaelmoorecompare.html. I cropped the most controversial ‘boo’ part in the two versions, where he says “…that elected a fictitious president…. we…”.
I compiled a stereo file with one version on each channel, ran it through common analysis tools in a sound-editing program, and ended up with this (click on the images for a high resolution version)…
In the audio version (the stereo file with one version on each channel), you can clearly hear the difference between the two speeches. I’ll let you hear for yourself where the boos come from. FYI, CNN’s channel is on the left.
The conclusion
is as always up to you

Here is the full text of the page in case the link goes bad:
http://asyo.com/michaelmooresspeech/
Some data extracted from Michael Moore’s speech, as transmitted on CNN and ABC
The audio files were downloaded from http://www.lisarein.com/michaelmoore/michaelmoorecompare.html. I cropped the most controversial ‘boo’ part in the two versions, where he says “…that elected a fictitious president…. we…”.
I compiled a stereo file with one version on each channel, ran it through common analysis tools in a sound-editing program, and ended up with this (click on the images for a high resolution version):
Spectrum:
moore’s speech ABC retransmission
Notice the strength of the horizontal curve of the (enthusiastic?) whoo.
moore’s speech CNN retransmission
Notice the strength of the boo’s (red lines), and how the whoo now swims in the background… The noise coming from the public is strangely louder under the two “booooo’s”
Waveform:
moore’s speech waveform, both channels
CNN’s signal is blue, ABC’s is red, the overlapping zone is dark.
Sound:
In the audio version (the stereo file with one version on each channel), you can clearly hear the difference between the two speeches. I’ll let you hear for yourself where the boos come from. FYI, CNN’s channel is on the left.
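For readers who want to build a similar comparison file, here is a minimal sketch in Python with NumPy. It is not the procedure actually used for this page (that was done in a sound editor); the function names `side_by_side` and `write_wav` are made up for illustration, and the two sine tones only stand in for the real CNN and ABC clips.

```python
import io
import wave
import numpy as np

def side_by_side(left, right):
    """Return an (n, 2) array with one mono clip per channel, trimmed to the shorter clip."""
    n = min(len(left), len(right))
    return np.column_stack([left[:n], right[:n]])

def write_wav(target, stereo, sample_rate):
    """Write an (n, 2) float array in [-1, 1] as 16-bit PCM. target is a path or a file object."""
    pcm = (np.clip(stereo, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(target, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)           # 2 bytes per sample = 16 bit
        w.setframerate(sample_rate)
        w.writeframes(pcm.tobytes())  # row-major bytes, so samples interleave L, R, L, R

# Stand-in signals (assumptions, not the real clips): CNN goes on the left channel.
sr = 8000
t = np.arange(sr) / sr
cnn = 0.5 * np.sin(2 * np.pi * 220 * t)
abc = 0.5 * np.sin(2 * np.pi * 440 * t[: sr // 2])   # deliberately shorter

stereo = side_by_side(cnn, abc)
buf = io.BytesIO()                  # write in memory; pass a filename to write to disk
write_wav(buf, stereo, sr)
```

The two clips still have to be cut to the same starting point by hand (or by cross-correlation) before they are stacked, otherwise the left and right channels will not line up.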
The conclusion
is as always up to you

29 thoughts on “Waveform Analysis Of Michael Moore’s Oscar Speech: Evidence That Audio Portion Of CNN Soundtrack Was Altered From Original ABC Broadcast?”

Cool/Lame April 15, 2003 at 12:09 pm

Michael Moore

did cnn turn up the booing on michael moore’s speech
Richard April 15, 2003 at 1:47 pm

I am a broadcaster and have looked at the waveforms.
It is likely that the differences were caused by different amounts of audio compression applied to the broadcast signal by the two broadcasters. Higher levels of audio compression have the effect of lifting any background noise AND of lifting any audio IN BETWEEN the dominant audio source. So even if the talking is louder than (for example) the booing, audio compression will lift the level (apparent volume) of the booing in the gaps between words. To the ear this has the same effect as if the booing were louder.
If one station applies more compression than another, then the background effects (clapping etc.) will appear to be louder on that station.
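The effect described above can be reproduced with a toy example. The following is only a sketch in Python with NumPy, not a model of any broadcaster's actual processing chain: the threshold, ratio and release values are arbitrary assumptions, a 200 Hz tone stands in for the speaker, and low-level random noise stands in for the crowd.

```python
import numpy as np

def compress(signal, threshold=0.1, ratio=8.0, release=0.995):
    """Feed-forward peak compressor: instant attack, exponential release."""
    env = 0.0
    out = np.empty_like(signal)
    for i, x in enumerate(signal):
        level = abs(x)
        # Envelope follower: jump up immediately, decay slowly.
        env = level if level > env else release * env + (1.0 - release) * level
        if env > threshold:
            # Above the threshold the output level grows only 1/ratio as fast.
            gain = (env / threshold) ** (1.0 / ratio - 1.0)
        else:
            gain = 1.0
        out[i] = x * gain
    return out

def rms(x):
    return float(np.sqrt(np.mean(x ** 2)))

sr = 8000                                     # sample rate in Hz (arbitrary)
t = np.arange(sr) / sr                        # one second of audio
speech = 0.8 * np.sin(2 * np.pi * 200 * t)    # stand-in for the speaker
talk = slice(0, 4 * sr // 10)                 # first "word"
gap = slice(4 * sr // 10, 6 * sr // 10)       # pause between "words"
speech[gap] = 0.0
crowd = 0.02 * np.random.default_rng(0).standard_normal(sr)  # quiet background
mix = speech + crowd

before = rms(mix[gap]) / rms(mix[talk])
squashed = compress(mix)
after = rms(squashed[gap]) / rms(squashed[talk])
print(f"crowd-to-speech ratio before: {before:.3f}, after: {after:.3f}")
```

The crowd level in the pause, measured relative to the speech, comes out several times higher after compression, even though nothing was added to the signal. The slow rise of the gain at the start of the pause is the release time of the compressor.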
It is also likely that a news-based station like CNN has optimised its on-air sound for human speech. This would mean that any vocal-range sounds in between words in a speech might be boosted more than other sounds such as applause, which are essentially random noise.
The only way to tell for sure is to get a recording of the actual audio as it arrived at the two stations in question.
If you want other examples of this effect, all you have to do is listen to the same song across a range of different FM stations. I think you’ll find that there are quite big differences in the apparent mix of the track, e.g. more bass, more treble, more backing etc.
I am afraid that all (99%) broadcasters are doing this, and as such, off-air comparisons (especially when the differences are relatively subtle) should be done with great care.
Cheers.
Dustin April 15, 2003 at 4:57 pm

I am not a broadcaster, but I have worked in a recording studio a little, and it seems that there are a number of possible explanations. Richard’s, above, on compression is a good one, but given how subtle the differences are, and how many differences appear in the two spectra outside of the actual booing part, it’s entirely possible that differences in equipment alone could have this effect. Anyone with any familiarity with sound equipment knows that two different soundboards, even of the same make and model, can sound very different. Different cables can have an effect; even differences in room temperature and humidity can affect the way different boards sound. And we’re talking not just about sound boards, but about two entirely different sets of broadcasting equipment here. Plus, CNN likely remixed the sound when they prepared the clip for broadcast, and remixing brings in not only changes produced by the equipment, but also changes due to the way the mixer hears the track.
I know a lot of work has gone into analysing these sounds, but I just don’t see a very convincing argument for any nefarious, behind-the-scenes tampering. That’s not to say there are no audible differences; I just don’t think the differences are purposeful.
Tristan April 15, 2003 at 7:18 pm

On a technical level, it clearly seems to be a matter of compression (the release time is often clearly perceivable, it confirms what you said, Richard).
The issue, for me, is the purposeful choice of mastering the sound in such a particular way, in a broadcast sequence with such a heavy political load.
I have uploaded some more data to feed the discussion:
http://www.asyo.com/michaelmooresspeech/ducttape_small.jpg
and
http://www.asyo.com/michaelmooresspeech/ducttape.mp3
I have to sleep, sorry for my flat english..
George April 15, 2003 at 10:10 pm

Tristan, first of all thanks for the good job. I find the stereo example with headphones quite interesting. There are problems with it as a cue for the way we localize sources, so for instance it does not make clear whether all ambience noise was elevated in the CNN example or just the single booer close to the apparent ambience mic.
It does however make very clear that the booer is significantly more audible in the CNN version over the ABC one. This is a strong and salient feature and is not at all negligible. This is based on what we know about the psychoacoustics of spatial hearing.
I’ve read a few theories as to what happened.
Hypothesis 1: Compression. Claim that compression amplifies background and silent portions.
Hypothesis 2: Equipment effects including humidity and temperature.
Hypothesis 3: Remixing. Presumably single-channel remixing using an equalizer.
Hypothesis 4: Mastering. Meaning multi-channel remixing using multichannel equalizers/mixers.
I would like to argue that none of these hypotheses but the last holds given the evidence:
1) If compression of the type described were responsible, then we would hear amplified noise in quiet sections of the speech. In fact the cheering and booing subsides almost completely in the middle section of the speech, and background noise is not audibly amplified in that section.
2) Equipment effects are usually spectral-envelope-type effects (usually compensated for using equalizers, btw). They affect all sound within the source equally. This is a spectrally mixed source, and Moore’s voice would be equally affected. In the comparison it is obvious that Moore is localized in the center, whereas he should be localized with the booer if this were the case. A side note: I find it highly unlikely that CNN had separate equipment there. The reasons are again based solely on the sound. The particular mix of ambience, speaker and dominant booer suggests very close if not equivalent mic positions. Microphone positions are indeed very sensitive, as the audio engineers here will be happy to attest.
3) Single-channel equalizing has the same properties as equipment effects: these are frequency-band manipulations that affect all aspects of the sound in that band and can’t separate sound sources. It would be practically impossible to mix the booer into one channel while keeping Moore in another, as their spectra share the same frequency space.
4) Mastering. The assumption is that both ABC and CNN in fact had not a single mastered track at hand, but rather independent channels for the various mics. The minimum scenario would be two mics (speaker, ambience), though there could be more. In such a setting, distance to the mics (plus mic directionality) will determine the salience of a signal in a given channel. This can explain the heard effect fine.
Hence, of the given hypotheses, only separate mastering reasonably survives.
One might still consider that signal manipulation of a single track is at hand, but given the signals this is very unlikely. Source separation, especially of the quality experienced here, is a known hard problem that many scientists have chewed on for decades.
I wanted to see if I could shed light on the nature of the signals. These signals have gone through various equipment-type and digital manipulations, including VHS recording and playback, conversion to computer formats, likely “perceptual” compression and so forth. None of these manipulations are intended to change the signal, but they do have spectral-shaping-type effects that make a direct comparison hard. Given all this, it is especially surprising that the heard difference is so salient. Also, neither signal sounds distorted, meaning that the manipulation chain had only minor audible effects.
So I used basic signal processing to come up with a “difference” signal between the CNN and ABC track. This is meaningful because it would show if there is band-limited manipulation of the equalizer-type present.
The results are:
1) There are no equalizer-type band-limiting manipulations clearly visible.
2) The CNN signal has been low-pass-filtered at 15kHz, which is a standard move to protect your dog from pain and us from aliasing artifacts. This may have happened during a later manipulation stage and not necessarily at CNN.
3) The booer’s voice is the most salient feature in the difference signal; however, other background noise seems to be there too.
I’d be happy to provide the resulting difference file and a description of the procedure if requested.
Given all this, I find the following scenario most plausible. These two sound examples are different mic mixes. Where they have been mixed differently is unknown. It may have happened before CNN got the footage. This cannot be determined from the evidence.
Under a few other plausible assumptions, I find some things peculiar. The only salient sound source of interest is the speaker. It is natural to mix the speaker dominant over the ambience (this is what ABC did). In the ABC coverage the whole speech is fairly clearly understandable. In the CNN mix, the last portion of the speech is almost incomprehensible. If the intent of the mix is to make the speaker heard, the second mix is a poor mix. The second peculiarity is mic placement. To mic ambience one would place a mic at a reasonable distance from the audience. (Who would want to hear one audience member chatting or snoring through the ceremony?) Yet one particular person’s boos in the audience are very salient in the ambience. This requires sufficient proximity to the microphone or a directional mic (which would not be used for ambience pick-up). This is a peculiarity for which I have no good explanation. It should be noted that the one booer is definitely not picked up through Moore’s mic, which is very definitely highly directional in his direction and wouldn’t pick up such an audience response.
Why do I know all this? I do audio signal processing for a living in a research setting.
Anyway, I agree with Tristan that most of this is not the point; the point is rather how, in conjunction with it, Aaron Brown chose to present it. The fact that the CNN version happens to have louder booing just facilitates that. From the audio evidence only one booer can be discerned. The other audience reaction is perceptually quite ambiguous and can be either cheer or boo. Needless to say, in the video I see no cheering or booing, only isolated smiles (which Brown interprets as “bemusement”; this interpretation is not clear to me from the footage) and isolated clapping.
The problem is with the presentation of the footage by the host and not with the mix. The mix can be seen quite value-free. Or, of course, spun towards any interpretation one may feel politically comfortable with.
Steve April 16, 2003 at 4:33 pm

I have worked with digital audio for many years, and you couldn’t have a better digital audio broadcasting forum than the previous posts.
I agree that the booing seems louder in the CNN recording, but what is also clearer is that it is dominantly a single person. So to turn the conspiracy theory on its head, the CNN tape seems to make it sound more like the booing is coming mostly from a single person. After you listen to him on the CNN recording, you can go back to the ABC recording and pick him out more easily.
Comparing with my ears only, it is clear that the only difference between the tapes is processing differences, not the addition of new sound effects. There does seem to be a delay effect on the CNN recording and that combined with any compression/EQ related differences could cause a “fattening up” of the audio signal.
Ray April 16, 2003 at 11:00 pm

I don’t know much about this kind of equipment. However, I can follow the discussion, and what occurs to me is that it might be instructive to compare any other portions that CNN aired with ABC footage. See what kind of other inconsistencies exist (if any), and look at what they might have in common. I should note that I haven’t even heard the samples yet; they’re downloading as I type this. My default reaction is to be skeptical about CNN doctoring this audio, but stranger things have happened.
Loyal Citizen Victor April 18, 2003 at 11:49 am

In a nutshell: Moore got booed. Do all the waveform analysis and explaining and discussing and whatnot, but in the end: Michael Moore got booed.
jojo April 18, 2003 at 2:11 pm

He got booed… by (supposedly) one very loud person who dissented with his opinion.
It seems all too common these days to provide a few dissenting voices extraordinary coverage.
Case in point: the Feb. 14 peace march with 100-500k people (estimates, I was there and would put it at 350-400k) got all of 2 minutes of attention on TV, mostly focusing on the scuffle between 30-odd protesters and police.
A “counter” demonstration of 12 lonely souls got as much attention…
This is free speech for ya… distortion of facts by ‘carefully cropping’ the picture. Oh… did you see the pictures of the famous first statue-pullaroo? Impressive, eh? Until you see what really happened on http://www.informationclearinghouse.info
Lambs to the slaughter… that’s what we are…
Richard Bennett April 18, 2003 at 7:19 pm

ABC broadcast the Oscars live. They also made a tape of their broadcast, and released it to other networks. So the CNN broadcast is a copy, and the ABC broadcast is the original. Does a copy sound exactly like an original? Not usually.
How do we know that the person who captured the clips from ABC and CNN did so at the same base recording level? Obviously, we don’t.
Nonetheless, Moore is a pompous fool with a history of lying. As it turns out, CNN also has a history of lying (by suppressing news) to make Saddam Hussein happy.
And your point was?
(BTW, Oswald acted alone.)
simex April 19, 2003 at 12:04 am

I work as a studio engineer, I go to college for audio production, and I’ve been doing music production since 1995. I will tell you right now it is not compression, it’s not even multi-band compression, because you’ll notice the man booing and Michael Moore speaking occur mainly in the same frequency bands. In order for the boo to sound that much louder you would notice a drastic drop in the volume of Michael Moore’s voice at the same time, which you don’t. I can also tell you it’s not a matter of EQ, because I can tell you just by listening, these two sources are EQ’ed almost identically, if not identically. This is intentional, I assure you. If it were one of these “accidental” explanations you would notice other parts of the speech affected by this, but you don’t. I’ll show you guys: I will take both audio signals, invert one of them and mix them at unity gain. That way you will only hear what is different between the two signals, and my guess is you’ll only hear the guy booing. Anyone got some FTP space I could upload this to?
simex April 19, 2003 at 1:43 am

can anyone come up with better quality samples?
simex April 19, 2003 at 2:49 am

Hey audio people… I’m having trouble getting these two signals to cancel each other out. Thoughts? Here’s my exact process: first mix the left and right channel of each example at unity gain to produce a mono track of both channels. Then invert one of the new mono tracks. Then cut them so they start at exactly the same point (I’m going down to the 1:1 zoom in Sound Forge). Then mix them together while trying to find unity gain. I think there are a few possibilities as to why it’s not working. Slight difference in EQ… tape flutter? I mean, what are these mp3s originally from? A VCR probably… I’m not sure I can complete this little experiment without higher quality samples. I mean, I got close, but you could still hear everything and the “boo” was only slightly louder: enough to convince me, but not enough to convince the untrained ear. Kudos to anyone who can get it.
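For anyone repeating this, the invert-and-mix test can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions, not the Sound Forge workflow above: `null_test` is a made-up helper name, the gain is matched by least squares rather than by ear, and a three-tone test signal stands in for the real clips.

```python
import numpy as np

def null_test(a, b):
    """Invert b at the best-fit gain, sum it with a, and report what is left.

    Returns the residual and its level in dB relative to a (more negative = better null).
    """
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    gain = np.dot(a, b) / np.dot(b, b)       # least-squares level match
    residual = a - gain * b                  # same as adding the inverted, scaled b
    depth_db = 10.0 * np.log10(np.sum(residual ** 2) / np.sum(a ** 2) + 1e-20)
    return residual, depth_db

sr = 44100
t = np.arange(sr) / sr
clip = sum(np.sin(2 * np.pi * f * t) for f in (300.0, 1000.0, 5000.0))

# Same clip at a different recording level: the null is essentially perfect.
_, depth_level = null_test(clip, 0.5 * clip)

# Same clip, one sample late: the null collapses, mostly at high frequencies.
_, depth_shift = null_test(clip, np.roll(clip, 1))

print(f"level change only: {depth_level:.1f} dB, one-sample offset: {depth_shift:.1f} dB")
```

A pure level difference cancels completely, but a timing error of a single sample already leaves most of the signal audible. Tape flutter, resampling and mp3 encoding all introduce timing and phase errors of at least that size, which is one plausible reason the subtraction does not null on these clips.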
Tristan April 19, 2003 at 8:49 am

Loyal Citizen Victor: Nobody forces you to think in more than a nutshell.
Simex: I tried exactly the same thing, with an accuracy of 1 sample, but I couldn’t do it… I think George got some results, I’m interested in listening to the samples.
Whatever, the difference is clear in the separated samples (listen to the duct-tape sample…), and also visually.
George April 20, 2003 at 8:44 am

Simex & Tristan, there are many reasons why a simple subtraction of samples will not work well at all. There are simply too many factors to explain in this forum, but let me give it a try.
Simply put, two waveforms that look very different can sound identical, and it has to do with how our ears work. If you zoom in a lot on both versions in segments where they are audibly not different at all (e.g. before Moore addresses the elections), the waveforms look similar on the large scale but are significantly different on the small scale. In a previous post I mentioned that I calculated a difference signal and described what I got out of it. This difference signal is calculated not on the waveform shape but in a different representation which is perceptually more meaningful (and less sensitive to the manipulations that caused the inaudible difference between the signals). This representation is called the frequency or Fourier domain. The methodology is straightforward though still rather advanced. It’s beyond me to describe it in lay terms. Needless to say, the method is a very well established and mature segment of science and has been used extensively for many years.
An experiment anyone can try to see how this works: Record yourself speaking. Create an mp3 encoding of it, decode it and compare the recorded waveform with the en/decoded one. They will look different but sound the same. The simple inverted difference signal between them will likely sound the same too! All this means is that our ear hears some things but not others. And perceptual difference and physical difference are two quite distinct things.
P.S. For those who know digital signal processing and want to try the difference procedure I used: Cut out comparable segments using a sound editor. Delay-align them using cross-correlation. For each block (I used 4096-sample windows with 2048 samples of overlap), calculate the FFT of both signals. Calculate the amplitude spectrum (abs of the complex spectrum); this step discards phase effects. Calculate the difference of the amplitude spectra. Recreate the symmetric spectrum (so that the inverse transform is real-valued). Inverse FFT. Repeat until the end of the signal using overlap-add. In my example I used Hanning windows throughout. The resulting difference signal is reasonable but has artifacts due to the neglect of phase issues, including sub-sample alignment and phase matching.
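The steps above can be sketched roughly as follows in Python with NumPy. This is one reading of the description, not George's actual code: the function names `align` and `spectral_difference` are invented for the example, alignment is to the nearest whole sample only, and random noise with an added 300 Hz tone stands in for the two recordings.

```python
import numpy as np

def align(a, b):
    """Delay-align two clips using the peak of their FFT-based cross-correlation."""
    n = len(a) + len(b) - 1
    nfft = 1 << (n - 1).bit_length()         # zero-pad so the correlation does not wrap
    xc = np.fft.irfft(np.fft.rfft(a, nfft) * np.conj(np.fft.rfft(b, nfft)), nfft)
    lag = int(np.argmax(xc))                  # xc[k] = sum over n of a[n + k] * b[n]
    if lag >= len(a):                         # negative lags are stored at the end
        lag -= nfft
    if lag > 0:
        a = a[lag:]
    elif lag < 0:
        b = b[-lag:]
    n = min(len(a), len(b))
    return a[:n], b[:n]

def spectral_difference(a, b, win=4096, hop=2048):
    """Overlap-add resynthesis of the block-wise difference of amplitude spectra."""
    window = np.hanning(win)
    out = np.zeros(len(a))
    for start in range(0, len(a) - win + 1, hop):
        amp_a = np.abs(np.fft.rfft(window * a[start:start + win]))
        amp_b = np.abs(np.fft.rfft(window * b[start:start + win]))
        diff = amp_a - amp_b                  # phase is discarded here
        # irfft rebuilds the symmetric spectrum, so the block is real-valued.
        out[start:start + win] += np.fft.irfft(diff, win)
    return out

# Stand-in data (assumptions): two captures of the same noise, one starting
# 100 samples later, with an extra 300 Hz "boo" present in only one of them.
rng = np.random.default_rng(1)
source = rng.standard_normal(20000)
abc = source[100:]
cnn = source.copy()
cnn[8000:12000] += 0.5 * np.sin(2 * np.pi * 300 * np.arange(4000) / 8000)

abc_al, cnn_al = align(abc, cnn)
diff = spectral_difference(cnn_al, abc_al)
```

Because every block is resynthesized with zero phase, the output is a rough indicator of where and at which frequencies the two clips differ rather than a clean isolated recording of the extra sound. A common variation, not part of the description above, is to reuse the phase of one of the two inputs for the inverse transform.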
George April 20, 2003 at 11:21 am

Simex, look through my first post. The procedure you describe is very labor intensive. I find it very hard to believe that CNN or anyone would do that. Also, given the audio data, there is no good evidence for this. Early on I considered the option of a boo being “overdubbed”, i.e. CNN putting in boos that ABC does not have by simple cut-and-paste procedures. This would be a cheap and manageable tampering procedure. However, given the signals at hand, this turns out not to be the case either.
You assume that CNN got a stereo mix from ABC. It is not clear to me that this is true. In fact the CNN video credits AMPAS (“the academy”). It’s unknown what feeds ABC and CNN each got, whether there were any intermediaries, or where exactly the mic mixes were made.
simex April 20, 2003 at 1:35 pm

just speculating. it’s not TOO labor intensive, i think i could do it in about an hour. hey george, you seem pretty knowledgeable, what exactly do you do for a living?
Tristan April 20, 2003 at 4:46 pm

George, thanks for your posts. They were really instructive and useful, as I encountered the same problem as simex.
Now, I think we need to know how the sound was processed, but this time from officials, from people working at the TV channels or at the awards: what kind of material was given to the different channels, etc.
What matters is not the weakness or strength of our analysis, but whether people will want to trust it or not.
It is easy to claim that an analysis is wrong or imprecise if you want to think it’s wrong. That’s not the case with facts or interviews.
g April 21, 2003 at 4:37 am

Has anyone given much attention to the interesting echo effect that occurs in the CNN recording?
My spectrum analysis seems to indicate that the echo in Mike’s voice appears only after the first round of booing. Is there a technical reason anyone can give for this?
A possible motive could be to introduce a slight quaver to the voice resulting in a less decisive speech as such?
Thoughts would be appreciated.
George April 21, 2003 at 6:43 am

I don’t hear any “echo effect”. Nor would I think that you could see it in the kind of audio spectra one would normally look at.
Richard Bennett April 22, 2003 at 2:08 pm

Only an illiterate would say “the fictition of duct tape”; why did ABC modify the tape to make it sound like Moore said that?
Dave April 22, 2003 at 3:38 pm

My name is David Nystr
Richard Bennett April 23, 2003 at 2:33 am

Yes, David, the question is why would ABC depress the volume of the booing? What are they trying to hide? And when did they try to hide it?
We must get Hans Blix involved right away.
Ellison Horne April 23, 2003 at 9:44 am

Dave, thanks for your interest and response.
It would be well worth the effort if from this study we learn something about the variety of methods used by mass media to transmit images and sound which in turn influence how messages are crafted. The result may be a new set of standards and practices that protect the public from arbitrary institutional manipulation of source material. Who decides what gets modified for transmission and how is it conducted? What impact does that modification have on the resulting news report? What are the social implications of how the ripple effect is generated that influences mass opinion?
As the media move further into the digital age it will be increasingly important to keep the public aware of what this means and the stewardship necessary in developing future policies.
Onward and upward,
Ellison
Tim May 1, 2003 at 5:49 pm

I find this really interesting. But have any of you read the heavy criticism about MM and his film? I can’t remember the sites, but look it up: the film itself is incredibly deceptive and dishonest, cutting time and implying certain things that just aren’t true.
I think Michael Moore is a great propaganda maker, but an objective documentarian? Come on, who are we trying to kid here.
Dave May 30, 2003 at 6:30 am

This discussion is not about Michael Moore’s “objectiveness”, nor about who is anti-American or a “traitor”.
It’s about the perversion of the truth at this particular moment.
Moore’s film certainly isn’t an objective documentary, but it’s not told as if it were the absolute truth.
And I personally think Moore left out some of the really “juicy” stuff that the USA has done over the years.
Maybe you wonder why the US hasn’t signed the treaty for the war-tribunal court?
Probably because there are retired American officials who would risk life sentences if that were to happen.
Among others, Kissinger, for things done in Indonesia during the Cold War.
I don’t know if he’s still alive, but imagine the propaganda loss for the US when (if) he gets convicted.
Read more about it here:
http://www.etan.org/news/kissinger/ask.htm
Furthermore, I would like to say this to the American people: I saw many interviews on the streets after September 11th; most of the people were asking “Why do they hate us?”.
American officials answered this question with: “They hate our lifestyle”.
This is so absurd!
The average American seems to be unwilling to grasp the concept of the US as a former evil empire.
With me writing this, some probably think I’m some crazy Taliban wanting to kill Americans.
Sorry to disappoint you, I’m not...
I’m just concerned about what will happen next. Our neighbour Norway has a lot of oil as well.
Last but not least: Vote, for God’s sake!
Peace out!
/The PeaceKeeper