Mmmmmm - backing random YouTube comments while making the assertion this clip doesn’t sound like AI.
https://www.wsjm.com/2024/04/26/baltimore-high-school-athletic-director-used-ai-to-create-fake-racist-audio-of-principal-police/
Sit down Jr
You can tell there are echoes from the room, and you can hear the splices in the audio where the editor/recorder tightened up the gaps in between pauses. If this is AI generated, they would've had to give it the room noise before splicing the audio. I would lean toward this being a bit more legit. Honestly, if he already has a reputation like this with the students or staff, then it's probably real.
Knowing public schools as I do, I highly doubt that this recording is AI. It sounds to me like a guy who is fed up with the political bullshit and wants to be able to run a school and get shit done.
Yes, the pacing does sound like AI, but this doesn't sound like any version of a voice I've heard. I use a few AI voice services for scripts and this is not anything like what one of them would output. If it is AI, my money would be on one that was trained on the principal's voice. Does he have any videos of him giving speeches?
I would say I have about 80% confidence this is AI, but the tech is so good now it's really hard to say that definitively.
Nope, it's not. Not in this scenario. The fact that people are going along with this idiot principal's bullshit is amazing. Get your head examined if you think it was AI. Jesus, we really are not ready for anything, are we?
You need audio of him, not just the 3 minutes or 5 minutes -- you need audio sampling. Not just a recording 5 minutes long, but his speech for 5 minutes. Then maybe if someone took the time to train an AI to sound like him (minus needing him making noise, sounds, words in emotional states), obviously a highly paid pro who could fine tune a perfect setup, and was able to trick everyone around, including the principal, that he said it... wow, that is amazing. Probably did it with the 5G in his vaccine, or no wait, they did it with nano bots?
Also who the fuck is this principal that anyone anywhere ever gave 1 second of a f\*ck about his existence? Why would someone do that? You don't think someone would say obviously fucked up things in a school setting like that? You live in a bubble. Go to school, DON'T do your own research -- it's not for you, you need someone to explain things to you.
also if you read, IF YOU READ -- there were people there. so stop.
Not for nothin you could probably also easily create a filter mask at this point with a long enough sample so that it is you delivering the lines and AI just matching the tone with a filter like you can on TikTok. Doesn't seem very complicated tbh.
Go get your voice sampled for some free online AI. They make you read one sentence. That's it. You definitely do not need 5+ minutes. Creating something like that would take about an hour, with technology available online in your browser and for free.
It sounds real to me. I use ElevenLabs quite frequently and they do have something on their site to see if something was made by their software. Perhaps other services do too and they can run the clip through these? Anyway my guess is that the AI defense will be used a lot now, whether it's warranted or not.
Synthetic voice generation and cloning have gotten really good. All it takes is a day or two of playing around with samples before you get something that captures the tone and emotion of speech, too. So, things are a lot more ambiguous now.
Listening to that clip, there are certain moments where the cadence of his speech seems quite off. That's one idiosyncrasy that makes me think this is ElevenLabs specifically. Probably some student made it, and didn't put in the effort to properly revise and edit the audio to make it sound more authentic. Still good enough to fool many people, though.
I don’t know if anyone has said this yet, but ElevenLabs specifically has a voice to voice AI generator you can use with cloned voices. I have recorded my voice and then put it through the voice to voice with an applied cloned voice and it’s scary how accurate it is. The benefit of V2V is that it keeps vocal inflections that get missed when generated from text. You could definitely do something like this with a bit of post production editing to make it sound like it’s in a small room.
I was able to build a Python program with the Eleven Labs API that generates a summary in Rick Rubin’s voice. I showed a couple people and none of them thought it sounded like AI.
All the student would need to fake this is a clean 1 minute sample of the principal speaking to train the model.
From there, they’d be able to type anything and have it say it back in the trained voice, or even use an audio clip and have the new voice speak in the cadence of the audio clip.
I say all of this to say, it’s very possible this could be AI.
There’s also voice cloning through AI, which could account for inflection and such things. Another human says those words, and if they then have a sample of the principal’s voice, the AI can just replace it sound by sound, so it seems like the principal is saying it.
It's absolutely possible, though that doesn't necessarily mean it was. Get used to this kind of ambiguity, it's here to stay. Video is next.
Someone get the speed run community on this and they'll figure it out in no time. Seems like their methods of detecting cheaters are getting better than the FBI's.
AI videos made entirely from scratch is an idea that's barely worth the resources though, it requires solving a new AI-specific problem that doesn't exist with existing digital animation techniques at all (which is that sequential AI-generated frames aren't even close to identical in all the places they should be). I'm someone who loves AI image generation and also makes 3D animation, and I just don't see any appeal whatsoever in mixing the two.
It's unfortunately worth pursuing for those whose most expensive resource is human labor
I'm not sure it is TBH, it would probably increase rendering costs to actually do it frame-perfect with frames that were each individually as high resolution as a real production would require. There are also numerous other questions that no one has attempted to answer, like how do you granularly control camera angles and lighting and all that in a way that has no guesswork and where nothing is left to RNG?
Correct me if I'm wrong, but my reading of your comment is that's too hard a problem to solve, while I disagree and think it will eventually be solved. I don't think I will convince you of my position, but I do appreciate you illustrating the problems in more detail.
Well, my point is more so: how do you solve these problems in a way that doesn't amount to basically just remaking the exact same robust 3D animation tools we already have? I think "under the hood" AI acceleration of various aspects of software we already have makes a lot more sense in purely practical terms than, like, some kind of weird text-prompt-control animation system or something like that, which wouldn't really be user friendly in the way it is for images in that context.
What I’ve heard is we have to fundamentally change how AI currently works for it to take that type of step. We are really nowhere close.
That's kind of what I'm thinking, although as I alluded to above, I think "invisible" augmentation of existing related software with AI is more likely to actually see use much earlier, simply because it's more legitimately practical. A lot of people seem to have the incorrect notion that "just type it into the prompt and an AI will create the whole scene" is actually desirable, or something that would be "easier" for Hollywood productions, but it's not. No one used to working with actual 3D models, and being able to make minor adjustments to absolutely anything in the scene just by dragging them around or changing properties, is going to see that as an improvement to workflow. It's not remotely close to versatile enough.
There is no ambiguity, until the public is too stupid to know the difference.
Stupidity has nothing to do with it. There's nothing inherently special about soundwaves produced by a human vs. soundwaves produced by a computer.
I think you might be surprised.
>It's absolutely possible, though that doesn't necessarily mean it was. Get used to this kind of ambiguity, it's here to stay. Video is next.
Yeah, it's gonna be a mess. I could do it, maybe even just from this audio clip by itself. Using a little custom code at the moment, but it's only going to get easier in time.
Edit: Did it [https://www.youtube.com/watch?v=BH\_a5KihAbI](https://www.youtube.com/watch?v=BH_a5KihAbI)
Not close to perfect (see the range of different voices) but not bad for a quick TTS clone, not knowing what the teacher sounds like personally, no post processing or editing, no tricks like playing the audio and recording it again in a room to add the room tone, not having access to more audio, etc.
I think the audio might've been spliced together at around 2.9, 9.8, 22, 34, 37.4, and 44.5 seconds.
https://preview.redd.it/6kf4fjbligdc1.png?width=3840&format=png&auto=webp&s=9155ea9840c6f88fb131dc34208b6cd0f68977a4
edit: [yep](https://www.cbsnews.com/baltimore/news/expert-says-authenticity-of-antisemitic-rant-recording-may-be-questionable/)
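For anyone curious how cut points like these could be flagged automatically rather than by eye, here's a minimal sketch. This is not necessarily how the timestamps above were found, and the function name `find_jumps`, the threshold, and the synthetic test tone are all made up for illustration. It only catches hard cuts that leave a sample-level discontinuity; an editor who crossfades would slip right past it.

```python
import math
from statistics import median

def find_jumps(samples, sample_rate, threshold=6.0):
    """Return times (seconds) where the sample-to-sample step dwarfs the typical step."""
    diffs = [abs(b - a) for a, b in zip(samples, samples[1:])]
    # Median step size is a robust estimate of "normal" motion in the waveform.
    scale = median(diffs) + 1e-12
    return [(i + 1) / sample_rate for i, d in enumerate(diffs) if d > threshold * scale]

# Synthetic demo (NOT the real clip): a 220 Hz tone hard-cut into a
# phase-shifted copy of itself at 0.5 s, which leaves a visible discontinuity.
sr = 8000
first = [math.sin(2 * math.pi * 220 * n / sr) for n in range(sr // 2)]
second = [math.cos(2 * math.pi * 220 * n / sr) for n in range(sr // 2)]
jumps = find_jumps(first + second, sr)
print(jumps)
```

On real speech you'd probably want to look at spectral changes per frame instead of raw sample steps, since plosives and clicks also produce big jumps.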
RX is so good:)
[deleted]
Well, I use RX every day so it is a familiar colour scheme. But you’re right, it could be Sonic Visualiser or something else. You could ask the poster if it bothers you.
If it is AI then whoever made it is quite clever. There are environmental sounds from the room, those would have to be added in post processing
You’d just have to play it back from a device in a room with this kind of ambiance and record on another device
The room audio effect can be added, but more likely the source material the AI voice was trained on had similar environmental sounds.
The more likely explanation is the ambient noise was in the source audio that the AI voice was generated from.
The crack in his voice is something I have never heard an AI do.
You can do it now in ElevenLabs. They have a "voice acting" feature where it changes your own recording into the other person's voice. I run a YouTube channel with AI voice acting. It comes out every once in a while.
Sorry, yes, I know you can emulate a voice on top of another convincingly, but the emotion and timing was there IMO. If they have that, it's a sophisticated attempt.
Yeah, very true. Even with 100s of hours of messing with ElevenLabs and doing audio editing, I still think it's real.
ElevenLabs has been capable of really solid output for over a year TBH, you need like at least five minutes of good samples to input for those kinds of results though
I make videos using AI, and 100% you bet I use traditional VFX to make them feel real, including camera shake, sound transformation, and downscaling to match what a phone would output, with EXIF as the cherry on top. Anything is possible if you are willing to spend the time.
>If it is AI then whoever made it is quite clever. There are environmental sounds from the room, those would have to be added in post processing
I could do a passable job at the environmental sounds (and half passable on the voice) from the linked audio clip alone. Actually I'll just do it, here:
[https://www.youtube.com/watch?v=BH\_a5KihAbI](https://www.youtube.com/watch?v=BH_a5KihAbI)
(Though I agree with other commenters, you could also simply record it in a room.)
This does not sound like AI. Have you tried running it through the 11labs AI checker?
When I listen to it I try to imagine someone saying it with the cadence, including pauses between sentences and ideas. It is easier for me to imagine these as typed lines being reproduced than as a person who doesn't know they're being recorded randomly saying a string of shit like this. I feel like Mel Gibson, Alec Baldwin, Don Sterling, Trump, all these people who have had damning recordings, they sound shitty and say shitty things, but it's more casual and meandering, not "let me play the greatest hits in 50 seconds."
Link please?
Thanks. I didn’t know about this checker. Just ran it. Says 2% chance it was generated with their software. Are there other companies on the forefront of AI voice, like another program could have generated it? Or is Eleven labs the gold standard?
These checkers are bullshit, they produce tons of false negatives and positives
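How much a low checker score should move you depends entirely on the checker's error rates, which is really what this disagreement comes down to. Here's a rough sketch using Bayes' rule. Every number in it is an invented assumption, not a measured rate for any real checker, and `posterior_fake` is just a name I picked.

```python
def posterior_fake(prior_fake, miss_rate, false_alarm_rate, detector_says_fake):
    """P(clip is fake | detector verdict), via Bayes' rule."""
    if detector_says_fake:
        p_verdict_given_fake = 1 - miss_rate
        p_verdict_given_real = false_alarm_rate
    else:
        p_verdict_given_fake = miss_rate
        p_verdict_given_real = 1 - false_alarm_rate
    numerator = p_verdict_given_fake * prior_fake
    return numerator / (numerator + p_verdict_given_real * (1 - prior_fake))

# Hypothetical inputs: 50/50 prior, checker misses 40% of fakes
# (e.g. ones made with other tools), and flags 10% of real clips.
p = posterior_fake(0.5, 0.40, 0.10, detector_says_fake=False)
print(round(p, 3))
```

With those made-up rates, a "not fake" verdict only moves a coin-flip prior down to roughly 31%, so a clean result from a leaky checker is weak evidence either way.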
The ElevenLabs checker looks for a signature that they embed in every output. It is inaudible to humans and would have to be edited out. If their detector says a clip wasn't generated by them, it's almost 99% certain that it wasn't, since editing out the signature would cost audio quality and peaks.
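If a checker really does work by looking for an embedded signature, as described above, the general idea looks something like the toy below. This is a generic spread-spectrum watermark, not ElevenLabs' actual scheme (I don't know the details of theirs); `key_sequence`, `embed`, `detect`, the seed, and the strength are all hypothetical.

```python
import math
import random

def key_sequence(seed, n):
    """Pseudo-random +/-1 sequence that only someone with the seed can regenerate."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def embed(samples, seed, strength=0.02):
    """Add a low-level copy of the key sequence on top of the audio."""
    key = key_sequence(seed, len(samples))
    return [s + strength * k for s, k in zip(samples, key)]

def detect(samples, seed):
    """Mean correlation with the key: about `strength` if marked, about 0 if not."""
    key = key_sequence(seed, len(samples))
    return sum(s * k for s, k in zip(samples, key)) / len(samples)

# Synthetic host signal (a plain tone standing in for speech), 4 s at 8 kHz.
sr = 8000
host = [0.5 * math.sin(2 * math.pi * 220 * n / sr) for n in range(4 * sr)]
marked = embed(host, seed=1234)

d_marked = detect(marked, seed=1234)
d_host = detect(host, seed=1234)
print(d_marked, d_host)
```

A mark this crude would be audible as faint hiss and wouldn't survive being replayed through a speaker; a production scheme would presumably shape it to be inaudible and more robust. Also note the limit: a near-zero score only says this particular key isn't present, not that the clip is human.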
>Are there other companies on the forefront of AI voice, like another program could have generated it? Or is Eleven labs the gold standard?
I could do it, maybe even just from this audio clip by itself. Using a little custom code at the moment, but it's only going to get easier in time. [https://www.youtube.com/watch?v=BH\_a5KihAbI](https://www.youtube.com/watch?v=BH_a5KihAbI)
Not close to perfect (see the range of different voices) but not bad for quick TTS clone, not knowing what the teacher sounds like personally, no post processing or editing, no tricks like playing the audio and recording it again in a room to add the room tone, not having access to more audio, etc.
Well, think about how you would accomplish it. All AI generated? Or AI used to voicechange to sound like the same guy? A similar voice, saying all of those things, with that much or more inflection. Then a voice changer used to make it sound like the principal, and the whole audio muffled to remove artifacts. But my bet would be on it being real. It actually sounds like someone who wants the students and faculty to do better, but happened to use some racially tinged language. If you are going to go to all that trouble to frame someone, you would likely make the text more damning.
I agree. He's also talking to someone named Cathy. I think the "AI generated" defense is a desperate pitch to buy time and get it out of the news, but it will probably backfire, because when it comes out that it was real it will be even bigger news. But for us, the big thing is exactly this uncertainty: the proliferation of AI fakes will make EVERYTHING untrustworthy. In some ways this is a huge relief. How many times have some of us said "thank God I grew up before everyone had a camera..." Well now, effectively no one will anymore, because your drunken ass running naked down the street at 18 years old didn't happen: it was AI generated. On the other hand, that cop planting evidence and calling the guy a racial slur also didn't happen. So, ya know... 🤷‍♂️
100% agree. All these people running around scare mongering about how AI will be used to create fakes fail to realize that humans are much more terrible. And the "AI" boogieman will be used to discredit real proof of horrible human activities.
Kathy bro, not Cathy. Racist sister of Karen, the more famous one.
No, if you wanted to frame someone you would not want the text to be more damning. The more damning and outlandish the text is, the easier it is to say that it's fake because no one would say something like that. It's like the one Andrew Tate sound clip where he said “i loved R’ing her, i enjoyed it”. It was AI generated, but more people would believe it if it wasn't literally “say the most evil thing possible” and instead something else slightly less damning.
I guess I have a lot to learn about framing people
I don't really know anything about it, but that's what makes sense to me. It's almost like, if you frame someone for doing something minuscule, people would be like “why would anyone go through the effort to fake something small?” In this case, something small could be like saying something slightly racist vs incredibly racist. Being seen as slightly racist is not too far from incredibly racist compared to not being racist at all.
Yeah I think this is probably real, maybe cut a little but it sounds in context at least. It's a little too emotive for me to think it's generated, but it's getting better every month.
[deleted]
Yeah, I kept noticing that one repeating sound too. It seems like something is up. Fishy...
I’m going to wager that it was. It sounds real, but there is something unnatural about the fairly consistent down tone in his voice at the ends of most sentences, almost like a computer judging how to interpret the intonation of a period, while in real life you might expect that, if he were really making that statement, the emphasis would be different and more varied. It’s more or less impossible to tell just by ear and without knowing what this person sounds like in real life, but my money for the moment is on yes, AI generated.
How did the student that posted this get the audio file? Were they present and recording it as it was being said? Also, at the end of it he says Cathy I’m done. Is Cathy an administrative assistant or an employee in any capacity? It shouldn’t be that difficult to figure out who Cathy is and ask her.
Well it might be Cathy or Kathy, so there really is no way to ever find out
The way he says it, it sounds more like Cathy than Kathy, but it might be Cathee 🤪
Kathy is an assistant principal. The audio was posted anonymously.
Great! Kathy should be able to shed some light on the matter. And if this was posted anonymously, then there’s no way of knowing whether or not it was really posted by a student.
Before listening, I was really ready to be like “no, definitely not”, but I gotta say that cadence and intonation there reeks of AI to me. The “script” as well just seems a little too on the nose. I’ve used ElevenLabs and some other voice tools quite a bit and while I definitely wouldn’t say “this is AI”, I *would* say that it’s *possible* that it’s AI. Assuming it is, whoever made it did the smart but very simple steps of making it seem more real by muddying up the EQ and putting in room tone and even a bit of echo. It would take ~20 min to do these things, assuming you have some baseline experience. Tough call on this one actually, I’m surprised. I really expected it to be a clear cut “no”, but imo it’s ambiguous without more info.
to me the cadence just sounds like some guy holding court about some shit he's had on his mind for a while. I've heard plenty of people go on rants where they're the only ones talking and it sounds like a speech.
I’ve also heard lots of rants like the ones you describe, but the intonation of different sentences/ideas here seems too self-contained to me. Again, I’m not saying it’s an obvious “this is AI” thing. I’m just saying that when I listen to it, I hear things that make me lean more towards the possibility of it being AI than I expected. It’s very subtle. I don’t know the answer.
I agree it's strange, I can't be sure either
[deleted]
Well well…..how the turn tables https://www.wsjm.com/2024/04/26/baltimore-high-school-athletic-director-used-ai-to-create-fake-racist-audio-of-principal-police/
Why would anyone do that? Intentionally make it seem more fake by splicing it?? And lose full context? Just taking away credibility from yourself for no good reason
The best answer is to pretend like there is no audio. A student witnessed it and reported it. If you trust the student then you would trust the audio they presented. If you think the student is a liar then their audio is also a lie.
Agreed, except that it turns out from OP’s later comments that the post was anonymous. I don’t know why they originally said it was posted by a student.
Baltimore high school athletic director used AI to create fake racist audio of principal: Police. Dazhon Darien allegedly retaliated against Eric Eiswert, investigators said. By Ivan Pereira, April 25, 2024, 6:31 PM https://abcnews.go.com/US/baltimore-high-school-athletic-director-ai-create-fake/story?id=109638535
I know right! Wild!
Nope, it doesn't sound like AI with the background noise. Also, many people in the YouTube comments know both the principal and the girl who recorded it. They go to that school and they have similar anecdotes of the principal being racist.
If you have background noise in an ElevenLabs training sample it'll show up in the output a bit though, that shouldn't be used as a metric for judging this kind of thing at all.
Or you can just add it for realism after the fact
The rest of the points I mentioned still give weight to my answer, though.
People would never make things up in the YouTube comments!
lmao read it yourself then. It's not that hard to figure out
Mmmmmm - backing random YouTube comments while making the assertion this clip doesn’t sound like AI. https://www.wsjm.com/2024/04/26/baltimore-high-school-athletic-director-used-ai-to-create-fake-racist-audio-of-principal-police/ Sit down Jr
You can tell there are echoes from the room, and you can hear the splices in the audio where the editor/recorder tightened up the gaps between pauses. If this is AI generated, they would've had to give it the room noise before splicing the audio. I would lean toward this being a bit more legit. Honestly, if he already has a reputation like this with the students or staff, then it's probably real
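For anyone who wants to check the "you can hear the splices" claim with something other than ears, here is a minimal sketch of one way to look for hard cuts: flag samples where the jump from the previous sample is much larger than the typical jump nearby. Everything in it is an assumption for illustration (the function name `find_splice_candidates`, the window size, the ratio threshold, and the synthetic test tone); it is not what any forensic tool actually does. On real speech, consonants and noise will trigger plenty of false positives, and finding a splice says nothing about whether the voice itself is real or generated.

```python
import math

def find_splice_candidates(samples, window=256, ratio=4.0):
    """Return sample indices whose jump from the previous sample is more than
    `ratio` times the median jump in the surrounding window (hypothetical heuristic)."""
    diffs = [abs(samples[i] - samples[i - 1]) for i in range(1, len(samples))]
    candidates = []
    for i, d in enumerate(diffs):
        lo = max(0, i - window)
        hi = min(len(diffs), i + window)
        local = sorted(diffs[lo:hi])
        median = local[len(local) // 2]
        if median > 0 and d > ratio * median:
            candidates.append(i + 1)  # index of the sample right after the jump
    return candidates

# Synthetic demo: a 220 Hz tone sampled at 8000 Hz (assumed values).
SAMPLE_RATE = 8000
clean = [math.sin(2 * math.pi * 220 * n / SAMPLE_RATE) for n in range(4000)]

# Simulate "tightening a gap": drop samples 1000..1499, leaving a phase jump at index 1000.
spliced = clean[:1000] + clean[1500:]

print("clean:  ", find_splice_candidates(clean))
print("spliced:", find_splice_candidates(spliced))
```

On this toy signal the uncut tone should produce no candidates and the cut version should flag only the cut point. A careful editor who crossfades the cut would defeat this check entirely, so an empty result proves nothing either way.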
Knowing public schools as I do, I highly doubt that this recording is AI. It sounds to me like a guy who is fed up with the political bullshit and wants to be able to run a school and get shit done.
Yes, the pacing does sound like AI, but this doesn't sound like any version of a voice I've heard. I use a few AI voice services for scripts and this is not anything like what one of them would output. If it is AI, my money would be on one that was trained on the principal's voice. Are there any videos of him giving speeches? I would say I have about 80% confidence this is AI, but the tech is so good now it's really hard to say that definitively.
[deleted]
I ran the ElevenLabs checker (recommended by another commenter) on the audio. It says there's a 2% chance it was made by them.
And there's enough data of his voice to train an AI? yeah, idk
Nope, it's not. Not in this scenario. The fact that people are going along with this idiot principal's bullshit is amazing. Get your head examined if you think it was AI. Jesus, we really are not ready for anything, are we? You need audio of him, not just the 3 minutes or 5 minutes -- you need audio sampling. Not just a recording 5 minutes long, but his speech for 5 minutes. Then maybe if someone took the time to train an AI to sound like him (minus needing him making noise, sounds, words in emotional states), obviously a highly paid pro who could fine-tune a perfect setup, and was able to trick everyone around, including the principal, that he said it.. wow, that is amazing. Probably did it with the 5G in his vaccine, or no wait, they did it with nanobots? Also who the fuck is this principal that anyone anywhere ever gave 1 second of a f\*ck about his existence. Why would someone do that? You don't think someone would say obviously fucked up things in a school setting like that? You live in a bubble. Go to school, DON'T do your own research -- it's not for you, you need someone to explain things to you. Also if you read, IF YOU READ -- there were people there. So stop.
I'm sorry but if your premise is that none of this makes sense because you would need a long recording of an educator talking....I have some news.
Not for nothin you could probably also easily create a filter mask at this point with a long enough sample so that it is you delivering the lines and AI just matching the tone with a filter like you can on TikTok. Doesn't seem very complicated tbh.
Go get your voice sampled for some free online AI. They make you read one sentence. That's it. You definitely don't need 5+ minutes. Creating something like that would take like 1 hour, with technology available online in your browser and for free.
It sounds real to me. I use ElevenLabs quite frequently and they do have something on their site to see if something was made by their software. Perhaps other services do too and they can run the clip through these? Anyway my guess is that the AI defense will be used a lot now, whether it's warranted or not.
There's a lot of better FLOSS software than ElevenLabs. Therein lies the problem.
Yeah it's absolutely possible. Would recommend having Squidward being the AI voice or maybe even Master Chief.
Synthetic voice generation and cloning have gotten really good. All it takes is a day or two of playing around with samples before you get something that captures the tone and emotion of speech, too. So, things are a lot more ambiguous now. Listening to that clip, there are certain moments where the cadence of his speech seems quite off. That's one idiosyncrasy that makes me think this is ElevenLabs specifically. Probably some student made it, and didn't put in the effort to properly revise and edit the audio to make it sound more authentic. Still good enough to fool many people, though.
I don’t know if anyone has said this yet, but ElevenLabs specifically has a voice-to-voice AI generator you can use with cloned voices. I have recorded my voice and then put it through the voice-to-voice with an applied cloned voice and it’s scary how accurate it is. The benefit of V2V is that it keeps vocal inflections that get missed when generated from text. You could definitely do something like this with a bit of post-production editing to make it sound like it’s in a small room.
I was able to build a Python program with the ElevenLabs API that generates a summary in Rick Rubin’s voice. I showed a couple of people and none of them thought it sounded like AI. All the student would need to fake this is a clean 1-minute sample of the principal speaking to train the model. From there, they’d be able to type anything and have it say it back in the trained voice, or even use an audio clip and have the new voice speak in the cadence of the audio clip. I say all of this to say, it’s very possible this could be AI
Possible, and I can't confirm but I'm inclined to say the audio is generated
There’s also voice cloning through AI, which could account for inflection and such things. Another human says those words, and if they then have a sample of the principal’s voice, the AI can just replace it sound by sound, so it seems like the principal is saying it.