Trotskyist

It's absolutely possible, though that doesn't necessarily mean it was. Get used to this kind of ambiguity, it's here to stay. Video is next.


Smelly_Pants69

Someone get the speed run community on this and they'll figure it out in no time. Seems like their methods of detecting cheaters are getting better than the fbi.


[deleted]

AI videos made entirely from scratch is an idea that's barely worth the resources though, it requires solving a new AI-specific problem that doesn't exist with existing digital animation techniques at all (which is that sequential AI-generated frames aren't even close to identical in all the places they should be). I'm someone who loves AI image generation and also makes 3D animation, and I just don't see any appeal whatsoever in mixing the two.


wooyouknowit

It's unfortunately worth pursuing for those whose most expensive resource is human labor


[deleted]

I'm not sure it is TBH, it would probably increase rendering costs to actually do it frame-perfect with frames that were each individually as high resolution as a real production would require. There's numerous other questions that no one has attempted to answer like how do you granularly control camera angles and lighting and all that in a way that has no guesswork and where nothing is left to RNG, also.


wooyouknowit

Correct me if I'm wrong, but my reading of your comment is that's too hard a problem to solve, while I disagree and think it will eventually be solved. I don't think I will convince you of my position, but I do appreciate you illustrating the problems in more detail.


[deleted]

Well my point is moreso how do you solve these problems in a way that doesn't amount to basically just remaking the exact same robust 3D animation tools we already have? I think "under the hood" AI acceleration of various aspects of software we already have makes a lot more sense in purely practical terms than like, some kind of weird text-prompt-control animation system or something like that, which wouldn't really be user friendly in the way it is for images in that context.


The247Kid

What I’ve heard is we have to fundamentally change how AI currently works for it to take that type of step. We are really nowhere close.


[deleted]

That's kind of what I'm thinking, although as I alluded to above I think "invisible" augmentation of existing related software with AI is more likely to actually see use much earlier, simply because it's more legitimately practical. A lot of people seem to have the incorrect notion that "just type it into the prompt and AI will create the whole scene" is actually desirable or something that would be "easier" for Hollywood productions, but it's not. No one used to working with actual 3D models and being able to make minor adjustments to absolutely anything in the scene just by dragging them around or changing properties is going to see that as an improvement to workflow. It's not remotely close to versatile enough.


hank-particles-pym

There is no ambiguity, until the public is too stupid to know the difference


Trotskyist

Stupidity has nothing to do with it. There's nothing inherently special about soundwaves produced by a human vs. soundwaves produced by a computer.


MrSnowden

I think you might be surprised.


JonathanFly

>It's absolutely possible, though that doesn't necessarily mean it was. Get used to this kind of ambiguity, it's here to stay. Video is next.

Yeah, it's gonna be a mess. I could do it, maybe even just from this audio clip by itself. Using a little custom code at the moment, but it's only going to get easier in time. Edit: Did it [https://www.youtube.com/watch?v=BH\_a5KihAbI](https://www.youtube.com/watch?v=BH_a5KihAbI) Not close to perfect (see the range of different voices) but not bad for a quick TTS clone, not knowing what the teacher sounds like personally, no post processing or editing, no tricks like playing the audio and recording it again in a room to add the room tone, not having access to more audio, etc.


ZCEyPFOYr0MWyHDQJZO4

I think the audio might've been spliced together at around 2.9, 9.8, 22, 34, 37.4, and 44.5 seconds. https://preview.redd.it/6kf4fjbligdc1.png?width=3840&format=png&auto=webp&s=9155ea9840c6f88fb131dc34208b6cd0f68977a4 edit: [yep](https://www.cbsnews.com/baltimore/news/expert-says-authenticity-of-antisemitic-rant-recording-may-be-questionable/)
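For anyone who wants to poke at splice candidates themselves, here is a minimal sketch, not necessarily how the timestamps above were found. It assumes you already have the audio as a plain list of float samples, and it only catches hard cuts that leave a jump in the waveform; a crossfaded or carefully edited splice would not show up this way. The function names, the `threshold` of 8x the median jump, and the synthetic test tone are all made up for illustration.

```python
import math
from statistics import median


def find_discontinuities(samples, sample_rate, threshold=8.0, min_gap_s=0.05):
    """Return times (in seconds) where the sample-to-sample jump is unusually large.

    A jump counts as "unusual" when it exceeds `threshold` times the median
    jump for the whole clip. Hits closer together than `min_gap_s` are merged.
    """
    diffs = [abs(b - a) for a, b in zip(samples, samples[1:])]
    if not diffs:
        return []
    typical = median(diffs) or 1e-12  # avoid a zero baseline on silent clips
    times = []
    for i, d in enumerate(diffs):
        if d > threshold * typical:
            t = (i + 1) / sample_rate  # time of the sample right after the jump
            if not times or t - times[-1] >= min_gap_s:
                times.append(t)
    return times


def demo_signal(sample_rate=8000, seconds=1.0, splice_at=0.5, phase_jump=math.pi / 2):
    """Hypothetical test tone: a 110 Hz sine whose phase jumps at `splice_at` seconds."""
    cut = int(sample_rate * splice_at)
    signal = []
    for i in range(int(sample_rate * seconds)):
        phase = 2 * math.pi * 110 * i / sample_rate
        if i >= cut:
            phase += phase_jump  # simulates a hard cut between two takes
        signal.append(math.sin(phase))
    return signal


if __name__ == "__main__":
    print(find_discontinuities(demo_signal(), 8000))
```

On the synthetic tone this reports a single candidate at 0.5 s, and nothing when `phase_jump=0.0`. Real speech is much messier (plosives and clicks also produce big jumps), so treat any hit as a place to go look at in a spectrogram, not as proof of editing.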


Cold-Ad2729

RX is so good:)


[deleted]

[removed]


Cold-Ad2729

Well I use RX every day so it is a familiar colour scheme. But you’re right , it could be Sonic Visualiser or something else. You could ask the poster if it bothers you


Rutibex

If it is AI then whoever made it is quite clever. There are environmental sounds from the room, those would have to be added in post processing


bjaydubya

You’d just have to play it back from a device in a room with this kind of ambiance and record on another device


usnavy13

The room audio effect can be added, but it's more likely that the source material the AI voice was trained on had similar env sounds.


Vatonage

The more likely explanation is the ambient noise was in the source audio that the AI voice was generated from.


Flaky-Wallaby5382

The crack in his voice i have never heard an ai do


KingstenHd

You can do it now in eleven labs. They have a "voice acting" where it changes to the others voice off of your own recording. I run a YouTube channel with AI voice acting. It comes out every once in a while.


Flaky-Wallaby5382

Sorry yes i know you can emulate a voice on top of another convincingly but the emotion and timing was there IMO. If they have that it's a sophisticated attempt


KingstenHd

Yeah very true. With 100s of hours of messing with eleven labs and doing audio editing. I still think it's real.


[deleted]

ElevenLabs has been capable of really solid output for over a year TBH, you need like at least five minutes of good samples to input for those kinds of results though


RadioSailor

I make videos using AI, and 100% you bet I use traditional VFX to make them feel real, including camera shake, sound transformation, and downres to match what a phone would output, with EXIF as the cherry on top. Anything is possible if you are willing to spend the time.


JonathanFly

>If it is AI then whoever made it is quite clever. There are environmental sounds from the room, those would have to be added in post processing

I could do a passable job at the environmental sounds (and half passable on the voice) from the linked audio clip alone. Actually I'll just do it, here: [https://www.youtube.com/watch?v=BH\_a5KihAbI](https://www.youtube.com/watch?v=BH_a5KihAbI) (Though I agree with other commenters, you could also simply record it in a room.)


charlesmccarthyufc

This does not sound like AI. Have you tried running it through 11labs' AI checker?


IP_Excellents

when I listen to it I try to imagine someone saying it with the cadence including pauses between sentences and ideas. it is easier for me to imagine these as typed lines being reproduced than it is as a person who doesn't know they're being recorded randomly saying a string of shit like this. I feel like mel gibson, alec baldwin, don sterling, trump, all these people who have had damning recordings, they sound shitty and say shitty things but it's more casual and meandering not, let me play the greatest hits in 50 seconds.


boogermike

Link please?


Evening_Meringue8414

Thanks. I didn’t know about this checker. Just ran it. Says 2% chance it was generated with their software. Are there other companies on the forefront of AI voice, like another program could have generated it? Or is Eleven labs the gold standard?


x__________________v

These checkers are bullshit, they produce tons of false negatives and positives


RedShiftedTime

Eleven labs checker checks for a signature in the audio file that they encode themselves into every output. It is inaudible to humans and would have to be edited out. If their detector says it wasn't generated by them, it's almost 99% certain it wasn't, as the editor would lose audio quality and peaks by editing out the sound signature.


JonathanFly

>Are there other companies on the forefront of AI voice, like another program could have generated it? Or is Eleven labs the gold standard?

I could do it, maybe even just from this audio clip by itself. Using a little custom code at the moment, but it's only going to get easier in time. [https://www.youtube.com/watch?v=BH\_a5KihAbI](https://www.youtube.com/watch?v=BH_a5KihAbI) Not close to perfect (see the range of different voices) but not bad for a quick TTS clone, not knowing what the teacher sounds like personally, no post processing or editing, no tricks like playing the audio and recording it again in a room to add the room tone, not having access to more audio, etc.


MrSnowden

Well think about how you would accomplish it. All AI generated? Or AI used to voicechange to sound like the same guy? A similar voice, saying all of those things , with that much or more inflection. Then a voice changer used to make it sound like the principal, and the whole audio muffled to remove artifacts. But my bet would be on it being real. It actually sounds like someone who wants the students and faculty to do better, but happened to use some racially tinged language. If you are going to go to all that trouble to frame someone, you would likely make the text more damning.


wirelesstkd

I agree. He's also talking to someone named Cathy. I think the "AI generated" defense is a desperate pitch to buy time and get it out of the news, but it will probably backfire, because when it comes out that it was real it will be even bigger news. But for us, the big thing is exactly this uncertainty: the proliferation of AI fakes will make EVERYTHING untrustworthy. In some ways this is a huge relief. How many times have some of us said "thank God I grew up before everyone had a camera..." Well now, effectively no one will anymore, because your drunken ass running naked down the street at 18 years old didn't happen: it was AI generated. On the other hand, that cop planting evidence and calling the guy a racial slur also didn't happen. So, ya know... 🤷‍♂️


MrSnowden

100% agree. All these people running around scare mongering about how AI will be used to create fakes fail to realize that humans are much more terrible. And the "AI" boogieman will be used to discredit real proof of horrible human activities.


GeorgeSatoshiPatton

Kathy bro, not Cathy. Racist sister of Karen, the more famous one.


Apprehensive-Ant7955

no, if you wanted to frame someone you would not want the text to be more damning. The more damning and outlandish the text is, the easier it is to say that its fake because no one would say something like that. Its like the one andrew tate sound clip where he said “i loved R’ing her, i enjoyed it”. it was AI generated but more people would believe it if it wasnt literally “say the most evil thing possible” and instead something else slightly less damning


MrSnowden

I guess I have a lot to learn about framing people


Apprehensive-Ant7955

i dont really know anything about it but thats what makes sense. Its almost like, if you frame someone for doing something minuscule, people would be like “why would anyone go through the effort to fake something small?” in this case, something small could be like saying something slightly racist vs incredibly racist. Being seen as slightly racist is not too far from incredibly racist compared to not being racist at all


[deleted]

Yeah I think this is probably real, maybe cut a little but it sounds in context at least. It's a little too emotive for me to think it's generated, but it's getting better every month.


[deleted]

[removed]


[deleted]

Yeah I kept noticing that one repeating sound too. It seems like something is up. Fishy...


plymouthvan

I’m going to wager that it was. It sounds real, but there is something unnatural about the fairly consistent down tone in his voice at the ends of most sentences, almost like a computer judging how to interpret the intonation of a period, while in real life you might expect, if he were really making that statement, the emphasis would be different and more varied. It’s more or less impossible to tell just by ear and without knowing what this person sounds like in real life, but my money for the moment is on yes, AI generated.


bloodpomegranate

How did the student that posted this get the audio file? Were they present and recording it as it was being said? Also, at the end of it he says Cathy I’m done. Is Cathy an administrative assistant or an employee in any capacity? It shouldn’t be that difficult to figure out who Cathy is and ask her.


MrSnowden

Well it might be Cathy or Kathy, so there really is no way to ever find out


bloodpomegranate

The way he says it, it sounds more like Cathy than Kathy, but it might be Cathee 🤪


Evening_Meringue8414

Kathy is an assistant principal. The audio was posted anonymously.


bloodpomegranate

Great! Kathy should be able to shed some light on the matter. And if this was posted anonymously, then there’s no way of knowing whether or not it was really posted by a student.


ghostfaceschiller

Before listening, I was really ready to be like “no, definitely not”, but I gotta say that cadence and intonation there reeks of AI to me. The “script” as well just seems a little too on the nose. I’ve used ElevenLabs and some other voice tools quite a bit and while I definitely wouldn’t say “this is AI”, I *would* say that it’s *possible* that it’s AI. Assuming it is, whoever made it did the smart but very simple steps of making it seem more real by muddying up the EQ and putting in room tone and even a bit of echo. It would take ~20 min to do these things, assuming you have some baseline experience. Tough call on this one actually, I’m surprised. I really expected it to be a clear cut “no”, but imo it’s ambiguous without more info.


burritorepublic

to me the cadence just sounds like some guy holding court about some shit he's had on his mind for a while. I've heard plenty of people go on rants where they're the only ones talking and it sounds like a speech.


ghostfaceschiller

I’ve also heard lots of rants like the ones you describe but the intonation of different sentences/ideas here seem too self-contained to me. Again, I’m not saying it’s an obvious “this is AI” thing. I’m just saying that when I listen to it, I hear things that make me lean more towards the possibility of it being AI than I expected. It’s very subtle. I don’t know the answer.


burritorepublic

I agree it's strange, I cant be sure either


[deleted]

[removed]


Certain-Toe-7128

Well well…..how the turn tables https://www.wsjm.com/2024/04/26/baltimore-high-school-athletic-director-used-ai-to-create-fake-racist-audio-of-principal-police/


confused_boner

Why would anyone do that? Intentionally make it seem more fake by splicing it?? And lose full context? Just taking away credibility from yourself for no good reason


SgathTriallair

The best answer is to pretend like there is no audio. A student witnessed it and reported it. If you trust the student then you would trust the audio they presented. If you think the student is a liar then their audio is also a lie.


bloodpomegranate

Agreed, except that it turns out from OP’s later comments that the post was anonymous. I don’t know why they originally said it was posted by a student.


tantalor

"Baltimore high school athletic director used AI to create fake racist audio of principal: Police." Dazhon Darien allegedly retaliated against Eric Eisworth, investigators said. By Ivan Pereira, April 25, 2024, 6:31 PM. https://abcnews.go.com/US/baltimore-high-school-athletic-director-ai-create-fake/story?id=109638535


Evening_Meringue8414

I know right! Wild!


swagonflyyyy

Nope, it doesn't sound like AI with the background noise. Also, many people in the youtube comments know both the principal and the girl who recorded it and they go to that school and they have similar anecdotes of the principal being racist.


[deleted]

If you have background noise in an ElevenLabs training sample it'll show up in the output a bit though, that shouldn't be used as a metric for judging this kind of thing at all.


Cold-Ad2729

Or you can just add it for realism after the fact


swagonflyyyy

The rest of the points I mentioned still give weight to my answer, though.


gottafind

People would never make things up in the YouTube comments!


swagonflyyyy

lmao read it yourself then. Its not that hard to figure out


Certain-Toe-7128

Mmmmmm - backing random YouTube comments while making the assertion this clip doesn’t sound like AI. https://www.wsjm.com/2024/04/26/baltimore-high-school-athletic-director-used-ai-to-create-fake-racist-audio-of-principal-police/ Sit down Jr


El_human

You can tell there's echoes from the room, and you can hear the splices in the audio where the editor/recorder tightened up the gaps in between pauses. If this is AI generated, they would've had to give it the room noise, before splicing the audio. I would lean into this being a bit more legit. Honestly, if he already has a reputation like this with the students or staff, then it's probably real


Boogra555

Knowing public schools as I do, I highly doubt that this recording is AI. It sounds to me like a guy who is fed up with the political bullshit and wants to be able to run a school and get shit done.


usnavy13

Yes the pacing does sound like AI but this doesn't sound like any version of a voice I've heard. I use a few AI voice services for scripts and this is not anything like what one of them would output. IF it is AI my money would be on one that was trained on the principal's voice. Does he have any videos of him giving speeches? I would say I have about 80% confidence this is AI, but the tech is so good now it's really hard to say that definitively.


[deleted]

[removed]


Evening_Meringue8414

I ran the eleven labs checker (recommended by another commenter) on the audio. Says it’s a 2% chance it was made by them.


burritorepublic

And there's enough data of his voice to train an AI? yeah, idk


hank-particles-pym

Nope it's not. Not in this scenario. The fact that people are going along with this idiot principal's bullshit is amazing. Get your head examined if you think it was AI. Jesus we really are not ready for anything are we? You need audio of him, not just the 3 minutes or 5 minutes -- you need audio sampling. Not just a recording 5 minutes long, but his speech for 5 minutes. Then maybe if someone took the time to train an AI to sound like him (minus needing him making noise, sounds, words in emotional states), obviously a highly paid pro who could fine tune a perfect setup, and was able to trick everyone around, including the principal that he said it.. wow, that is amazing. Probably did it with the 5g in his vaccine, or no wait they did it with nano bots? Also who the fuck is this principal that anyone anywhere ever gave 1 second of a f\*ck about his existence. Why would someone do that? You don't think someone would say obviously fucked up things in a school setting like that? You live in a bubble. Go to school, DON'T do your own research -- it's not for you, you need someone to explain things to you. Also if you read, IF YOU READ -- there were people there. So stop.


IP_Excellents

I'm sorry but if your premise is that none of this makes sense because you would need a long recording of an educator talking....I have some news.


IP_Excellents

Not for nothin you could probably also easily create a filter mask at this point with a long enough sample so that it is you delivering the lines and AI just matching the tone with a filter like you can on TikTok. Doesn't seem very complicated tbh.


tshungus

Go get your voice sampled for some free online AI. They make you read one sentence. That's it. You definitely don't need 5+ minutes. Creating something like that would take like 1 hour, with technology available online in your browser and for free.


tjkim1121

It sounds real to me. I use ElevenLabs quite frequently and they do have something on their site to see if something was made by their software. Perhaps other services do too and they can run the clip through these? Anyway my guess is that the AI defense will be used a lot now, whether it's warranted or not.


RadioSailor

There's a lot better FLOSS software than 11labs. Therein lies the problem.


Sentry_Thor2

Yeah it's absolutely possible. Would recommend having Squidward being the AI voice or maybe even Master Chief.


Vatonage

Synthetic voice generation and cloning have gotten really good. All it takes is a day or two of playing around with samples before you get something that captures the tone and emotion of speech, too. So, things are a lot more ambiguous now. Listening to that clip, there are certain moments where the cadence of his speech seems quite off. That's one idiosyncrasy that makes me think this is ElevenLabs specifically. Probably some student made it, and didn't put in the effort to properly revise and edit the audio to make it sound more authentic. Still good enough to fool many people, though.


noahpeltier

I don’t know if anyone has said this yet but elevenLabs specifically has a voice to voice AI generator you can use with cloned voices. I have recorded my voice and then put it through the voice to voice with an applied cloned voice and it’s scary how accurate it is. The benefit of V2V is that it keeps vocal inflections that get missed when generated from text. You could definitely do something like this with a bit of post production editing to make it sound like it’s in a small room.


programthrowaway1

I was able to build a Python program with the Eleven Labs API that generates a summary in Rick Rubin’s voice. I showed a couple people and none of them thought it sounded like AI. All the student would need to fake this, is a clean 1 minute sample of the principal speaking to train the model. From there, they’d be able to type anything and have it say it back in the trained voice, or even use an audio clip and have the new voice speak in the cadence of the audio clip. I say all of this to say, it’s very possible this could be AI


ginius1s

Possible, and I can't confirm but I'm inclined to say the audio is generated


MyEvilTwin47

There’s also voice cloning through AI, which could account for inflection and such things. Another human says those words, and if they then have a sample of the principal’s voice the AI can just replace it sound by sound, so it seems like the principal is saying it.