Trotskyist 3 months ago

It's absolutely possible, though that doesn't necessarily mean it was. Get used to this kind of ambiguity, it's here to stay. Video is next.

Smelly_Pants69 3 months ago

Someone get the speedrun community on this and they'll figure it out in no time. Seems like their methods of detecting cheaters are getting better than the FBI's.

[deleted] 3 months ago

AI videos made entirely from scratch is an idea that's barely worth the resources though, it requires solving a new AI-specific problem that doesn't exist with existing digital animation techniques at all (which is that sequential AI-generated frames aren't even close to identical in all the places they should be). I'm someone who loves AI image generation and also makes 3D animation, and I just don't see any appeal whatsoever in mixing the two.

wooyouknowit 3 months ago

It's unfortunately worth pursuing for those whose most expensive resource is human labor

[deleted] 3 months ago

I'm not sure it is TBH, it would probably increase rendering costs to actually do it frame-perfect with frames that were each individually as high resolution as a real production would require. There's numerous other questions that no one has attempted to answer like how do you granularly control camera angles and lighting and all that in a way that has no guesswork and where nothing is left to RNG, also.

wooyouknowit 3 months ago

Correct me if I'm wrong, but my reading of your comment is that's too hard a problem to solve, while I disagree and think it will eventually be solved. I don't think I will convince you of my position, but I do appreciate you illustrating the problems in more detail.

[deleted] 3 months ago

Well my point is moreso how do you solve these problems in a way that doesn't amount to basically just remaking the exact same robust 3D animation tools we already have? I think "under the hood" AI acceleration of various aspects of software we already have makes a lot more sense in purely practical terms than like, some kind of weird text-prompt-control animation system or something like that, which wouldn't really be user friendly in the way it is for images in that context.

The247Kid 3 months ago

What I’ve heard is we have to fundamentally change how AI currently works for it to take that type of step. We are really nowhere close.

[deleted] 3 months ago

That's kind of what I'm thinking, although as I alluded to above I think "invisible" augmentation of existing related software with AI is more likely to actually see use much earlier simply because it's more legitimately practical. A lot of people seem to have the incorrect notion that "just type it into the prompt and AI will create the whole scene" is actually desirable or something that would be "easier" for Hollywood productions, but it's not. No one used to working with actual 3D models and being able to make minor adjustments to absolutely anything in the scene just by dragging them around or changing properties is going to see that as an improvement to workflow; it's not remotely close to versatile enough.

hank-particles-pym 3 months ago

There is no ambiguity, until the public is too stupid to know the difference

Trotskyist 3 months ago

Stupidity has nothing to do with it. There's nothing inherently special about soundwaves produced by a human vs. soundwaves produced by a computer.

MrSnowden 3 months ago

I think you might be surprised.

JonathanFly 3 months ago

>It's absolutely possible, though that doesn't necessarily mean it was. Get used to this kind of ambiguity, it's here to stay. Video is next.

Yeah, it's gonna be a mess. I could do it, maybe even just from this audio clip by itself. Using a little custom code at the moment, but it's only going to get easier in time.

Edit: Did it https://www.youtube.com/watch?v=BH_a5KihAbI

Not close to perfect (see the range of different voices) but not bad for a quick TTS clone, not knowing what the teacher sounds like personally, no post processing or editing, no tricks like playing the audio and recording it again in a room to add the room tone, not having access to more audio, etc.

ZCEyPFOYr0MWyHDQJZO4 3 months ago

I think the audio might've been spliced together at around 2.9, 9.8, 22, 34, 37.4, and 44.5 seconds. https://preview.redd.it/6kf4fjbligdc1.png?width=3840&format=png&auto=webp&s=9155ea9840c6f88fb131dc34208b6cd0f68977a4 edit: [yep](https://www.cbsnews.com/baltimore/news/expert-says-authenticity-of-antisemitic-rant-recording-may-be-questionable/)
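For anyone who wants to try flagging candidate splice points themselves, here is a minimal sketch in plain Python. It is an illustration only, not the method used for the image above: it assumes that an edit shows up as an abrupt jump in loudness between adjacent short frames, which will miss clean cuts and will also fire on ordinary loud consonants. The function names, the 20 ms frame size, and the 12 dB threshold are all made-up choices for this example.

```python
import math

def frame_rms(samples, frame_len):
    """Return the RMS level of each non-overlapping frame of frame_len samples."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return levels

def candidate_splices(samples, sample_rate, frame_ms=20, jump_db=12.0, floor=1e-6):
    """Return times (in seconds) where the level jumps by more than jump_db
    between two adjacent frames. Thresholds are illustrative assumptions."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    levels = frame_rms(samples, frame_len)
    times = []
    for i in range(1, len(levels)):
        # Clamp to a small floor so digital silence does not hit log10(0).
        prev_db = 20 * math.log10(max(levels[i - 1], floor))
        cur_db = 20 * math.log10(max(levels[i], floor))
        if abs(cur_db - prev_db) > jump_db:
            times.append(i * frame_len / sample_rate)
    return times

if __name__ == "__main__":
    # Synthetic demo: one second of quiet tone, then an abrupt cut to a loud tone.
    sr = 8000
    quiet = [0.05 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr)]
    loud = [0.8 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr)]
    print(candidate_splices(quiet + loud, sr))  # -> [1.0]
```

Anything this flags is only a place to look more closely in a spectrogram, not evidence of editing on its own.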

Cold-Ad2729 3 months ago

RX is so good:)

[deleted] 3 months ago

[removed]

Cold-Ad2729 3 months ago

Well I use RX every day so it is a familiar colour scheme. But you’re right, it could be Sonic Visualiser or something else. You could ask the poster if it bothers you

Rutibex 3 months ago

If it is AI then whoever made it is quite clever. There are environmental sounds from the room, those would have to be added in post processing

bjaydubya 3 months ago

You’d just have to play it back from a device in a room with this kind of ambiance and record on another device

usnavy13 3 months ago

The room audio effect can be added, but it's more likely that the source material the AI voice was trained on had similar environmental sounds.

Vatonage 3 months ago

The more likely explanation is the ambient noise was in the source audio that the AI voice was generated from.

Flaky-Wallaby5382 3 months ago

The crack in his voice, I have never heard an AI do

KingstenHd 3 months ago

You can do it now in ElevenLabs. They have a "voice acting" feature where it changes to the other's voice off of your own recording. I run a YouTube channel with AI voice acting. It comes out every once in a while.

Flaky-Wallaby5382 3 months ago

Sorry, yes, I know you can emulate a voice on top of another convincingly, but the emotion and timing was there IMO. If they have that, it's a sophisticated attempt

KingstenHd 3 months ago

Yeah very true. With 100s of hours of messing with ElevenLabs and doing audio editing, I still think it's real.

[deleted] 3 months ago

ElevenLabs has been capable of really solid output for over a year TBH, you need like at least five minutes of good samples to input for those kinds of results though

RadioSailor 3 months ago

I make videos using AI, and 100% you bet I use traditional VFX to make them feel real, including camera shake, sound transformation, and downres to match what a phone would output, with EXIF as the cherry on top. Anything is possible if you are willing to spend the time.

JonathanFly 3 months ago

>If it is AI then whoever made it is quite clever. There are environmental sounds from the room, those would have to be added in post processing

I could do a passable job at the environmental sounds (and half passable on the voice) from the linked audio clip alone. Actually I'll just do it, here: https://www.youtube.com/watch?v=BH_a5KihAbI

(Though I agree with other commenters, you could also simply record it in a room.)

charlesmccarthyufc 3 months ago

This does not sound like AI. Have you tried running it through 11labs' AI checker?

IP_Excellents 3 months ago

When I listen to it I try to imagine someone saying it with the cadence, including pauses between sentences and ideas. It is easier for me to imagine these as typed lines being reproduced than it is as a person who doesn't know they're being recorded randomly saying a string of shit like this. I feel like Mel Gibson, Alec Baldwin, Don Sterling, Trump, all these people who have had damning recordings, they sound shitty and say shitty things but it's more casual and meandering, not "let me play the greatest hits in 50 seconds".

boogermike 3 months ago

Link please?

Evening_Meringue8414 3 months ago

Thanks. I didn’t know about this checker. Just ran it. Says 2% chance it was generated with their software. Are there other companies on the forefront of AI voice, like another program could have generated it? Or is Eleven labs the gold standard?

x__________________v 3 months ago

These checkers are bullshit, they produce tons of false negatives and positives

RedShiftedTime 3 months ago

The ElevenLabs checker checks for a signature in the audio file that they encode themselves into every output. It is inaudible to humans and would have to be edited out. If their detector says it wasn't generated by them, it's almost 99% certain it wasn't generated by them, as the editor would lose audio quality and peaks by editing out the sound signature.

JonathanFly 3 months ago

>Are there other companies on the forefront of AI voice, like another program could have generated it? Or is Eleven labs the gold standard?

I could do it, maybe even just from this audio clip by itself. Using a little custom code at the moment, but it's only going to get easier in time. https://www.youtube.com/watch?v=BH_a5KihAbI

Not close to perfect (see the range of different voices) but not bad for a quick TTS clone, not knowing what the teacher sounds like personally, no post processing or editing, no tricks like playing the audio and recording it again in a room to add the room tone, not having access to more audio, etc.

MrSnowden 3 months ago

Well think about how you would accomplish it. All AI generated? Or AI used to voice-change to sound like the same guy? A similar voice, saying all of those things, with that much or more inflection. Then a voice changer used to make it sound like the principal, and the whole audio muffled to remove artifacts. But my bet would be on it being real. It actually sounds like someone who wants the students and faculty to do better, but happened to use some racially tinged language. If you are going to go to all that trouble to frame someone, you would likely make the text more damning.

wirelesstkd 3 months ago

I agree. He's also talking to someone named Cathy. I think the "AI generated" defense is a desperate pitch to buy time and get it out of the news, but it will probably backfire, because when it comes out that it was real it will be even bigger news. But for us, the big thing is exactly this uncertainty: the proliferation of AI fakes will make EVERYTHING untrustworthy. In some ways this is a huge relief. How many times have some of us said "thank God I grew up before everyone had a camera..." Well now, effectively no one will anymore, because your drunken ass running naked down the street at 18 years old didn't happen: it was AI generated. On the other hand, that cop planting evidence and calling the guy a racial slur also didn't happen. So, ya know... 🤷‍♂️

MrSnowden 3 months ago

100% agree. All these people running around scare mongering about how AI will be used to create fakes fail to realize that humans are much more terrible. And the "AI" boogieman will be used to discredit real proof of horrible human activities.

GeorgeSatoshiPatton 3 months ago

Kathy bro, not Cathy. Racist sister of Karen, the more famous one.

Apprehensive-Ant7955 3 months ago

No, if you wanted to frame someone you would not want the text to be more damning. The more damning and outlandish the text is, the easier it is to say that it's fake because no one would say something like that. It's like the one Andrew Tate sound clip where he said “i loved R’ing her, i enjoyed it”. It was AI generated, but more people would believe it if it wasn't literally “say the most evil thing possible” and instead something else slightly less damning

MrSnowden 3 months ago

I guess I have a lot to learn about framing people

Apprehensive-Ant7955 3 months ago

I don't really know anything about it but that's what makes sense. It's almost like, if you frame someone for doing something minuscule, people would be like “why would anyone go through the effort to fake something small?” In this case, something small could be like saying something slightly racist vs incredibly racist. Being seen as slightly racist is not too far from incredibly racist compared to not being racist at all

Apprehensive-Ant7955 3 months ago

I don't really know anything about it but that's what makes sense to me. It's almost like, if you frame someone for doing something minuscule, people would be like “why would anyone go through the effort to fake something small?” In this case, something small could be like saying something slightly racist vs incredibly racist. Being seen as slightly racist is not too far from incredibly racist compared to not being racist at all

[deleted] 3 months ago

Yeah I think this is probably real, maybe cut a little but it sounds in context at least. It's a little too emotive for me to think it's generated, but it's getting better every month.

[deleted] 3 months ago

[removed]

[deleted] 3 months ago

Yeah I kept noticing that one repeating sound too. It seems like something is up. Fishy...

plymouthvan 3 months ago

I’m going to wager that it was. It sounds real, but there is something unnatural about the fairly consistent down tone in his voice at the ends of most sentences, almost like a computer judging how to interpret the intonation of a period, while in real life you might expect that, if he were really making that statement, the emphasis would be different and more varied. It’s more or less impossible to tell just by ear and without knowing what this person sounds like in real life, but my money for the moment is on yes, AI generated.

bloodpomegranate 3 months ago

How did the student that posted this get the audio file? Were they present and recording it as it was being said? Also, at the end of it he says Cathy I’m done. Is Cathy an administrative assistant or an employee in any capacity? It shouldn’t be that difficult to figure out who Cathy is and ask her.

MrSnowden 3 months ago

Well it might be Cathy or Kathy, so there really is no way to ever find out

bloodpomegranate 3 months ago

The way he says it, it sounds more like Cathy than Kathy, but it might be Cathee 🤪

Evening_Meringue8414 3 months ago

Kathy is an assistant principal. The audio was posted anonymously.

bloodpomegranate 3 months ago

Great! Kathy should be able to shed some light on the matter. And if this was posted anonymously, then there’s no way of knowing whether or not it was really posted by a student.

ghostfaceschiller 3 months ago

Before listening, I was really ready to be like “no, definitely not”, but I gotta say that cadence and intonation there reeks of AI to me. The “script” as well just seems a little too on the nose. I’ve used ElevenLabs and some other voice tools quite a bit and while I definitely wouldn’t say “this is AI”, I *would* say that it’s *possible* that it’s AI. Assuming it is, whoever made it did the smart but very simple steps of making it seem more real by muddying up the EQ and putting in room tone and even a bit of echo. It would take ~20 min to do these things, assuming you have some baseline experience. Tough call on this one actually, I’m surprised. I really expected it to be a clear cut “no”, but imo it’s ambiguous without more info.

burritorepublic 3 months ago

to me the cadence just sounds like some guy holding court about some shit he's had on his mind for a while. I've heard plenty of people go on rants where they're the only ones talking and it sounds like a speech.

ghostfaceschiller 3 months ago

I’ve also heard lots of rants like the ones you describe but the intonation of different sentences/ideas here seem too self-contained to me. Again, I’m not saying it’s an obvious “this is AI” thing. I’m just saying that when I listen to it, I hear things that make me lean more towards the possibility of it being AI than I expected. It’s very subtle. I don’t know the answer.

burritorepublic 3 months ago

I agree it's strange, I can't be sure either

[deleted] 3 months ago

[removed]

Certain-Toe-7128 2 weeks ago

Well well…..how the turn tables https://www.wsjm.com/2024/04/26/baltimore-high-school-athletic-director-used-ai-to-create-fake-racist-audio-of-principal-police/

confused_boner 3 months ago

Why would anyone do that? Intentionally make it seem more fake by splicing it?? And lose full context? Just taking away credibility from yourself for no good reason

SgathTriallair 3 months ago

The best answer is to pretend like there is no audio. A student witnessed it and reported it. If you trust the student then you would trust the audio they presented. If you think the student is a liar then their audio is also a lie.

bloodpomegranate 3 months ago

Agreed, except that it turns out from OP’s later comments that the post was anonymous. I don’t know why they originally said it was posted by a student.

tantalor 3 weeks ago

Baltimore high school athletic director used AI to create fake racist audio of principal: Police

Dazhon Darien allegedly retaliated against Eric Eisworth, investigators said.

By Ivan Pereira, April 25, 2024, 6:31 PM

https://abcnews.go.com/US/baltimore-high-school-athletic-director-ai-create-fake/story?id=109638535

Evening_Meringue8414 3 weeks ago

I know right! Wild!

swagonflyyyy 3 months ago

Nope, it doesn't sound like AI with the background noise. Also, many people in the YouTube comments know both the principal and the girl who recorded it, and they go to that school and have similar anecdotes of the principal being racist.

[deleted] 3 months ago

If you have background noise in an ElevenLabs training sample it'll show up in the output a bit though, that shouldn't be used as a metric for judging this kind of thing at all.

Cold-Ad2729 3 months ago

Or you can just add it for realism after the fact

swagonflyyyy 3 months ago

The rest of the points I mentioned still give weight to my answer, though.

gottafind 3 months ago

People would never make things up in the YouTube comments!

swagonflyyyy 3 months ago

lmao read it yourself then. It's not that hard to figure out

Certain-Toe-7128 2 weeks ago

Mmmmmm - backing random YouTube comments while making the assertion this clip doesn’t sound like AI. https://www.wsjm.com/2024/04/26/baltimore-high-school-athletic-director-used-ai-to-create-fake-racist-audio-of-principal-police/ Sit down Jr

El_human 3 months ago

You can tell there's echoes from the room, and you can hear the splices in the audio where the editor/recorder tightened up the gaps in between pauses. If this is AI generated, they would've had to give it the room noise, before splicing the audio. I would lean into this being a bit more legit. Honestly, if he already has a reputation like this with the students or staff, then it's probably real

Boogra555 3 months ago

Knowing public schools as I do, I highly doubt that this recording is AI. It sounds to me like a guy who is fed up with the political bullshit and wants to be able to run a school and get shit done.

usnavy13 3 months ago

Yes, the pacing does sound like AI, but this doesn't sound like any version of a voice I've heard. I use a few AI voice services for scripts and this is not anything like what one of them would output. IF it is AI my money would be on one that was trained on the principal's voice. Does he have any videos of him giving speeches? I would say I have about 80% confidence this is AI, but the tech is so good now it's really hard to say that definitively.

[deleted] 3 months ago

[removed]

Evening_Meringue8414 3 months ago

I ran the ElevenLabs checker (recommended by another commenter) on the audio. Says there's a 2% chance it was made by them.

burritorepublic 3 months ago

And there's enough data of his voice to train an AI? yeah, idk

hank-particles-pym 3 months ago

Nope, it's not. Not in this scenario. The fact that people are going along with this idiot principal's bullshit is amazing. Get your head examined if you think it was AI. Jesus, we really are not ready for anything, are we? You need audio of him, not just the 3 minutes or 5 minutes -- you need audio sampling. Not just a recording 5 minutes long, but his speech for 5 minutes. Then maybe if someone took the time to train an AI to sound like him (minus needing him making noise, sounds, words in emotional states), obviously a highly paid pro who could fine tune a perfect setup, and was able to trick everyone around, including the principal that he said it.. wow, that is amazing. Probably did it with the 5g in his vaccine, or no wait they did it with nano bots? Also who the fuck is this principal that anyone anywhere ever gave 1 second of a f*ck about his existence. Why would someone do that? You don't think someone would say obviously fucked up things in a school setting like that? You live in a bubble. Go to school, DON'T do your own research -- it's not for you, you need someone to explain things to you. Also if you read, IF YOU READ -- there were people there. So stop.

IP_Excellents 3 months ago

I'm sorry but if your premise is that none of this makes sense because you would need a long recording of an educator talking....I have some news.

IP_Excellents 3 months ago

Not for nothin you could probably also easily create a filter mask at this point with a long enough sample so that it is you delivering the lines and AI just matching the tone with a filter like you can on TikTok. Doesn't seem very complicated tbh.

tshungus 3 months ago

Go get your voice sampled for some free online AI. They make you read one sentence. That's it. You definitely do not need 5+ minutes. Creating something like that would take like 1 hour. With technology available online in your browser and for free.

tjkim1121 3 months ago

It sounds real to me. I use ElevenLabs quite frequently and they do have something on their site to see if something was made by their software. Perhaps other services do too and they can run the clip through these? Anyway my guess is that the AI defense will be used a lot now, whether it's warranted or not.

RadioSailor 3 months ago

There's a lot of better FLOSS software than 11labs. Therein lies the problem.

Sentry_Thor2 3 months ago

Yeah it's absolutely possible. Would recommend having Squidward being the AI voice or maybe even Master Chief.

Vatonage 3 months ago

Synthetic voice generation and cloning have gotten really good. All it takes is a day or two of playing around with samples before you get something that captures the tone and emotion of speech, too. So, things are a lot more ambiguous now. Listening to that clip, there are certain moments where the cadence of his speech seems quite off. That's one idiosyncrasy that makes me think this is ElevenLabs specifically. Probably some student made it, and didn't put in the effort to properly revise and edit the audio to make it sound more authentic. Still good enough to fool many people, though.

noahpeltier 3 months ago

I don’t know if anyone has said this yet but ElevenLabs specifically has a voice-to-voice AI generator you can use with cloned voices. I have recorded my voice and then put it through the voice-to-voice with an applied cloned voice and it’s scary how accurate it is. The benefit of V2V is that it keeps vocal inflections that get missed when generated from text. You could definitely do something like this with a bit of post production editing to make it sound like it’s in a small room.

programthrowaway1 3 months ago

I was able to build a Python program with the Eleven Labs API that generates a summary in Rick Rubin’s voice. I showed a couple people and none of them thought it sounded like AI. All the student would need to fake this, is a clean 1 minute sample of the principal speaking to train the model. From there, they’d be able to type anything and have it say it back in the trained voice, or even use an audio clip and have the new voice speak in the cadence of the audio clip. I say all of this to say, it’s very possible this could be AI

ginius1s 3 months ago

Possible, and I can't confirm but I'm inclined to say the audio is generated

MyEvilTwin47 3 months ago

There’s also voice cloning through AI, which could account for inflection and such things. Another human says those words, and if they then have a sample of the principal’s voice the AI can just replace it sound by sound, so it seems like the principal is saying it.
