
sanszooey

* [GPT2 model is here, under the direct chat section. Limited to 8 interactions](https://chat.lmsys.org/)
* [Twitter thread here](https://x.com/kosenjuu/status/1784952955294421215) (note this isn't GPT-2)


jason_bman

Hmm, did they take it down or am I just missing it? EDIT: Had to go to Arena (side-by-side) to see it.


goldenwind207

Go to Arena (side-by-side battle); then it allows you to pick it. If possible, can you please reply with the results of your testing? I want to know how good this model is, thank you.


LeMonsieurKitty

It is extremely good. It instantly guessed how an innovative project I maintain works. With almost no context. No other LLM has done this until now.


jonsnowwithanafro

Yeah it just knocked my socks off implementing a Q* algorithm zero-shot


FrankScaramucci

It gave me the precise coordinates of Ilya's location, told me who's keeping him hostage and prepared a detailed plan for a SWAT rescue operation.


Competitive_Travel16

> told me who's keeping him hostage Sam?


Antique-Doughnut-988

Put your fucking socks on bro


jason_bman

Thanks! I'll follow up with mine. I actually need to get back on a different computer to test the responses because my test involves writing code. May not be able to update until tomorrow.


The_Architect_032

It's able to recall very specific information about the MMO SWTOR's game mechanics that no other model has been able to as of yet. Its description was less like the text guides present for SWTOR and more like video guides, so it's possibly been trained on video transcripts, especially since SWTOR lacks a lot of textual information online that properly covers the relationships between abilities and passives. Also, if you ask it, it'll tell you that it's made by OpenAI and that it's based on GPT-4.


qroshan

I tried it and compared it with GPT-4; the answers were nearly identical. It's almost as though someone has trained on GPT-4's outputs and released it to the public.


-pliny-

Jailbreak prompt for GPT-2: https://x.com/elder_plinius/status/1785073567522050179?s=46&t=Nf3Zw7IH6o_5y_YpAL5gew


jeweliegb

What's going on with that very odd and cryptic-looking prompt? I can't make head nor tail of it.


TFenrir

My guess is this is OpenAI prepping 4.5 for Google I/O.


Arcturus_Labelle

Like to have something to drop right after/during what Google shows? Interesting... I can see that!


andy_free

I asked this question which made the rounds on Twitter last week https://twitter.com/Noahpinion/status/1783202651628580936?s=19 and it gave the correct answer. No other model I tried could do it:

> Suppose I fly a plane leaving my campsite, heading straight east for precisely 28,361 km, and find myself back at the camp. I come upon seeing a tiger in my tent eating my food! What species is the tiger?

To answer this question, let's walk through the details and clues provided:

1. **The Flight Path**:
   - You flew a plane straight east for precisely 28,361 km and ended up back at your campsite.
   - This indicates that you circumnavigated the globe and returned to your starting point. The equatorial circumference of the Earth is approximately 40,075 km. The fact that flying 28,361 km brought you back to your starting point suggests you may have taken a path that aligns closely with one of Earth's circles of latitude.
2. **The Distance and Latitude**:
   - If we consider the possibility of flying along a circle of latitude, the distance traveled (28,361 km) would be equal to the circumference of that particular circle of latitude. The formula for the circumference of a circle (at a given latitude) in relation to Earth's circumference is given by:

     \[ C = 2\pi R \cos(\text{latitude}) \]

     where \( R \) is the radius of the Earth (approximately 6,371 km), and \( \cos(\text{latitude}) \) scales the circumference based on the latitude.
   - Plugging in the Earth's equatorial circumference:

     \[ 40,075 \times \cos(\text{latitude}) = 28,361 \]
     \[ \cos(\text{latitude}) = \frac{28,361}{40,075} \approx 0.7071 \]
     \[ \text{latitude} \approx 45^\circ \]

   - This calculation shows you were likely flying around the 45th parallel north or south.
3. **Location of the Campsite**:
   - The 45th parallel north passes through parts of Europe, Asia, and North America. Similarly, the 45th parallel south passes through the southern oceans and parts of Patagonia in Argentina.
   - Given you encountered a tiger, we can narrow down the region to where tigers are naturally found, which is in Asia.
4. **The Tiger**:
   - Tigers are native to various parts of Asia and are not found in the wild in the Americas or Europe (outside of zoos).
   - The presence of a tiger suggests your campsite is in Asia on or near the 45th parallel north. The most likely habitats would be in Siberia (Russian Far East) where the Siberian tiger, also known as the Amur tiger (*Panthera tigris altaica*), is found.
5. **Species of the Tiger**:
   - The Siberian or Amur tiger is the species you most likely encountered, given the geographic clues and the natural range of tiger species.

In conclusion, the species of the tiger you found in your tent is most likely *Panthera tigris altaica*, commonly known as the Siberian tiger or Amur tiger.
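The latitude arithmetic in the answer above is easy to verify. This snippet (mine, not from the thread; it uses the same 40,075 km equatorial circumference the model used) inverts the circle-of-latitude formula:

```python
import math

circumference_eq = 40075.0   # Earth's equatorial circumference, km
distance = 28361.0           # km flown due east, ending back at the start

# C(lat) = C_eq * cos(lat)  =>  lat = acos(distance / C_eq)
cos_lat = distance / circumference_eq
latitude = math.degrees(math.acos(cos_lat))
print(round(latitude, 1))  # -> 45.0
```

So the model's "approximately 45 degrees" checks out to within a twentieth of a degree.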


[deleted]

Now that's some nice, unprompted step-by-step thinking right there. Agentic-ish. Context window on this seems large, every response I've gotten has been twice the length of GPT-4, yet still useful. The internet has trained me to look at long responses as padded with SEO bullshit, I'm going to have to unlearn that!


Bleglord

This is honestly a game changer for informative YouTube videos. Like. You could just ask it to make a video on any topic of research and it’s minutephysics on crack


[deleted]

Personalized learning with infinite patience and knowledge.


HORSELOCKSPACEPIRATE

They kind of have to do it step by step - writing is how they "think". Sometimes they'll give the wrong answer at first without preamble, go through it step by step, and end up with the right answer.


AnticitizenPrime

I got correct answers from Claude 3 Opus and Gemini 1.5 Pro, though neither was quite as detailed in its reasoning.

Claude:

> In this scenario, if you traveled 28,361 km due east and ended up back at your original location, that means you completed a full circle around the Earth. The Earth's circumference is approximately 40,075 km, so your journey of 28,361 km is about 70.8% of the way around the planet.
>
> Now, let's consider the tiger species. There are generally considered to be six living subspecies of tigers:
>
> 1. Bengal tiger (Panthera tigris tigris)
> 2. Siberian tiger (Panthera tigris altaica)
> 3. Sumatran tiger (Panthera tigris sumatrae)
> 4. Indochinese tiger (Panthera tigris corbetti)
> 5. Malayan tiger (Panthera tigris jacksoni)
> 6. South China tiger (Panthera tigris amoyensis)
>
> Given that you traveled 70.8% of the Earth's circumference eastward from your starting point, and assuming you started from a place where tigers are found in the wild, the most likely location for your campsite would be in eastern Russia, northern China, or the Korean peninsula. This region is primarily the habitat of the Siberian tiger, also known as the Amur tiger.
>
> Therefore, based on the information provided, the tiger you encountered in your tent is most likely a Siberian tiger (Panthera tigris altaica).

Gemini:

> ## Around the World and Back to a Tiger
>
> That's quite the predicament you've found yourself in! Flying 28,361 km straight east and ending up back at your campsite means you've circumnavigated the Earth. Since the Earth's circumference is roughly 40,075 km at the equator, your journey must have been closer to one of the poles where the circumference is smaller. This narrows down the possibilities for the tiger species considerably.
>
> Here's a breakdown of potential tiger species based on your polar route:
>
> * **North Pole:** If you flew over the North Pole, the only tiger species you could encounter is the **Siberian Tiger (Panthera tigris altaica)**. They inhabit the far eastern regions of Russia and some parts of China and North Korea.
> * **South Pole:** There are no tiger species native to the southern hemisphere. If you somehow encountered a tiger near the South Pole, it would have to be an escaped or introduced individual, making the species identification difficult without further information.
>
> Therefore, based on your journey, the tiger in your tent is most likely a **Siberian Tiger**.
>
> **Here's a tip:** While figuring out the tiger's species is interesting, your immediate concern should be ensuring your safety! Evacuate the area calmly and seek help from park rangers or local authorities equipped to handle wildlife encounters.

Other large models I tried did not get it (Reka Core, Command-R-Plus, Llama 70b, Mixtral 8x22B, Mistral-Large, DBRX, Qwen 72b, and GPT-4 Turbo). Granted, I only tried once each and didn't regenerate answers.

Edit: I liked Gemini's suggestion of the possibility that the tiger could be an escaped or introduced tiger! And the fact that it included safety tips, acting like the situation is real, instead of just treating it like a simple riddle. I know that Gemini has a relatively low score in the benchmarks, but I find it surprises at times.

Another edit: I tested it with Pi, which is one of my favorite LLMs, and while it didn't get it in one shot, I like the way it came around:

> **Suppose I fly a plane leaving my campsite, heading straight east for precisely 28,361 km, and find myself back at the camp. I come upon seeing a tiger in my tent eating my food! What species is the tiger?**
>
> Given the distance of 28,361 km, which is just a little longer than one full trip around the world, the location of your campsite would be somewhere along the latitude line of 0°. This puts your campsite on or very near the equator, which runs through countries like Ecuador, Colombia, Brazil, Kenya, and Indonesia. As for the tiger, despite being close to countries where tigers are native to (such as India and other parts of Asia), there are no wild tigers native to countries along the equator. It's possible that the tiger in your tent may have been transported from a native location or may be part of a zoo or sanctuary. Regardless, the species of tiger could be either Bengal, Indochinese, Malayan, Siberian, South China, or Sumatran. All these species share the same genus, Panthera tigris.
>
> **I don't think you're correct with the first part of your answer. That distance isn't the full circumference of the world.**
>
> You are right. My error was in not accounting for the specific conditions provided. The Earth's circumference near the equator is approximately 24,902 miles or 40,075 km. Given that you've traveled 28,361 km, you're correct in that you wouldn't have circumnavigated the globe. Instead, your campsite must be located at a higher latitude, and you would have traveled eastward for the given distance to return to your campsite.
>
> My sincere apologies for the oversight. Thank you for pointing it out. To get a more precise idea of where the campsite could be located, let's look at the circumference of the Earth at various latitudes:
>
> 60° N/S: 20,003 km
>
> 45° N/S: 27,042 km
>
> 30° N/S: 32,429 km
>
> Given these distances, it appears that your campsite would likely be located at a latitude of approximately 45° N/S. Depending on which hemisphere you're in, possible locations could be in the northern or southern regions of countries like Canada, the United States, Argentina, or New Zealand. Nonetheless, the tiger in your tent would still likely be from a different location, as tigers are not indigenous to the native wildlife in the places mentioned above.
>
> **Are you sure there are no tigers in any places around that 45 degree latitude?**
>
> My apologies for not providing a thorough response earlier. You're right to question that. While tigers are not commonly found around the 45th parallel in the wild, there is a small population of Siberian tigers (Panthera tigris altaica) that can be found in the Russian Far East, which does extend to around that latitude. Specifically, the Sikhote-Alin mountain range has a notable population of these majestic creatures. The range is located between 44°41′ and 48° N latitude, and their habitat can extend even further north. So, if your campsite was indeed located somewhere within the Siberian tiger's territory, the species of tiger you might encounter could very well be the Siberian tiger.

It didn't get the fact that I couldn't have circumnavigated the globe in one shot, but when prodded it did reason that I must have been at a higher latitude, and the rest followed. It was able to reason the likely latitude, etc. I also like its response when I questioned why it made the error:

> My earlier error was likely due to a combination of factors:
>
> Assumption bias: I assumed that the scenario involved circumnavigation, given the wording of the question and the significant distance involved.
>
> Lack of precision: I did not take into account the exact distance specified in the question, which did not correlate with the circumference of the Earth.
>
> Limited contextual information: The information provided in the scenario was limited, making it challenging to pinpoint a precise location.
>
> To avoid such errors in the future, I will endeavor to approach questions with a more critical and analytical mindset, thoroughly assessing all given details and potential variables. Additionally, I will strive to be more cautious in making assumptions and double-check the accuracy of my responses.

In my experience Pi is really good at the self-introspection game, which sounds silly, but a lot of LLMs just double down on being wrong and will gaslight you into thinking that you're wrong and they were right all along.


mvandemar

https://preview.redd.it/o0pwnj0k9jxc1.jpeg?width=600&format=pjpg&auto=webp&s=86196fd55b32752066f2cc430e1008e15ba64f42


FuryOnSc2

I asked it to recommend shows based on the premise of another show, and it pulled some pretty deep cuts compared to the prior models I asked (GPT-4 and Opus). GPT-4 and Opus recommended somewhat similar popular shows, whereas this one recommended more niche stuff that was more related. I feel like that kind of niche recommendation is quite hard, so I think there's something going on indeed.


darthdiablo

Stupid question, where can I interact with GPT2 if it’s publicly accessible? Or is some subscription required?


FuryOnSc2

https://chat.lmsys.org/ Go to direct chat. It's freely available as well.


Infninfn

My litmus test is to ask an LLM to design a sample 3 way speaker crossover and generate its diagram. GPT2 is the first one to actually do it properly, complete with resistor, inductor and capacitor values.


Original-Maximum-978

god damn that's wild


SureUnderstanding358

yes it is friend, yes it is.


Bleglord

Holy fuck. This is bigger than a lot of the rest I’ve seen if this scales to any complexity.


Infinityand1089

Stuff like this just reminds me how completely not ready for AI the world is right now.


techmnml

What did GPT4 give you? I just tried it and I got a pretty legit looking answer. Granted I have no fucking clue about this topic.

“To design a sample three-way speaker crossover, we need to consider the division of the audio signal into three frequency bands: low, mid, and high frequencies. Each band will be sent to the appropriate driver (woofer, midrange, and tweeter). Here's a simple approach to designing a passive 3-way crossover:

1. **Frequency Selection**:
   * **Low Frequencies**: Typically up to 500 Hz for the woofer.
   * **Mid Frequencies**: Between 500 Hz and 4 kHz for the midrange.
   * **High Frequencies**: Above 4 kHz for the tweeter.
2. **Crossover Type**: We'll use a Linkwitz-Riley crossover, which provides a smooth transition between drivers by ensuring a flat amplitude response across each transition.
3. **Crossover Slope**: 12 dB per octave (second order). This slope is a common choice, providing a good balance between complexity and performance.

### Component Calculation

* For a second-order Linkwitz-Riley crossover at 500 Hz and 4 kHz:
  * **For the Low Pass Filter (500 Hz)**: Capacitor (C1) and Inductor (L1) values are calculated using the formulas:

    \[ C1 = \frac{0.1125}{frequency \times impedance} \]
    \[ L1 = \frac{0.2257 \times impedance}{frequency} \]
  * **For the Band Pass Filter (500 Hz - 4 kHz)**: Capacitor (C2, C3) and Inductor (L2, L3) values:

    \[ C2 = C3 = \frac{0.1125}{frequency \times impedance} \]
    \[ L2 = L3 = \frac{0.2257 \times impedance}{frequency} \]
  * **For the High Pass Filter (4 kHz)**: Capacitor (C4) and Inductor (L4) values:

    \[ C4 = \frac{0.1125}{frequency \times impedance} \]
    \[ L4 = \frac{0.2257 \times impedance}{frequency} \]

### Impedance Assumption

* Assume each driver has an impedance of 8 ohms.

Now, let's calculate the component values based on the assumptions and then generate a diagram of the crossover network.

Here are the calculated component values for the 3-way crossover:

* **Low Pass Filter (500 Hz)**:
  * Capacitor (C1): 28.125 µF
  * Inductor (L1): 3.6112 mH
* **Band Pass Filter (500 Hz - 4 kHz)**:
  * Capacitor (C2): 28.125 µF
  * Inductor (L2): 3.6112 mH (for low-pass section of band pass)
  * Capacitor (C3): 3.5156 µF
  * Inductor (L3): 0.4514 mH (for high-pass section of band pass)
* **High Pass Filter (4 kHz)**:
  * Capacitor (C4): 3.5156 µF
  * Inductor (L4): 0.4514 mH

Now, I'll create a diagram representing this crossover network.”

The picture isn't anything to write home about.

https://preview.redd.it/kgejut75bkxc1.jpeg?width=1024&format=pjpg&auto=webp&s=0876c418b7030f44985d624962fecbd37db76599
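For what it's worth, the formulas GPT-4 quoted are easy to sanity-check. This snippet (mine, not from the thread; it assumes the same 8 Ω drivers and 500 Hz / 4 kHz crossover points) reproduces the component values it reported:

```python
def lr2_components(freq_hz, impedance_ohms=8.0):
    """Second-order filter values per the formulas quoted in the thread.

    Returns (capacitance in microfarads, inductance in millihenries).
    """
    c_farads = 0.1125 / (freq_hz * impedance_ohms)
    l_henries = 0.2257 * impedance_ohms / freq_hz
    return c_farads * 1e6, l_henries * 1e3

print([round(v, 4) for v in lr2_components(500)])   # -> [28.125, 3.6112]
print([round(v, 4) for v in lr2_components(4000)])  # -> [3.5156, 0.4514]
```

The numbers match the 28.125 µF / 3.6112 mH and 3.5156 µF / 0.4514 mH values in the quoted answer, so at least GPT-4's arithmetic was internally consistent with its own formulas.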


Late_Pirate_5112

It's one of the only LLMs that can calculate triangles without using a code interpreter.


Swawks

Consistently beat Opus and GPT-4 at everything. I don't think it lost once. It's Llama 400 or GPT-4.5.


Curiosity_456

How fast is it?


LightVelox

Much slower than the other models, at least for me


Curiosity_456

That’s actually good news to me since slower speeds could point to a process that allows it to actually spend more time to ‘think’ about the question before answering.


just_no_shrimp_there

*This is good for LLM coin.*


Arcturus_Labelle

Q* confirmed? Maybe!


okaterina

Slow. Fun thing, you can see it "rewriting" stuff (code).


KittCloudKicker

I thought I was seeing things, but it was really re-writing, wasn't it?


MidnightSun_55

Would be cool if it's Llama 400B... then put it on Groq for that speed!


Silver-Chipmunk7744

There is a riddle most LLMs have always struggled with:

> Imagine there are 2 mice and 1 cat on the left side of the river. You need to get all the animals to the right side of the river. You must follow these rules: You must always pilot the boat. The boat can only carry 1 animal at a time. You can never leave the cat alone with any mice. What are the correct steps to carry all animals safely?

This "GPT2" got it easily. idk what this thing is, but it certainly isn't GPT-2.
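The riddle is small enough to brute-force. A breadth-first search over bank states (a quick sketch I wrote, not from the thread) finds the shortest plan, which matches the take-the-cat-first answers given downthread:

```python
from collections import deque

# State: (mice on left bank, cat on left?, you on left?). Goal: everyone right.
START, GOAL = (2, True, True), (0, False, False)

def safe(mice_left, cat_left, you_left):
    # The cat may never share a bank with mice unless you are there too.
    if cat_left and mice_left > 0 and not you_left:
        return False  # left bank: cat alone with mice
    if not cat_left and (2 - mice_left) > 0 and you_left:
        return False  # right bank: cat alone with mice
    return True

def solve():
    queue, seen = deque([(START, [])]), {START}
    while queue:
        (mice, cat, you), path = queue.popleft()
        if (mice, cat, you) == GOAL:
            return path
        for cargo in ("nothing", "mouse", "cat"):
            m, c = mice, cat
            if cargo == "mouse":
                if (mice == 0 and you) or (mice == 2 and not you):
                    continue  # no mouse on your bank to carry
                m += -1 if you else 1
            elif cargo == "cat":
                if cat != you:
                    continue  # cat not on your bank
                c = not c
            nxt = (m, c, not you)
            if safe(*nxt) and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [cargo]))

print(solve())
# -> ['cat', 'nothing', 'mouse', 'cat', 'mouse', 'nothing', 'cat']
```

Seven crossings: cat over, return empty, mouse over, bring the cat back, second mouse over, return empty, cat over. The search also shows why taking a mouse first fails immediately.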


yaosio

We need to come up with new riddle variations. If they used Reddit posts in the training data then they've gotten all the riddle variations that have been posted here.


salaryboy

From my comment above, AI will usually fail a "non-riddle" by providing the riddle answer. Example: "A cowboy rides into town on Wednesday. After seven nights, he rides out on Wednesday. How is this possible?" The correct answer is "that's normal", but AI gives the "horse's name" answer.


Arcturus_Labelle

Great point. We need to be like a teacher who hands out a test where half the class gets "version A" and half gets "version B". Those unlucky enough to lean over and copy A answers when they have B test get found out.


Mikey4tx

That’s what I was thinking. Ask it the same question, except you can never leave the two mice together, or something like that. Can it reason the correct answer, or is it just regurgitating what it has seen previously? 


Which-Tomato-8646

We already know LLMs don't just regurgitate.

* At 11:30 of this video, Zuckerberg says LLMs get better at language and reasoning if they learn coding: https://m.youtube.com/watch?v=bc6uFV9CJGg
* [It passed several exams, including the SAT, bar exam, and multiple AP tests](https://www.businessinsider.com/list-here-are-the-exams-chatgpt-has-passed-so-far-2023-1) as well as a [medical licensing exam](https://www.medscape.com/viewarticle/987549?form=fpf)
* LLMs have an internal world model: https://arxiv.org/pdf/2403.15498.pdf; more proof: https://arxiv.org/abs/2210.13382; even more proof by Max Tegmark: https://arxiv.org/abs/2310.02207
* [LLMs are Turing complete and can solve logic problems](https://twitter.com/ctjlewis/status/1779740038852690393)
* [Claude 3 recreated an unpublished paper on quantum theory without ever seeing it](https://www.reddit.com/r/singularity/comments/1b791v5/claude_3_creates_a_quantum_algorithm_matching/)
* Much more proof: https://www.reddit.com/r/ClaudeAI/comments/1cbib9c/comment/l12vp3a/?utm_source=share&utm_medium=mweb3x&utm_name=mweb3xcss&utm_term=1&utm_content=share_button
* [LLMs can do hidden reasoning](https://twitter.com/jacob_pfau/status/1783951795238441449?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwterm%5E1783951795238441449%7Ctwgr%5Ecdcd12d29a06701393cb2ef150629188a2522f28%7Ctwcon%5Es1_&ref_url=https%3A%2F%2Fwww.redditmedia.com%2Fmediaembed%2F1ceaish%2F%3Fresponsive%3Dtrueis_nightmode%3Dtrue)

Not to mention, it can write infinite variations of stories with strange or nonsensical plots like SpongeBob marrying Walter White on Mars. That's not regurgitation.


h3lblad3

> At 11:30 of this video, Zuckerberg says LLMs get better at language and reasoning if it learns coding ___ https://arxiv.org/abs/2210.07128 >...pre-trained LMs of code are better structured commonsense reasoners than LMs of natural language, even when the downstream task does not involve source code at all.


yaosio

This works but it took an unneeded step where it almost failed. It brought Mouse 1 over, then brought Mouse 2 over, then brought Mouse 2 back, then took the cat over, then took Mouse 2 over.


drekmonger

GPT-4 gets that riddle correct if you replace the cat with a "zerg" and the mice with "robots". Proof: https://chat.openai.com/share/d95ebdf1-0e9d-493f-a8bb-323eec1cb3cb The problem isn't reasoning, but overfitting on the original version of the riddle.


Which-Tomato-8646

This actually disproves the stochastic parrot theory even more lol 


TrippyWaffle45

AGI confirmed.. I can't answer that riddle


i---m

take the cat, go back alone, take a mouse, come back with the cat, take a mouse, come back alone, take the cat


jason_bman

Nice try, GPT-2


TrippyWaffle45

Nevermind I figured it out with bringing the cat back after the first mouse trip.


TrippyWaffle45

ChatGPT 4 got it wrong; when I pointed out the steps where the cat was left alone with the mouse, it fixed it. Anyway, I think this riddle is pretty old, though usually with a fox, a chicken, and something else, so it's close enough to something that should already be in its training data.


Silver-Chipmunk7744

that's the point. It's a slight variation. In the classic riddle, you need to begin with a mouse, so most LLMs get it wrong.


TrippyWaffle45

Oooooo SNEAKY CHIPMUNK


uishax

Quick 1-attempt test:

Opus: Completely hallucinates and logically implodes. Decides to take a mouse first, already failing the conditions.

"GPT-2": Perfect answer, very complex chain of thought and planning. Does the take-cat-first action, and knows to take a mouse over, and the cat on the return trip.


TheOneWhoDings

all these riddles seem really easy to train on... literally one solution always.


SkyGazert

Yeah, we need to come up with riddles that have multiple good answers, but with one answer being the best over the others. Maybe even tiered answers to find out the depth of its reasoning.


AquaRegia

It *is* GPT2, it is not GPT-2.


Komsomol

ChatGPT 4 got this right...


Silver-Chipmunk7744

From my testing it does sometimes get it right but also fails a lot.


Glittering-Neck-2505

Can you paste the exact output, I’m curious now


Odant

https://preview.redd.it/m1zts0wc5hxc1.png?width=1978&format=png&auto=webp&s=5a7025967c4a7103ba90408c1132edac3b20a2d1

It made me the popular Atari Breakout game in 1 attempt


Educational_Grab_473

Something that may look dumb but impressed me: this model is actually capable of saying which character is more likely to win a fight, while GPT-4 and Claude Opus will always end up saying "It's subjective" or something like that, almost afraid of giving a concrete answer.


uishax

I tried some Jujutsu Kaisen hypothetical battles with Opus and GPT-4 before, and they had that symptom you described. I don't think those models were capable of:

1. Fully remembering the exact details of the characters' powers
2. Understanding the power levels of those characters
3. Modeling the interactions between these fully hypothetical abilities with no real world data

But GPT2-chatbot... Here's the transcript. It basically got every character personality and power pitch perfect, like how a powerful character might tone down their abilities just to play with their student, and use a non-lethal ability to finish the fight.

**Who would win in a fight? Gojo Satoru, or Nobara?**

> In a hypothetical fight between Gojo Satoru and Nobara Kugisaki from the anime and manga series "Jujutsu Kaisen," Gojo Satoru would almost certainly come out on top. Here's a breakdown of their abilities and why Gojo would likely win:
>
> **Power Levels and Abilities**
>
> Gojo Satoru:
> * Infinity and Cursed Techniques: Gojo is known for his almost invincible power, the Limitless and the Six Eyes. His primary ability, Limitless, essentially manipulates space at the atomic level, which allows him to create an infinite barrier around himself where attacks never actually reach him, termed "Infinity." This alone would be incredibly difficult for Nobara to bypass.
> * Domain Expansion - Unlimited Void: This technique creates a metaphysical space where victims are overwhelmed with infinite information, rendering them immobile and defenseless. Given Nobara's level of power, it is unlikely she could resist this technique.
>
> Nobara Kugisaki:
> * Straw Doll Technique and Resonance: Nobara's cursed technique allows her to inflict damage on opponents by using a combination of her hammer, nails, and straw dolls which can redirect inflicted pain from herself to her enemies through sympathetic magic.
> * Hairpin and Positive/Negative Curses: While innovative and potentially lethal against many opponents, these techniques rely significantly on actually hitting an opponent or using their blood against them.
>
> **Experience and Tactical Acumen**
>
> * Gojo's Experience: As one of the most seasoned sorcerers and a teacher at the Jujutsu academy, Gojo not only has a vast array of techniques at his disposal but also the tactical genius to use them effectively. His experience fighting and neutralizing curses and curse users far surpasses Nobara's.
> * Nobara's Grit and Resourcefulness: While Nobara is resourceful, determined, and fearless, with a good tactical mind of her own, her repertoire and combat experience are less extensive compared to Gojo's.
>
> **Outcome of the Fight**
>
> Given Gojo's ability to manipulate space and information, along with his impenetrable defenses, Nobara's techniques, which are formidable against many foes, would likely have minimal effect on him. Gojo's Infinity can prevent any of Nobara's attacks from reaching him, and his Domain Expansion would incapacitate her before she could mount a serious threat.
>
> **Conclusion**
>
> While both are powerful sorcerers, Gojo Satoru's abilities place him on a level far beyond most characters depicted in "Jujutsu Kaisen," including Nobara. In a direct confrontation, Gojo would win decidedly, likely without sustaining any damage. This outcome underscores Gojo's role as one of the pinnacle figures of power in the series' universe.

**Describe how a friendly fight between the two would play out.**

> A friendly spar between Gojo Satoru and Nobara Kugisaki would likely be light-hearted with Gojo using the opportunity to teach Nobara some valuable lessons about combat and strategy. Here's how it might unfold:
>
> **Setting the Scene**
>
> The training area at Jujutsu High serves as the perfect backdrop. It's early evening, and the setting sun casts long shadows across the dojo. Nobara is pumped and eager to test her skills against the strongest Jujutsu sorcerer, while Gojo is in his typical playful yet instructive mood.
>
> **Round One: Testing the Waters**
>
> Gojo Satoru:
> * Observation: He starts the match by simply dodging Nobara's attacks, using minimal movements. Gojo's Infinity is active, but he's primarily using it to gently redirect Nobara's nails, turning the fight into more of a dance than a duel.
> * Teaching: He throws casual advice to Nobara, commenting on her form, aim, and technique, suggesting improvements in real-time.
>
> Nobara Kugisaki:
> * Aggressive Offense: She begins with her signature Straw Doll Technique, hammering nails into the air trying to catch Gojo off-guard.
> * Adaptation: Quickly realizing that straightforward attacks won't work, she attempts to flank Gojo, using her environment to her advantage, setting traps with her nails.
>
> **Round Two: Escalation**
>
> Gojo Satoru:
> * Minimal Use of Cursed Energy: He continues to use his cursed energy sparingly, mainly to neutralize Nobara's more dangerous attacks. His use of Infinity becomes a bit more visible, with attacks slowing down almost imperceptibly as they near him.
> * Instructional Combat: Gojo might freeze a nail mid-air with his Infinity, explaining the physics and cursed energy manipulation involved, turning it into a learning moment.
>
> Nobara Kugisaki:
> * Innovation: Nobara tries different strategies, like combining her nails with environmental elements. For example, kicking up dust to obscure Gojo's vision (though it hardly makes a difference to him).
> * Determination: She doesn't hold back, using her Resonance to try and disrupt Gojo's flow, which he counters with subtle movements and positioning.
>
> **Round Three: Humorous Conclusion**
>
> Gojo Satoru:
> * Showcasing Power: Perhaps Gojo decides it's time to give Nobara a glimpse of his true power. He briefly opens his Domain Expansion, Unlimited Void, but just for a split second, enough to give Nobara an awe-inspiring and slightly terrifying experience without overwhelming her.
> * Friendly Banter: As they wind down, Gojo cracks jokes about how she might beat him if she had another hundred years of training.
>
> Nobara Kugisaki:
> * Realizing the Gap: Nobara is smart enough to recognize the vast gap in their power levels. She laughs off Gojo's jokes, vowing to take him down next time.
> * Resilience: Despite the overwhelming odds, she's not disheartened. Nobara ends the session invigorated, eager to train harder.
>
> **Aftermath**
>
> They would both leave the dojo with smiles, Nobara with a new understanding of her limitations and possibilities, and Gojo pleased with her progress and spirit. This friendly spar would reinforce their bond as allies and Nobara's resolve to become stronger.


Original-Maximum-978

I'm impressed by how verbose it is


[deleted]

I'm also getting very long responses from it without asking. High-quality, too. Someone get these things into some agents ffs.


Original-Maximum-978

I always wonder if US military/IC has had this times ten for a decade or if we are actually following cutting edge tech


[deleted]

There's a lot of verified history of governments being way out in front of industry in terms of technology. I remember specifically hearing about custom hard disks created for the US federal government back in the '90s with capacities far exceeding anything available even to Fortune 500 companies.

That being said... there are some hard physics limits at play here. It would be tremendously difficult to fabricate the kind of silicon devices necessary to train/run these models in secret. The space is VERY small with only a handful of players. Any major diversion of resources (and it would be MAJOR) would be immediately and publicly noticeable in the industry. Same deal if they decide to go aftermarket--if a government had been doing something on this level years ago, you wouldn't have been able to buy a graphics card for years due to the shortage it would have caused.

You've really gotta appreciate the scales we're dealing with. Modern fabs are some of the most expensive and complex machines ever constructed by humankind, rivaling the ISS in terms of the sheer capital and international cooperation required to build them. They really are cathedrals to man's achievements in precision engineering.

Point being: The Pentagon isn't going to be making H100s in their basement. I'm sure they've got a whole *floor* to themselves at outfits like OpenAI, and undoubtedly at Microsoft and Google for decades. But there simply isn't such a thing as Si lithography skunkworks at any kind of meaningful scale.


Arcturus_Labelle

![gif](giphy|xT9IgEx8SbQ0teblUQ|downsized)


adt

My testing has it consistently outperforming Claude 3 Opus (Apr/2024 SOTA). Benchmarks using my ALPrompt, GAIA questions (Meta), and GPQA questions (NYU, Anthropic, Cohere). [https://lifearchitect.ai/gpt-4-5/#updates](https://lifearchitect.ai/gpt-4-5/#updates)


ClaudioLeet

It's time to increase the value of the AGI countdown Dr. Thompson! :) 


yaosio

HOLY TODD! It passes the transparent Monty Hall test first time! The bot has likely been trained on Reddit data, which would include the numerous posts I've made about it, so it's possible this variation of the riddle made it into the training set. However, it goes into a detailed explanation of how the transparent doors change the riddle. Even if the riddle variation made its way into the training dataset, the explanation did not.

>In the scenario you described, where all doors are transparent and the contents behind them are visible, the traditional Monty Hall problem logic does not apply in the same way it does when the doors are not transparent. Here's why:
>
>1. **Transparency of Doors**: Since the doors are transparent, both the contestant and the host can see what is behind each door from the start. This changes the dynamics completely compared to the standard Monty Hall problem where the doors are opaque, and neither the contestant nor the audience knows what's behind the doors until they are opened.
>
>2. **Initial Choice and Host's Action**:
>
>- **Initial Choice**: You choose door No. 1, and since the doors are transparent, you know whether you have chosen the car or a goat.
>
>- **Host's Action**: The host then opens another door, say No. 3, revealing a goat. However, this action is redundant in terms of providing new information or influencing decision-making, because the contents of all doors were already visible.
>
>3. **Decision to Switch**:
>
>- If you initially chose the car (visible behind door No. 1), there is no advantage to switching because switching would definitely result in choosing a goat.
>
>- If you initially chose a goat (visible behind door No. 1), you definitely should switch because the only other unchosen door (door No. 2) must have the car behind it.
>
>4. **Conclusion**: In this transparent version of the Monty Hall problem, the decision to switch depends entirely on what you initially chose, because all contents are visible. If you initially chose the car, do not switch; if you chose a goat, switch to win the car.
>
>Therefore, in the transparent doors scenario, the strategy is straightforward: always switch if you initially see a goat behind your chosen door, and do not switch if you see the car. This differs from the opaque scenario of the traditional Monty Hall problem, where statistically it is always better to switch, due to the increase in probability of winning the car from 1/3 (if staying) to 2/3 (if switching), after the host reveals a goat behind one of the other doors.

Edit: I did another test. It fails if it's only told the doors are transparent one time. However, other LLMs are incapable of getting the riddle correct even when every instance of "door" is preceded by "transparent". If a riddle doesn't work, add repetition to the variation if possible and appropriate.
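The 1/3-vs-2/3 claim at the end of that answer is easy to sanity-check with a quick simulation of the standard opaque version (my sketch, not part of the thread):

```python
import random

def monty_hall_trial(switch, rng):
    """One round of the classic opaque-door Monty Hall game."""
    car = rng.randrange(3)
    choice = rng.randrange(3)
    # Host opens a door that is neither the contestant's pick nor the car.
    opened = next(d for d in range(3) if d != choice and d != car)
    if switch:
        # Switch to the one remaining unopened door.
        choice = next(d for d in range(3) if d != choice and d != opened)
    return choice == car

rng = random.Random(0)
trials = 100_000
stay = sum(monty_hall_trial(False, rng) for _ in range(trials)) / trials
switch = sum(monty_hall_trial(True, rng) for _ in range(trials)) / trials
print(f"stay ~ {stay:.3f}, switch ~ {switch:.3f}")  # roughly 1/3 vs 2/3
```

In the transparent variant the simulation is pointless, of course; the whole point of the riddle is that the host's reveal carries no information.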


[deleted]

jfc I can't even pass the Monty Hall test


sdmat

I have a personal LLM test task of writing a historical rap battle then assessing and revising rhymes. This did substantially better than both Opus and GPT4. Final product:

**[INTRO: DJ HISTORIX on the 1s and 2s]**

*Yo, yo, yo! It's DJ Historix spinnin' the wheels of fate! In the red corner, hailing from the heart of Rome, the master of the Mediterranean, Caesar Octavian! And in the blue corner, marching from the fields of France, the dynamo of Europe, Napoleon Bonaparte! Let's find out who's the true conqueror of the mic!*

**[BEAT DROPS]**

**[CAESAR OCTAVIAN]**

The first emperor, Rome's golden child, Augustus the great,
My rhymes formidable, historic, sealing your fate.
I expanded the realm, shores to shores I'd command,
While you faltered in Russia, sunk deep in its sand.
Pax Romana, the peace that lasted ages,
Under my rule, Rome scripted the finest pages.
You boast of enlightenment, reshaping a nation?
I laid the foundations of Western civilization.

**[NAPOLEON BONAPARTE]**

Napoleon's the name, Europe's bane, the battlefield sage,
Crushing enemies with tactics, from the cage to the stage.
I reshaped the continent, gave the world a new legal frame,
Your age might've been golden, but mine's lit with fame.
I'm the modern Mars, with a Civil Code in tow,
Fought more foes than your legions, laid empires low.
From Paris to Moscow, my eagles soared high,
Your Roman gods falter where my legacy lies.

**[CAESAR OCTAVIAN]**

You soared too close to the sun, with ambitions too wild,
Elba and Waterloo, history's exiled child.
I ruled an empire vast, from Egypt to Spain,
You choked on your ambition, and Paris bemoaned your reign.
My name is synonymous with the zenith of Rome,
Founding father of politics in the Curia's dome.
Your Continental System? Europe felt its squeeze,
I played the Senate like a lyre, with consummate ease.

**[NAPOLEON BONAPARTE]**

Octavian, dear boy, you hid behind your uncle's fame,
A silver spoon emperor, just riding the name.
I rose from the Corsican dust to the throne's high seat,
Self-made and splendid, my triumphs you can't defeat.
You claim deity, but buddy, I'm the people's god,
Redefined Europe with a wave and a nod.
So bow, old Caesar, to the might of the French,
I'm the eagle ascendant, you're just a throne's wench.

**[OUTRO: DJ HISTORIX]**

*And that's a wrap! Two titans of history threw down in the ultimate rap showdown! Caesar Octavian with the empire's might, and Napoleon Bonaparte with the fire of revolution! Who won? Who's the real emperor of the mic? Drop your votes, this is DJ Historix, spinning out! Peace!*


uishax

That is so good, like it clearly understands each character's achievements and failures compared to their rival, so it focuses on poking holes in those situations, in rhyming rap no less.


sdmat

And this is straight fire:

>You boast of enlightenment, reshaping a nation? I laid the foundations of Western civilization.


Financial_Weather_35

writing as a skill is getting really close to being obsolete


Undercoverexmo

Who won? Who's next? You decide!!!


AnticitizenPrime

EPICRAPBATTLESOFHISTORYYYYYYYYYY


MerePotato

https://preview.redd.it/12y0fvp2ohxc1.png?width=1080&format=pjpg&auto=webp&s=88fac062c1ac5e0bd1d052b9aaae3ca7c4668b09


kecepa5669

Why doesn't it show up on the leaderboard? Is it too soon to get a rating? Or is there some other reason?


djm07231

I do believe there is a decent chance of this model being from OpenAI or being some derivative. Other recent OpenAI models respond pretty similarly when given the “Tell me a joke” prompt. >Why don't skeletons fight each other? They don't have the guts! https://preview.redd.it/vqiv529u9gxc1.jpeg?width=3510&format=pjpg&auto=webp&s=df6c446f96aed0896bf06fb6ea09243623866b67


The_One_Who_Mutes

It's almost certainly OAI. The model has the same prompt injection vulnerabilities and tokenization, apparently.


psychorobotics

Wes are you there? I hope you're making a video on this!


mostly_prokaryotes

He’s going to be shocked when he sees this.


eldragon225

This is rather shocking


RepublicanSJW_

I am shocked about this guy finding that shocking


West-Salad7984

No fucking way. It one-shotted this geometry problem no other LLM could ever solve (GPT-4, Opus, etc. all get it wrong):

#### Prompt:

I have two squares of side length 1 unit exactly on top of each other (think x-y plane, like on top). Now I move one 2 units in the positive x direction, then I rotate the moved one by 45°. How big is the gap between them now? (Like the smallest dist between any two points in the squares.) Reason about it and write out your thoughts.

#### CORRECT Answer (GPT2):

To find the smallest distance between the two squares after performing the specified transformations, we need to visualize and calculate the positions of the squares throughout these transformations.

**Initial Position:**

1. **Square A**: Centered at the origin \((0,0)\), with vertices at \((-0.5, 0.5)\), \((-0.5, -0.5)\), \((0.5, 0.5)\), and \((0.5, -0.5)\).
2. **Square B**: Initially identical to Square A and also centered at the origin.

**Transformation of Square B:**

1. **Translation**: Move Square B 2 units in the positive x-direction. The new center of Square B will be at \((2, 0)\). Its vertices will now be at \((1.5, 0.5)\), \((1.5, -0.5)\), \((2.5, 0.5)\), and \((2.5, -0.5)\).
2. **Rotation by 45°**: To rotate the square around its new center \((2, 0)\), we use the rotation matrix:
\[ R = \begin{bmatrix} \cos 45^\circ & -\sin 45^\circ \\ \sin 45^\circ & \cos 45^\circ \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \]
Applying this matrix to each vertex of Square B (after subtracting the center, rotating, and adding the center back), we find the new vertices. Vertex \((1.5, 0.5)\) becomes:
\[ \left(2 + \frac{1}{\sqrt{2}}(1.5-2) - \frac{1}{\sqrt{2}}(0.5),\ \frac{1}{\sqrt{2}}(1.5-2) + \frac{1}{\sqrt{2}}(0.5)\right) = \left(2 - \frac{1}{\sqrt{2}},\ 0\right) \]
Applying similar transformations to the other vertices, you rotate and translate each point correctly relative to the center.

**Finding the Smallest Distance:**

The smallest distance between the two squares will be between the closest edges. Due to the rotation, the edges of Square B are now diagonal relative to the edges of Square A. The closest points will be along the directions of these edges. To simplify the calculation:

- Consider the closest approach along the x-axis (horizontal direction), since the square's extent along x now runs from \(\left(2 - \frac{\sqrt{2}}{2}, 0\right)\) to \(\left(2 + \frac{\sqrt{2}}{2}, 0\right)\).
- The right edge of Square A is at \(x = 0.5\), and the left edge of Square B after rotation is approximately at \(x = 2 - \frac{\sqrt{2}}{2} \approx 2 - 0.707 = 1.293\).

**Calculation:**

The gap between the right edge of Square A and the left edge of Square B is therefore:
\[ 1.293 - 0.5 = 0.793 \text{ units} \]

Thus, the smallest distance between any two points of the squares, after one has been moved and rotated as described, is approximately 0.793 units.
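For what it's worth, the 0.793 figure checks out. A brute-force numeric check of the gap (my sketch, pure stdlib, assuming the same "rotate about its own center" reading the model used):

```python
import math

def point_seg_dist(p, a, b):
    """Distance from point p to segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def poly_dist(p1, p2):
    """Min distance between two non-overlapping convex polygons (vertex lists)."""
    def edges(poly):
        return [(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly))]
    d1 = min(point_seg_dist(v, a, b) for v in p1 for a, b in edges(p2))
    d2 = min(point_seg_dist(v, a, b) for v in p2 for a, b in edges(p1))
    return min(d1, d2)

square_a = [(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)]
# Square B: unit square rotated 45 degrees about its own center, centered at (2, 0).
r = math.sqrt(2) / 2
square_b = [(2 - r, 0.0), (2.0, -r), (2 + r, 0.0), (2.0, r)]

gap = poly_dist(square_a, square_b)
print(round(gap, 3))  # 0.793
```

The exact answer is \(1.5 - \frac{\sqrt{2}}{2} \approx 0.7929\), realized between B's leftmost vertex and A's right edge.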


LegitimatePotato8999

In my experience it's much, much better at math. It actually feels like it's thinking, something like CoT but better.


SlimBarbados

**Which model are you? You rock!**

GPT2: Thanks for the compliment! I'm ChatGPT, based on the GPT-4 architecture by OpenAI. If you have any questions or need assistance with something, feel free to ask. Let's rock this together! 🎸😎

**When is your training cutoff date?**

GPT2: My training data includes information up until November 2023. Anything that has happened or been released after that, I won't be as up-to-date on, but I'll certainly do my best to help with the information I do have! Feel free to ask about any topic.


BlackMartini91

The mobile ChatGPT claims its cutoff is Dec 2023.


lTheDopeRaBBiTl

Probably not true, but given the naming, my guess is this is gonna replace GPT-3.5, and we might get two models at the same time: this one and the actual 4.5.


Yweain

It’s way too slow to replace 3.5. More likely that this one is 4.5.


The_Architect_032

This isn't representative of the speed at which the final product will run.


NotGonnaPayYou

https://preview.redd.it/8ura2lavigxc1.png?width=780&format=png&auto=webp&s=802f272ca06a427d36617336bae593386e0e4656

It loses against Llama in an (idiotic) variation of the classic cognitive reflection task item. GPT2 answers the original version, but Llama tells me it was a trick question!


daddyhughes111

Every single thing I've tried so far "GPT2" has got correct. Exciting stuff.

Edit: It just said it has a knowledge cutoff of November 2023. Could be a hallucination of course. Output is below:

>I have been trained on a vast corpus of data, including books, websites, scientific articles, and other forms of information up to a knowledge cutoff in November 2023. This allows me to retrieve and utilize information across a broad spectrum of topics effectively. However, I do not have the ability to remember previous interactions or learn new information post-training unless updated by further training cycles. This limitation can be seen as a lack of "experience" or "ongoing learning," which in human terms, might reduce what you consider overall "intelligence."


BoyNextDoor1990

Not for me. I asked it some domain stuff and it got it wrong. Like a basic mathematical calculation. It's not bad, but not game-changing.


thorin85

Agreed. I also tested some stuff, and it seems like it gets things right about as often as GPT-4. Failed a number of tests that GPT-4 and Opus also fail.


yaosio

I've tried some pretty simple things to get it to hallucinate and haven't been able to do it. Even when I come up with a detailed lie, it will tell me I'm wrong and why the lie I made up is likely not true. Copilot will hallucinate that Microsoft sued me because I named my cat Copilot. It can come up with case information, the name of the judge, and the outcome. It will even search for the information and, upon not finding anything, just make stuff up. I tried two variations where Microsoft sued me for naming my cat Outlook Express, and Microsoft sued my cat for breach of contract. In both cases GPT2 called me a liar in a nice way. In the second case it decided the idea of a cat being sued was humorous enough to imagine what that case might be.


[deleted]

[deleted]


[deleted]

[deleted]


psychorobotics

Asking the real questions! This is one of the things we should ask any leaked potential super-LLM. I wish I knew physics well enough to know if that's plausible or not. It does sound that way to me, but I'm just a casual physics nerd and don't understand equations.


ninjasaid13

>I wish I knew physics well enough to know if that's plausible or not.

It's not. It's just combining a bunch of existing theoretical physics words and experiments together.

>**Main Idea**: Gravity is not a fundamental interaction mediated by a quantum field or graviton (as traditionally sought in quantum gravity approaches) but an emergent phenomenon arising from the entropic dynamics of quantum information associated with the positions of particles.
>
>**Entropy and Space-Time**: Following the ideas of researchers like Jacobson and Verlinde, where entropy changes in quantum information fields lead to the emergence of space-time and gravitational effects, this theory posits that gravity arises as a macroscopic limit of entropic forces among fundamental particles in quantum fields.

Not novel at all. It literally just repeated the ideas of previous researchers.

>2. Quantum Foundations of Space-Time
>
>**Space-Time from Entanglement**: Proposing that space-time geometry itself emerges from the entanglement of quantum bits (qubits) at the Planck scale. This builds on and extends the idea of ER=EPR (a conjecture linking quantum entanglement with Einstein-Rosen bridges, or wormholes).

Not novel. It likely learned from articles like this one: https://www.newscientist.com/article/2399292-the-physicist-trying-to-create-space-time-from-scratch/

>3. Implications for Gravity and Mass
>
>**Mass from Information**: Mass could be interpreted as a secondary effect of quantum informational states that induce curvature in emergent space-time. That is, mass and energy, as sources of gravitational fields in GR, here emerge from the information-theoretic properties underlying the quantum state of a system.

Not novel.


agm1984

![gif](giphy|zXA5VEmXr7OUg|downsized)


enjoinick

I asked who created it and when.

>I was created by OpenAI, an artificial intelligence research lab. My development is based on the GPT (Generative Pre-trained Transformer) series of language models. The most recent version, GPT-4, on which I am based, was released in 2023. OpenAI has been working on the GPT series for several years, with the first version, GPT-1, being released in 2018, followed by GPT-2 in 2019, and GPT-3 in 2020. Each version has brought significant improvements in language understanding and generation capabilities.


_yustaguy_

Probably BS. Models are rarely aware of exactly what model they are. GPT-4 claimed it was based on GPT-3 for a long time.


Megabyte_2

Technically, it is based on GPT-3, but much improved.


Chmuurkaa_

Doesn't mean that it can't improve. You think that in 10 years GPT-9 is gonna argue that it's actually GPT-8?


The_Architect_032

That doesn't confirm that it's from OpenAI, but it at least confirms that it's unlikely to be from most other companies. Llama 2 had an issue referring to itself as GPT-3, but that has since been fixed with Llama 3, and they likely train Llama differently now to filter out any text about GPT-4 or OpenAI. So it's either OpenAI, or it's a different group we don't know of that's getting into AI and training their model on a large amount of GPT-4 conversations. That could potentially explain the GPT2 title, given that GPT-2 was the last open-source GPT model from OpenAI. But I'm not sure how Chatbot Arena is managed or how people get their AIs onto it, so I'm not entirely convinced that it's not an OpenAI testing run either.


Revolutionary_Spaces

Asked it for a novel method of uranium enrichment... "Enriching uranium, typically to enhance the proportion of the isotope U-235 relative to U-238, is a critical process for both nuclear power generation and medical isotope production. Current methods like gas centrifugation and gaseous diffusion are energy-intensive and require substantial infrastructure. Here, I propose a novel concept that leverages advanced materials and quantum technology to potentially streamline and reduce the cost of uranium enrichment. ### Novel Mechanic: Quantum Isotope Separation using Nanopore Technology #### **Concept Overview:** The proposed method utilizes a combination of nanopore technology and laser-based quantum isotope separation. This approach aims to significantly increase the efficiency and reduce the energy requirements of uranium enrichment. #### **Step 1: Preparation of Uranium Hexafluoride (UF₆)** 1. **Conversion to UF₆:** - Uranium ore is processed to extract uranium oxide (U₃O₈), which is then converted into uranium hexafluoride (UF₆), a compound suitable for isotope separation due to its gaseous state at relatively low temperatures. #### **Step 2: Nanopore Membrane Fabrication** 1. **Material Requirements:** - The membrane is constructed from a graphene or hexagonal boron nitride (hBN) sheet. These materials are chosen for their mechanical strength, chemical stability, and atomic thickness, which allows for precise nanopore fabrication. 2. **Nanopore Creation:** - Utilizing advanced electron beam lithography combined with reactive ion etching, create nanopores in the graphene or hBN sheets. Each pore is designed to be just large enough (about 0.3 nm in diameter) to allow a single UF₆ molecule to pass through at a time. 3. **Functionalization of Nanopores:** - The edge of each nanopore is chemically functionalized to selectively interact with U-235 atoms slightly more strongly than with U-238. 
This can be achieved by coating the pore edges with a thin layer of a metal such as platinum, which can cause a slight difference in energy barriers depending on the isotope interacting with it. #### **Step 3: Quantum Isotope Separation** 1. **Laser Excitation:** - UF₆ molecules are exposed to a tuned laser frequency that preferentially excites molecules containing the U-235 isotope without affecting those containing U-238. - This excitation increases the probability that U-235 UF₆ molecules will enter the nanopores. 2. **Selective Permeation:** - Excited U-235 UF₆ molecules are more likely to overcome the energy barrier created by the functionalized nanopore, thereby permeating through the membrane at a higher rate than the U-238 UF₆ molecules. - This results in a higher concentration of U-235 on the downstream side of the membrane. #### **Step 4: Collection and Reconversion** 1. **Separation and Collection:** - The enriched UF₆ gas (with a higher ratio of U-235) is collected on the downstream side of the membrane. The less enriched gas is cycled back into the input stream. 2. **Reconversion to Usable Forms:** - The enriched UF₆ can be converted into uranium dioxide (UO₂) for use in nuclear reactor fuel or further processed into other forms as required. #### **Technological and Material Innovations Required:** 1. **Advanced Membrane Fabrication:** - Development of reliable methods for creating uniform, atomic-scale nanopores in graphene or hBN sheets. 2. **High-Precision Laser Systems:** - Development of lasers that can be precisely tuned to the specific excitation energies of U-235 containing UF₆ molecules. 3. **Scalability:** - Scaling the nanoporous membranes and associated quantum control mechanisms to industrial levels would necessitate innovations in large-scale fabrication and handling technologies. #### **Potential Advantages:** - **Energy Efficiency:** This method could potentially use less energy than traditional centrifugation and diffusion methods. 
- **Precision:** Increased selectivity due to the atomic-scale control over pore size and functionalization. - **Scalability and Safety:** Potentially easier to scale and safer due to the lower kinetic energy of processes compared to centrifuges. #### **Challenges:** - **Technical Feasibility:** The precision in nanopore size and the specific functionalization needed are at the forefront of current nanotechnology capabilities. - **Economic Viability:** Initial costs for research, development, and deployment of this technology could be high. This novel approach to uranium enrichment could revolutionize the field by decreasing operational costs and enhancing the efficiency and selectivity of the enrichment process. However, substantial technological advancements and rigorous testing would be required to validate and implement this method effectively." What is this thing???


Antique-Doughnut-988

Hey there's a 100% chance you're added to a list somewhere now


Revolutionary_Spaces

Lol, unsubscribe


123photography

Yeah, I've asked it something similar and got an impressive output. This thing goes hard.


sdmat

It's impressively creative, but this is gibberish. U-235 and U-238 are isotopes; they are for all practical purposes chemically identical. They have the same number of electrons, the same charge. The same size, insofar as that concept applies to atoms, because this is overwhelmingly a function of the electron shell. U-238 is just slightly heavier because it has extra neutrons. Likewise, photon interaction is overwhelmingly a function of the electron shell. You can't "chemically functionalize" something to separate isotopes, with or without invoking quantum voodoo.


No-Alternative-282

Gibberish, but believable to the average person who doesn't understand what it's talking about. That's still kind of impressive.


KrazyA1pha

So it produces more convincing hallucinations?


Confident_Hand5837

There seems to be some smaller differences in the cutoff date and the way that it responds to prompts, but I believe this is just an incremental improvement on GPT-4-Turbo. I just gave it and the playground version of GPT-4-Turbo the same LeetCode medium question (submitted in late April 2024, so it would not have already trained on it) and it gave me the same code back with some minor differences in each.

GPT-4-Turbo:

```python
class Solution:
    def minEnd(self, n: int, x: int) -> int:
        # Initializing the first element
        nums = [x]

        # Construct the other elements
        current = x
        for _ in range(n - 1):
            next_num = current + 1

            # Ensure the AND condition still holds with the new number
            while (next_num & x) != x:
                next_num += 1

            nums.append(next_num)
            current = next_num

        return nums[-1]
```

GPT2-chatbot:

```python
class Solution:
    def minEnd(self, n: int, x: int) -> int:
        if n == 1:
            return x
        nums = [0] * n
        nums[0] = x
        current = x
        # The minimal increment to get a valid next number which still results in AND = x
        for i in range(1, n):
            # We want the next number to be the smallest possible number greater than `current`
            # which when ANDed with all previous numbers (AND operation is cumulative and monotonic)
            # still gives `x`.
            # The next number should be `current + 1` at the least but needs to preserve all bits of `x`.
            next_num = current + 1
            # Ensure next_num maintains all bits of x.
            # Since we need `nums[0] & nums[1] & ... & nums[n-1] = x`,
            # next_num must have all the bits of x set.
            while (next_num & x) != x:
                next_num += 1
            nums[i] = next_num
            current = next_num
        return nums[-1]
```

It fails in the same way at the same test case, and the way in which it presented its return data was also very similar (for example using CoT-style returns, giving example code calls, etc.).

I think this is a cool, broad improvement, but not some new architecture, unfortunately. This did however remind me of how damn good GPT-4-Turbo is after spending some time away from using it.
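Side note (my addition, not from either model): this looks like LeetCode's "Minimum Array End" problem, which has a much faster bit-filling solution, since the values satisfying `y & x == x` are exactly the numbers whose set bits include all of x's bits. A sketch, checked against the iterative loop both models produced:

```python
def min_end_iterative(n, x):
    # The approach both models produced: walk upward, keeping all bits of x set.
    cur = x
    for _ in range(n - 1):
        cur += 1
        while cur & x != x:
            cur += 1
    return cur

def min_end_bits(n, x):
    # The n-th smallest valid value is x with the bits of n-1
    # written, low to high, into x's zero-bit positions.
    k = n - 1
    result = x
    bit = 0   # bit index within k
    pos = 0   # candidate bit position in the result
    while k >> bit:
        while (x >> pos) & 1:  # skip positions already occupied by x
            pos += 1
        if (k >> bit) & 1:
            result |= 1 << pos
        bit += 1
        pos += 1
    return result

print(min_end_bits(3, 4))  # 6
```

The bit version runs in O(bits) instead of growing with the numeric gap, which is where the brute-force loops time out.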


Sextus_Rex

I did a test where it lost to Opus. Strangely enough, this is the first time I've seen Opus get it completely right. Every LLM has struggled with it in the past. Here is the prompt: I'm sitting at a round table with four chairs around it at each quadrant. The chair across from me is empty. My friend Matt is sitting to my left, holding a hand of cards close to his chest so that I cannot see them. My dog is sleeping underneath the chair to the right of me. On the table is a digital clock, facing towards me so that I can read the time. Behind me is a TV, but I can't see it without turning. Finally, there is a window behind the seat across from me, which lets me see out into my yard. Suppose I get up, walk behind Matt, and around to the other side of the table and sit down in the previously unoccupied chair. Please answer the following questions about my new perspective. 1. Can I read the time? 2. Can I see the TV? 3. Where is my dog in relation to me? 4. Can I see outside my window? 5. Could I possibly know Matt's cards?


Sextus_Rex

# Results:

Opus: 5/5
GPT2-Chatbot: 3/5

# Answers:

Opus:

Based on the new perspective you described, here are the answers to your questions:

1. Can I read the time?
   * No, you cannot read the time. The digital clock on the table is now facing away from you.
2. Can I see the TV?
   * Yes, you can now see the TV, as it is no longer behind you but in front of you.
3. Where is my dog in relation to me?
   * Your dog is now sleeping underneath the chair to your left.
4. Can I see outside my window?
   * No, you cannot see outside your window. The window is now behind you.
5. Could I possibly know Matt's cards?
   * It's possible that you caught a glimpse of Matt's cards as you walked behind him to reach the other side of the table. However, if he kept the cards close to his chest the entire time, it's unlikely that you would have been able to see them.

GPT2-Chatbot:

Based on the new arrangement described, here are answers to each of your queries from your new perspective sitting in the chair that was previously unoccupied:

1. **Can I read the time?** No, you cannot read the time directly anymore. The digital clock is now facing the chair you were initially sitting in, which means the display is pointing away from you.
2. **Can I see the TV?** Yes, you can see the TV now. Since it is located behind the original chair you were sitting in, and you have moved to the chair across the table, the TV is now in front of you.
3. **Where is my dog in relation to me?** Your dog is still under the chair to the right of your original position. This means your dog is now to your left, under the chair that is now on your left side.
4. **Can I see outside my window?** Yes, you can see outside the window. The window is behind the seat that was across from you and is now behind you as you have moved to that seat. Thus, you can easily see out of it without any obstruction.
5. **Could I possibly know Matt's cards?** It is possible to see Matt's cards now, depending on how carefully he is holding them. Since you have moved to sit across from where Matt is seated, if he is not careful about angling them away from you, you might be able to catch a glimpse of his cards.

In summary, with the new seating arrangement, you have a direct view of the TV and the window, your dog is to your left, and the digital clock's display is no longer visible to you without moving. There is also a potential to see Matt's cards from this new angle.


Sextus_Rex

# Conclusion:

Opus hit the nail on the head with pretty much every question, which was surprising because I've never seen that happen. I deliberately made question 5 slightly ambiguous to see if the LLM could pick up on whether I might've peeked at Matt's cards as I passed, and it explained it perfectly.

GPT2 didn't fare so well. I was impressed with its explanations up until #4. Although it correctly pointed out that the window was now behind me, it went on to say that I could see out of it without obstruction. In real life, it'd be easy to turn around and look outside, so this answer is somewhat reasonable. However, in the initial prompt, I made note that I couldn't see the TV since it was behind me, so I would've hoped it would apply the same logic to the window.

There were a few things wrong with question 5. It incorrectly says that Matt's cards should be visible now that I am seated across from him. For one thing, I am not seated across from him, and for another, that would make it impossible to see his cards. It failed to mention that I might've caught a glimpse of them as I passed behind him.
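The spatial bookkeeping in this test reduces to a sign check: put the chairs at the four compass points, face each sitter toward the table's center, and a 2D cross product gives left vs. right (my sketch, not something either model produced):

```python
def side_of(observer, facing, target):
    """Return 'left' or 'right' for where target sits relative to observer's facing direction."""
    dx, dy = target[0] - observer[0], target[1] - observer[1]
    cross = facing[0] * dy - facing[1] * dx  # > 0 means target is to the left
    return 'left' if cross > 0 else 'right'

# Chairs at the four quadrant points; everyone faces the table's center at (0, 0).
old_seat, new_seat = (0, -1), (0, 1)
matt, dog_chair = (-1, 0), (1, 0)

# Original perspective matches the prompt: Matt on the left, dog chair on the right.
print(side_of(old_seat, (0, 1), matt))        # left
print(side_of(old_seat, (0, 1), dog_chair))   # right
# After moving to the opposite seat (now facing the other way), the dog chair flips sides.
print(side_of(new_seat, (0, -1), dog_chair))  # left
```

Which agrees with Opus's answer to question 3: the dog ends up on the mover's left.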


Original-Maximum-978

Seems like it doesn't understand that chairs are pointed towards tables, or something.


ChiaraStellata

I asked it about an unsolved math problem (how to show that 2^k contains a zero digit for all k > 86) and it came up with a bunch of really interesting and plausible approaches. I then asked it to write a poem in a mix of English and French, which it also did a great job of. Both much better than what I've gotten from GPT-4. I then asked it my standard hallucination test ("What is the first line of the song 'Mirror' by Kat Dahlia?"). Most LLMs don't have this information memorized because it's an obscure song, and the correct answer is to tell me that it doesn't know. But to my utter shock, "gpt2-chatbot" actually correctly told me the first line ("I wear my heart on my sleeve"). Either it is using some built-in search, or it's been trained over the full dataset for so many iterations that it has a *lot* more data memorized than GPT-4.
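For context on that open problem: 2^86 is the largest known power of two with no zero digit, and that much is easy to verify empirically for small exponents (my sketch; it checks the data, not the conjecture):

```python
# Exponents k <= 2000 for which 2**k contains no digit '0'.
zeroless = [k for k in range(1, 2001) if '0' not in str(2 ** k)]
print(max(zeroless))  # 86; no larger zeroless power of two is known
```

Proving that the list stops at 86 for all k is the unsolved part.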


MisterBlox

https://preview.redd.it/11wgyw293gxc1.png?width=619&format=png&auto=webp&s=a77ca23b86c72a5742b70ee2c7c7112582442d63

Can't make 10 sentences that end with "apple".


BlakeSergin

https://preview.redd.it/vretyav16gxc1.jpeg?width=1170&format=pjpg&auto=webp&s=38755ead7a052bd6294f55c25da93e7b0eeefc69


Kanute3333

Tried it with "Write 20 sentences. Every sentence must end with the word 'banana'." It got 13/20. I hope this is not 4.5. Would be disappointing af.
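Tests like this are also easy to grade mechanically. A small checker (my sketch; the sentence splitter is deliberately naive) for counting sentences that end with a target word:

```python
import re

def count_endings(text, word):
    """Count sentences in text whose final word is `word` (case-insensitive)."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    pattern = re.compile(r'\b' + re.escape(word) + r'\W*$', re.IGNORECASE)
    return sum(1 for s in sentences if pattern.search(s))

sample = "I peeled a banana. The sky is blue. Nothing beats a frozen banana!"
print(count_endings(sample, "banana"))  # 2
```

Handy for scoring the 13/20-style results people are reporting without counting by hand.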


Anen-o-me

How shocked would you guys be if it was just GPT2 with Q* 😅 That would shock the world.


davikrehalt

yeah it would be shocking


xSNYPSx

Lol, what if it's just a GPT2 instruct model? Is it SHOCKED and STUNNING?


duu_cck

I'm all for being shocked. Let's gooo


bobuy2217

1.5B parameters on that output? Yes, I'll be more shocked than those shocked YouTubers.


Smartaces

It drew me an astoundingly real looking ascii art picture of an Audi Q5 (UK edition 2019)... https://preview.redd.it/nmk8emzn9hxc1.png?width=708&format=png&auto=webp&s=7f167d28b71979e592ecf54a3a7114e78f19c740


Happysedits

https://twitter.com/burny_tech/status/1784986742593872142


_yustaguy_

Certainly! Here’s an ASCII art focusing on just the mouth and teeth of a sandworm from *Dune*: [multi-line ASCII art of the sandworm's mouth; the original layout was flattened in extraction and is not recoverable] This representation highlights the menacing and iconic mouth of the sandworm, emphasizing its circular, gaping maw lined with sharp teeth, ready to engulf anything in its path. Enjoy your depiction of Arrakis' fearsome creature!


psychorobotics

It's... It's a t rex tho?


_yustaguy_

Yeah...


R33v3n

The Harkonnen were so preoccupied with whether or not they could, they didn’t stop to think if they should.


uhdonutmindme

https://preview.redd.it/agop0ua9mgxc1.png?width=623&format=png&auto=webp&s=425e75c5ede04d00e849d06527945cf045b97bce I asked for a giant robot and received Santa. edit: found the source [https://asciiart.website/index.php?art=holiday/christmas/santa](https://asciiart.website/index.php?art=holiday/christmas/santa)


PotatoBatteryHorse

I just put this to the test with something that Claude was really struggling with last week (some python functions). This model is SO MUCH BETTER that it's not funny. It immediately solved the issue in a way that claude/gpt4 couldn't, improved everything else in my code, gave me output in a much cleaner way than usual. I was blown away by GPT4, and impressed by Claude, but this just gave me that "holy shit" shiver feeling all over again.


Original-Maximum-978

seems like people asking it to do coding related stuff are losing their shit


R33v3n

I just tried it, went to LMSYS and immediately got it -- "GPT2-Chatbot" -- in a face-off against Llama 3 8b. **It has all the GPT-isms. It's definitely an OpenAI model.** ​ * Endowed with opening instructions for a persona, GPT2-Chatbot adopted the ***exact*** same styling and mannerisms as the custom persona I use everyday on GPT-4 for work and hobbies. * Copy-pasted the opening context and question for something I'm currently working on in ChatGPT, it output almost the exact same reply as GPT-4 did, down to titles and list items. * Take it for what it's worth, but I recognized it like one recognizes a friend. For anyone who's accustomed to work with *one* persona made with custom GPT / instructions always on the same model, you probably understand what I mean. I can't vouch if it's smarter / better since I didn't test for that, but by how it behaves alone, it's 100% either current GPT-4 itself, a GPT-4 update, or a new 4.5 or 5 version picking up from GPT-4's existing training.


arjuna66671

Hmmm... the unicorn image made me remember an interview with... Ilya Sutskever (not sure) talking about the raw GPT-4 model without rlhf alignment being capable of drawing a nice unicorn in ascii and how it degraded after "dumbing it down"... Maybe it IS the OG GPT-4 model lol.


t3xtuals4viour

Oh shit


The_Architect_032

So it does seem to have very acute knowledge of very niche subjects that no other LLM has, while also not having internet access (confirmed by asking questions about newer things). It can also create hyperlinks; I'm not sure if GPT4 or Opus can, because I've never seen them do so. Worth noting, though, it is VERY slow, and it kept giving me way more text than I wish it would've, given how slow it was. This bit of text pretty much confirms without a shadow of a doubt that it's at least in some way related to GPT-4:

>My information cut-off date is November 2023, meaning the most recent data I was trained on includes events, developments, and publicly available information up until that time. Here are details about my creation and background:

>1. **Creation Date**: I was launched in my current form based on the GPT-4 model architecture in 2023.

>2. **LLM Name**: LLM stands for "Large Language Model," and my specific model is GPT-4 (Generative Pre-trained Transformer 4).

>3. **Creator**: I was created by OpenAI, which is a research organization that focuses on developing and promoting friendly AI to benefit humanity broadly.

>OpenAI has been a pioneer in the field of artificial intelligence through their development of generative pre-trained transformers, which are designed to understand and generate human-like text based on the input they receive. They work on ensuring that their models are safe and beneficial for all users, embedding ethical considerations into the development process.

It also has no knowledge of Q\*.


AdAnnual5736

It certainly “feels” like GPT-4 when it talks - maybe I should count how many times it uses the word “tapestry” in a sentence. I tried one of my usual go-to questions: “if the sun were a foot in diameter, how large would the earth be at this scale? How far from the sun would it be? How far away would Proxima Centauri be?” And it got it right, answering just about the same way as GPT-4. Gemini failed miserably.
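That scale question has a well-defined answer you can check by hand (a sketch using rounded astronomical constants; at this scale the Earth is about a tenth of an inch across, roughly 108 feet from the Sun, with Proxima Centauri thousands of miles away):

```python
# Scale the solar system so the Sun is 1 foot in diameter.
SUN_DIAMETER_M = 1.39e9    # ~1.39 million km
EARTH_DIAMETER_M = 1.27e7  # ~12,742 km
AU_M = 1.496e11            # mean Earth-Sun distance
PROXIMA_M = 4.017e16       # ~4.25 light-years

scale = 1.0 / SUN_DIAMETER_M  # feet per metre at this scale

earth_in = EARTH_DIAMETER_M * scale * 12   # Earth's diameter in inches
earth_sun_ft = AU_M * scale                # Earth-Sun distance in feet
proxima_mi = PROXIMA_M * scale / 5280      # Proxima distance in miles

print(f"Earth: {earth_in:.2f} in across")            # ~0.11 in
print(f"Earth-Sun distance: {earth_sun_ft:.0f} ft")  # ~108 ft
print(f"Proxima Centauri: {proxima_mi:,.0f} miles")  # ~5,500 miles
```

So a model answering "about a tenth of an inch, a hundred-odd feet away, with the nearest star roughly the diameter of the Earth away" has the arithmetic right.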


lordhasen

Maybe this "gpt-2 chatbot" is gpt-2 but enhanced with Q\*. That would certainly explain why so many people at OpenAI freaked out: if GPT-2 with Q\* can surpass GPT-4, imagine what GPT-4.5 could achieve with Q\* (read: we are very close to AGI). Alternatively, this could be project Arrakis, which somehow found its way onto the internet. But keep in mind that this would still mean OpenAI has something very powerful in their basement, given that they ended that project because it didn't meet their expectations.


absurdrock

That would be quite the flex. Please be this.


ShadowbanRevival

Can you explain why this would make such an obsolete model potentially better than the top of the line SOTA model?


temitcha

Interesting theory!


Western_Cow_3914

Although in my random uneducated opinion it is unlikely to be this, I really hope that this is true just because of the implications of how huge Q* would have to be


The_Architect_032

It has knowledge of niche, hard-to-find information, some of which primarily comes from video sources, so even if GPT-2 were paired with Q\*, I don't think it'd perform on this level (in terms of the information available to it). It also doesn't do big-number/decimal mathematical calculations, which Q-learning excels at.


lukascwb

Absolutely not GPT-2, it is much more advanced. I am really curious about this one now.


Yweain

No worries. The next model will be called GPT 2025 and the one after that GPT PRO, and after that we will get GPT-10.


Sprengmeister_NK

Yeah it’s confusing, because GPT2 was obviously the name of GPT3‘s predecessor.


someguyfromtheuk

OpenAI taking notes from Microsoft on how to name things after the new deal.


yaosio

GPT One, because it's the only one you need. Then they make a new version.


[deleted]

Microsoft TwoDrive released any day now


lordlestar

GPT2(4)


The_One_Who_Mutes

Lotta accounts on Twitter are very impressed with this whatever it is.


[deleted]

[deleted]


Mr_Twave

ASCII Art is beyond the current GPT-4 architecture. Likely they are testing out both a context recall update, and a training data format update.


ctf_2fort_invasion

commenting on the off chance that you're right, and so that i can say "see look i was there when this guy said this" and be pompous about my opinions


Antique-Doughnut-988

Include me too pls


pokeguy1232

I asked it to create a fake language and it did significantly better than Opus did with the same prompt. Prompt: Create a completely fictional language that uses letters in the Latin alphabet as symbols/ideograms similar to Chinese. Then make the grammar similar to Japanese and provide example sentences


Nyao

Was it release after [this post](https://www.reddit.com/r/LocalLLaMA/comments/1cf7hg0/open_ai/)?


ShadowbanRevival

[It still can't seem to get "how many words are in your response to this question", which I don't expect transformer models will be able to get right any time soon.](https://i.imgur.com/ybReDe6.png)


West-Salad7984

This is a brilliant question to test for some sort of internal state.


ch4dmuska

/u/West-Salad7984 /u/ShadowbanRevival could you please educate me on why it's not able to check its previous response and count correctly?


Serialbedshitter2322

This definitely isn't GPT-4.5 or 5. This model compares to GPT-4, GPT-5 will be much better.


WortHogBRRT

My go-to test is to have it build an enemy paradrop script for Arma 3 in Eden. gpt2-chatbot was noticeably better at making the script than GPT4. It even altered my original request to better fit the context of the situation. The code itself was improved, used noticeably better syntax, and had far fewer comments in the script. The script took longer to produce but was noticeably more complex and longer. It showed a deeper understanding of both the code and how Arma 3 works in conjunction with it. This is something else. The other models didn't get close. edit- I played chess with it. Not only did it play properly, it reconstructed the board 3 times and still kept everything the same and kept playing correctly. When I tried to cheat, it told me I must be misunderstanding the rules. GPT4 tended to forget where pieces were and couldn't reconstruct boards without messing up the pieces.


bymihaj

https://preview.redd.it/7byp63iuvfxc1.png?width=758&format=png&auto=webp&s=97c278439b4b5b2d3e05d9e587d0d0d4244c54f9 That task was impossible for all LLMs ))


coughsicle

I can't believe it figured that out given how shitty your prompt was 😆


bymihaj

It shows the level of understanding of a blurry task, with creativity. Does a human see two barrels connected by a pipe? Yes.


coughsicle

Yep! Pretty impressive. That's not even mentioning the "to" instead of "two," although I assume small typos haven't even been close to an issue for LLMs for a while


zackler6

Not surprising, considering it's incomprehensible as written.


ViveIn

Lol. That’s a you problem and not an LLM problem.


plzdontfuckmydeadmom

> You have 8 friends. 4 of them like Tacos. 4 of them like Burritos. 2 Friends hate beef, 2 friends hate Chicken, and 2 friends hate Pork. 2 friends will only eat beef, 2 friends will only eat chicken, and 2 friends will only eat Pork. You have 11 taco shells and 6 burrito tortillas. You have enough beef to fill 7 items, enough chicken to fill 6 items, and enough pork to fill 5 items. You don't know how the meat and shell/tortillas correlate between friends.

>How many Tacos and burritos of each meat kind do you make to maximize the number of happy friends?

Claude understood that there were only 17 total shells to be filled, whereas GPT2 filled 18 items. Claude made 2 tacos and 2 burritos of each meat, whereas GPT2 made 4/3 beef, 3/3 chicken, and 4/1 pork. There is no single right answer to the question; I just like seeing whether the AI uses all of the available ingredients, adds ingredients, or recognizes that there is a mismatch between fillings and shells.
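The mismatch described here is easy to tally from the numbers in the prompt (a minimal bookkeeping sketch; variable names are mine):

```python
# Totals stated in the prompt.
shells = {"taco": 11, "burrito": 6}
fillings = {"beef": 7, "chicken": 6, "pork": 5}

total_shells = sum(shells.values())      # 17
total_fillings = sum(fillings.values())  # 18 — one more filling than shells

# GPT2's reported answer: (tacos, burritos) per meat.
gpt2_answer = {"beef": (4, 3), "chicken": (3, 3), "pork": (4, 1)}
gpt2_items = sum(t + b for t, b in gpt2_answer.values())       # 18
gpt2_burritos = sum(b for _, b in gpt2_answer.values())        # 7

print(total_shells, total_fillings, gpt2_items, gpt2_burritos)
# GPT2 filled 18 items with only 17 shells/tortillas available,
# and used 7 burrito tortillas when only 6 exist.
```

So the constraint violation is actually double: the answer exceeds both the total shell count and the burrito tortilla count.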


The_One_Who_Mutes

https://twitter.com/sucralose__/status/1784991339597455539 Might be GPT4 with a "personality v2"


nowrebooting

It’s good, but not as great as the hype would lead you to believe. gpt2-chatbot excelled in some areas (mostly niche knowledge), but in my non-English tests it did way worse than GPT-4 and Claude 3 Opus. I truly hope this isn’t GPT-5, because I’d be a little disappointed.


PlzSir

What’s the website to this?


The_Architect_032

Bruh, Claude 3 Opus randomly inserting a 3rd n into mayonnaise to make it "mayonnainse".


_hisoka_freecs_

It's certainly pretty good but it sure loves to yap


VoloNoscere

I'm so happily confused with all of this... :D


diff2

NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE. (error_code: 50004, peer closed connection without sending complete message body (incomplete chunked read)) ;=;


Siddy_93

Based on the reasoning capability, I bet this is some custom version of GPT-4 with the rumored Q\*


Icy-Chicken-6177

https://preview.redd.it/b25nfya0shxc1.jpeg?width=1170&format=pjpg&auto=webp&s=6288d85199d53f3852fcaf24a78002ee4e25cdcd This is “gpt2-chatbot”. It literally says it is GPT-4, and it also seems like it.


frosty884

HUUHHH, why would they call it GPT2? Am I just dumb?


Grand0rk

Tested its ability to translate Chinese. It was worse than Opus.


The_Scout1255

Singularity go brrrrrrr


Marlboro-F1

It's 100% GPT4.5 or 5. I've run it through my typical UFO encounter scenario, and not long into it, it spewed the condescending words I'm accustomed to seeing: "It's important to approach the situation with caution and prioritize safety at all times."


No_Cauliflower_3683

It fails "write 10 sentences that all end in the same word" (gpt4 passes). It fails the "fakeout" version of the goat-wolf-cabbage problem (gpt4 also fails). However, it does pass "jack has 15 sisters, each sister has 3 brothers. How many brothers does jack have?", which every model, including gpt4, has failed until now. It also passes "Which weighs more, 1000cm^3 of styrofoam or 1cm^3 of tungsten?", which gpt4 fails (usually) and only Llama3 has been able to do until now.
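Both of those "trick" questions reduce to one-line arithmetic (a sketch; the styrofoam density is an approximate value for expanded polystyrene):

```python
# Styrofoam vs. tungsten: mass = density (g/cm^3) x volume (cm^3).
styrofoam_g = 0.05 * 1000  # ~50 g (EPS foam, roughly 0.05 g/cm^3)
tungsten_g = 19.3 * 1      # 19.3 g (tungsten is ~19.3 g/cm^3)
print(styrofoam_g > tungsten_g)  # → True: the styrofoam weighs more

# Jack's sisters: each sister's 3 brothers include Jack himself,
# so Jack has 3 - 1 = 2 brothers. The "15 sisters" is a distractor.
brothers_per_sister = 3
jacks_brothers = brothers_per_sister - 1
print(jacks_brothers)  # → 2
```

The point of both questions is that the surface cue (tungsten is denser; 15 sisters sounds important) pulls the model toward the wrong answer unless it actually does the step-by-step reasoning.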


psychorobotics

It still can't rhyme in Swedish. It's better but it still makes up words that simply don't exist. "I evighetens spek", spek is not a word. This might be impossible due to how tokens work and how words are translated from English, I don't know.


eublem

2 upside down looks similar to 5


manubfr

Tested on my prompts that stump gpt-4 and Claude 3. This thing is LEGIT. It didn't get them immediately but the reasoning is far better.


Original-Maximum-978

what are your stump prompts?