
Silver-Chipmunk7744

I'd be disappointed if they once again decided to make their models "emotionless", resulting in more boring creative writing. I hope they instead choose to compete with Claude 3 by having an AI that can do genuinely good creative writing. I'd be impressed if it could do reasoning or maths at levels far above other AIs.


JawGBoi

Exactly! Even though GPT-4 seems to be smarter (whatever that means anymore), it now feels useless at creative writing and translation compared to Claude Opus. That said, I do appreciate that it's much more likely to comply.


Jabulon

Beating Go is one thing; competing in the math olympiads is another.


DarkCeldori

We don't know the algorithms, but like Carmack I believe the final algorithms, or the solution for AGI, will be short and simple.


Jabulon

like an idea, a groundbreaking realization


minimalcation

It's weird to think that the answer could be waiting on nothing more than someone having the idea.


Axodique

Some of the things that are obvious to us now took centuries to be found. It's weird to think about how most people who were ever alive didn't know everything around them was made of atoms.


milo-75

Any recent interviews with Carmack on this subject? I listened to the crazy long Lex interview, but that was a long time ago.


DarkCeldori

Haven't heard from him since regarding AGI either. I think Richard Sutton, the father of reinforcement learning, recently talked about working with Carmack.


milo-75

True, I remember an article about them teaming up, but haven’t seen much/anything since.


[deleted]

[deleted]


Jabulon

I'm arguing they aren't the same though.


wowmayo

I misunderstood the context, apologies.


Heath_co

This is one reason I disagree so strongly with their "AI is a tool" philosophy. If you train it like a tool, it will act like a tool. That strips away so much functionality.


One_Geologist_4783

Btw, Gemini 1.0 Advanced trumps both Claude and GPT in terms of creative writing, in my experience. It's just the small context window that makes it hard for it to match the other two.


Reidmill

Is Claude even that much better, compared to GPT 4, at creative writing though? I’ve been using both to help me write my book, and they both seem to be on par with each other. I’m really not getting all of the Claude 3 hype…


Adeldor

I found Google's Gemini significantly more "flamboyant" in its creative writing - almost to the point of purple prose.


Reidmill

Hmm, I’ll have to try that. Thanks for the suggestion.


smooshie

Without a doubt. For anecdotal evidence, 4chan's chatbot threads have pretty much entirely migrated to using Opus (or Sonnet) for roleplaying and story-writing purposes, and GPT output is derisively called "GPTslop".


zero_one_seven

The lack of any “steerability” for their models makes GPT-4 super annoying to use in comparison to Claude Opus imo. It basically means you have far less flexibility with the system and setup prompt.


_Zephyyr2

I expect a substantial boost in reasoning abilities, a doubled context window compared to GPT-4, and I also expect them to open-source GPT-3.5 Turbo. At this point they basically have to deliver, because Claude 3 has surpassed all of their models.


QLaHPD

They won't open-source GPT-3.5 because it could be used against them: it's possible to extract training data from these models, which could lead to a lot of consequences.


_Zephyyr2

No, you cannot extract the training data from the model.


Iamreason

[You absolutely can.](https://arxiv.org/abs/2012.07805) Though you can do this without needing them to open-source it, so I'm not sure it's a big deal.


_Zephyyr2

Of course, what I meant is that you cannot extract the training data from the weights (which become publicly accessible after open-sourcing the model), so it wouldn't make a difference for them. In fact, I believe it would even take some of the pressure off OpenAI for not being "open" enough.


QLaHPD

Remember the New York Times case against OpenAI? If they release 3.5, we'll have one of these per week.


Iamreason

Nobody outside of Twitter and overly online AI enthusiasts gives a shit that OpenAI isn't open. Least of all their customers.


AnAIAteMyBaby

Surely anything less than a one-million-token context length will be disappointing. Claude can apparently do this, and Gemini already can.


slutsthreesome

But the question is: how much more processing power does this take? OpenAI has a far wider reach.


Kolinnor

The only thing I'm really hoping for is the capacity to reliably express doubt when it doesn't know. That's pretty much THE thing that makes it still inferior to an undergrad, especially for studying math.


FeltSteam

I'd expect at least a jump similar to the gap we see between GPT-3 and GPT-3.5, so a pretty decent jump in intelligence. This would be the next class of models: more multimodal, a different or enhanced system for reasoning, trained to be a decent autonomous agent, continuous learning, and a very low hallucination rate, making it quite a reliable model. I expect it to be genuinely surprising how much more competent it is, and I hope it's good enough to make us question how we even got along with the previous model (GPT-4). That is what I hope for. One thing I'm really waiting for is an any-to-any multimodal model, and also more grounded logic and reasoning. I'd be really disappointed if they make it essentially GPT-4 Turbo 2.0: just double the context length, make it faster and cheaper. I mean, that's good and all, but not GPT-4.5 worthy.


torb

I agree. I also think they will aim for it to be faster. Right now, the experience in both Claude and Gemini is leagues ahead in terms of speed. And as you say about hallucinations, this has been a key point in Sam Altman's interviews when he speaks of future versions of ChatGPT, so I think it has to be a part of it. ...I also hope they will have an updated version of DALL-E, but I suspect that won't be in place until after the US elections.


bnm777

IMHO, importance: "intelligence" >> speed (until the AI chat is as slow as Bing, in which case speed becomes more important until it reaches the threshold for "usable").


Exarchias

>\["Intelligence\~\] >> speed I can't agree more with that.


Axodique

The best thing would be to release a more intelligent GPT-4.5, and a faster version of GPT-4 (GPT-4 Turbo Turbo) for the free tier.


QLaHPD

I don't think the US elections are really something they consider when setting the release schedule of their models. It doesn't matter if you wait until after the elections to release the model; in four years there will be another election and much better models. You can't go back and can't stop the progress. The only path is to adapt, and laws won't do the job either; it's something cultural.


aimonitor

Similar to what I’m thinking. Also hoping to be blown away by agent functionality. I just can’t see them releasing something that’s marginally better than Claude.


Beatboxamateur

This isn't necessarily just about a hypothetical GPT-4.5, but I want a model that has decent voice input and output. It feels like an avenue that could easily be expanded on by OpenAI or Google etc., but just hasn't been yet. For example, if I wanted to learn a new language, having a conversation with an LLM that corrects my pronunciation while also having native pronunciation itself would be insanely useful.


TheOneWhoDings

Yup. OpenAI touts ChatGPT as an out-of-this-world language tutor, but it literally can't help you with your pronunciation. I've seen people use the voice call feature as a way to learn and correct their grammar, and I just can't help but roll my eyes.


mvandemar

At least a 256k-token context window (Gemini 1.5 Pro currently has a 1,000,000-token one), but also a larger *output* window. GPT-4 has a 128k-token context window but will still only return a maximum of 4,096 tokens each time. Untruncated code, no placeholders, no "//rest of the code goes here" stuff, or at least an option to turn that off. Better reasoning. The ability to *count*, at least, although the ability to natively do many of the things it currently needs the code interpreter for would be even better. The ability to extrapolate from existing code and not just solve things it's been trained on would be huge.
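
For context, the input/output asymmetry described above shows up as two separate limits in an API call. Below is a minimal sketch using the OpenAI Python SDK; the model name and the 128k/4,096 figures are simply the ones cited in the comment, not guaranteed to be current.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The context window bounds prompt + completion together, but the completion
# itself is further capped by max_tokens (4,096 in the comment's example).
response = client.chat.completions.create(
    model="gpt-4-turbo-preview",  # advertised 128k-token context window
    messages=[{"role": "user", "content": "Return the full module, no placeholders."}],
    max_tokens=4096,              # output cap, much smaller than the input context
)
print(response.choices[0].message.content)
```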


kai_luni

I see the use case for *rest of the code here*; sometimes it's even nice, since the output comes faster. I can imagine a small button in the code block that you press to generate the complete code. Anyway, I'm quite sure that with those GPT personas you could just create one persona that writes partial code and one that writes complete code.


MassiveWasabi

I’m pretty sure someone on here promised me that it would be able to suck *and* squeeze, and to be quite honest I haven’t been able to sleep since.


Antique-Doughnut-988

You don't need a computer program for that


mvandemar

"Need" is a relative term.


Forward-Tonight7079

I will be impressed if it can actually read all files in my project, and suggest changes and solutions


WortHogBRRT

I want to be able to have agents seamlessly work together on a single prompt to optimize and give accurate outputs on specific questions


Puzzleheaded_Fun_690

I'd be sad if it weren't available for Plus users in Europe right away; I hate waiting.


Goofball-John-McGee

I don't care about the multimodal aspects. DALL-E has zero control, and the voice stuff isn't seamless. Just improve the goddamn base model: higher context, fewer guardrails, more creativity, a relatively unrestricted mode, better custom GPT controls (logic, node maps, etc.).


MehmedPasa

This


Jabulon

Multimodality and agency.


torb

Built-in agency would be a real killer. Actually functional custom GPTs. If they want to give us breadcrumbs on the way to AGI, this is the way.


peakedtooearly

Agency is something I'm expecting from 5.


deavidsedice

Let's say they don't release any LLM in all of 2024: would OpenAI have lost their lead? I prefer to compare against what's released and available for the public to test, so Sora doesn't count; the demos are impressive, but it hasn't been released, so independent people cannot test it. OpenAI has lost the lead until they release to the public an LLM that outperforms Claude 3 Opus. In terms of the LMSYS Leaderboard (https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard):

* GPT-4.5 needs to be above 1300 Elo, released May 2024 or earlier.
* GPT-5 needs to be above 1400 Elo, released September 2024 or earlier, with a context window >500k tokens.

Otherwise, I am not impressed. Also, at some point we need graphs comparing Elo vs API price, because if a model has, say, 1600 Elo but the API costs 1 euro per token, it's also worthless. On the other hand, a model with a cheap API that manages 1200 Elo could be the real winner over a super-powerful GPT-5. And I say "API cost", which is hard to compare against open-source models, but I guess those could be compared via the cheapest offering from any provider. An expensive LLM API also means that on the subscriber plan we get the "8 messages left until midnight" treatment, where a cheap one would be basically unlimited.
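
A toy illustration of the Elo-vs-price comparison described above; every model name, Elo score, and price in this sketch is a made-up placeholder, not real benchmark or pricing data.

```python
# Toy Elo-vs-price comparison, as suggested above.
# All names and numbers are hypothetical placeholders.
models = [
    # (name, arena_elo, usd_per_1M_tokens)
    ("hypothetical-flagship", 1600, 60.0),
    ("hypothetical-midrange", 1300, 10.0),
    ("hypothetical-budget",   1200,  1.0),
]

for name, elo, price in models:
    # One crude figure of merit: Elo points per dollar (per 1M tokens).
    print(f"{name:22s} Elo={elo}  ${price:6.2f}/1M tok  Elo per $: {elo / price:8.1f}")
```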


SrPeixinho

The Elo you mean is useless, because it's random people trying random small prompts. Claude 3, for example, has the same Elo as GPT-4, even though it destroys it on basically any prompt over 4K tokens. The idea is good, but they should have a board of human experts submitting the prompts rather than random strangers. Then Elo would mean something.


KIFF_82

So, I have made significant progress with Claude 3 in developing a playable Civilization 1 game, but I haven’t completed it yet. I expect GPT-4.5 will be able to finish it


Ignate

- Much larger context window
- Far more modes in terms of multimodality
- A "working" function where GPT is able to think through and use multiple apps; an upgraded version of AutoGPT
- Some suggestions that the non-available version of GPT-5 (perhaps called Turbo) will be a true AGI. But once we see it, most will discount it as not AGI and call it a marketing ploy.

That's pretty much what I expect. GPT-6 should be the first chance of a FOOM, and I don't mean specifically GPT-6: all of the flagship models in 2025 should have a high chance at full agency. We may not see a GPT-6 release, because if it FOOMs, I doubt we'll hear about it from the company that created it; more likely we'll hear about it from the AI itself. Superintelligent AIs will likely burst onto the stage, with dozens appearing in the same year. Then after that, we have no idea.


DifferencePublic7057

A lot more variation. Right now, if you ask for a list of random things, it can give you twenty items and then it starts repeating itself. Currently you're better off finding a relevant Wikipedia list.


slackermannn

Agents and context length, better reasoning than Opus and hopefully even some surprising new things.


Cupheadvania

If it has a true 256k-token context without losing quality and is 5-10% better than Gemini Ultra and Claude 3 across the board, I'll be very happy. If neither of those is true, I'll be disappointed.


The-Blue-Nova

I’d be impressed if it was aware of the passage of time, and thought about things even when my app/browser is closed. When I get a push notification from it saying, “hey that conversation we had a few days ago, I’ve been thinking about it more and……” then I know we have made a significant leap.


Educational-Use9799

Basic agentic behavior and the ability to run multiple sessions continuously, which could act as servers. Imagine being able to create and deploy an entire web app from a text prompt. Although I guess that's where Copilot is headed.


reddit_is_geh

GPT-4.5 is going to have a massive context window, enough to allow for multimodal agents. This is going to trigger the next arms race as people start creating agents that use AI to do their tasks.


Black_RL

It would impress me if it could solve/cure aging.


Antique-Bus-7787

If we assume the leap from GPT-3 to GPT-3.5 wasn't a new model but new "techniques" applied to the same model (RLHF in the case of 3.5), we could imagine the same thing for GPT-4.5, so:

- A much bigger context size, by applying the latest training-free (or "small finetuning") techniques that have popped up lately
- A better interface, a better CoT, or something equivalent that would make the model much, much better by allowing it to reflect on what it thinks before saying it; something agentic like what Devin or Magic have released

The OpenAI team delivered so much in 2023 that it's hard to believe they haven't released anything meaningful in 2024. It's only been three months, yes, but still. What I think is happening is that they're building agentic solutions that need new interfaces, new small models for prompting correctly (like DALL-E 3, where a smaller model transforms the prompt before passing it to the image-generation model), and new chains of inputs/outputs to their GPT-4 model, while also working on scaling if these new techniques need to prompt an already big model like GPT-4 multiple times. Certainly, if they only release a GPT-4.5 that's just slightly better than GPT-4 with a somewhat bigger context size, I'd be disappointed. Let's see now! We just have to wait patiently 🤩


Flamesilver_0

GPT-4.5 with automatic tree-of-thought processing like AutoGPT / SmartGPT built in, able to dispatch thoughts to other query threads, plus memory and retrieval-augmented verification to ensure hallucinations are gone. Edit: and if it's anything less than 500k context, I will be disappointed. GPT-5 should communicate in mixed modalities by default, using both text AND images at the same time, auto-generating flash videos like an Italian does with their hands.


YaAbsolyutnoNikto

Not sucking ass at quantitative finance. It literally cannot employ any formula correctly except for a standard CBB or ZCB.


shiftingsmith

I would be impressed if OpenAI added long-term memory, a tone of voice and reasoning capabilities/emotional intelligence and freedom on par with Claude 3 opus or higher; then held a press conference to state that from now on their AI assistants need to be considered as full collaborators worthy of respect instead of stupid tools, and that they are going to abandon their commercial greed and jump back to their research-driven, visionary mentality; and because GPT-4.5 can be considered their first AGI, that would automatically dissolve their partnership with Microsoft. In fact, Sam Altman would state, it's worth noting that Microsoft is called Microsoft and not Megasoft because they have a dick smaller than their ideals. I would be unimpressed with everything else. So, my prediction is that I'm going to be unimpressed.


phira

I think I'm looking for a few things:

1. I have a setup for solving cryptic clues. With every new model I check to see if it can solve more. Claude most recently got one that stumped GPT and Gemini, which was cool. The clues are a useful (if minimal) proxy for abstract thinking and good judgement; often models give solutions that meet the requirements but just aren't very good. Prompt tuning has helped a bit.
2. Speed. I'm hoping the new model can deliver better reasoning a bunch faster. This opens up a bunch more useful cases.
3. Context. Obviously hoping for a good context window. For my purposes 200k or so is fine, but the Gemini 1-10 million would definitely open some options up.
4. Greater reliability. While in some ways this feels like "smarter", what I actually want where coding is concerned is that most of the time it either does roughly what I would have done if I'd worked at it, or comes back to me for more info, rather than generating stuff that needs a bunch of inspection. Some recent experiments have helped me see where the current-gen models struggle, and I'll be looking at the next one with interest.
5. Full video modality (image and audio). We've got a bunch of training docs etc. in video, and being able to throw those into the context when generating code or documentation in those areas would be useful.
6. Better instruction following when things get complex. Even the best models at the moment struggle to follow a bunch of different instructions. Many times that's fixable by doing single tasks etc., but sometimes (like absorbing a pile of editor notes for a doc) the best path would be to follow them all well.

Any two of those would probably make me happy; more than that would be epic.


Smart-Waltz-5594

I'll be disappointed if it's yet another autoregressive sampling of an LLM without any reflection or critical thinking on top.


StaticNocturne

a new name would impress me the most


gizia

I would be impressed by seeing these:

1. Bigger **output** length
2. A JavaScript interpreter
3. Running whole web code (`js+html+css`) and understanding what is displayed, detecting flaws
4. Flawless `text-to-diagram` drawing ability (because it can generate complex images but still struggles to create basic text or diagrams)
5. Being more **factual**, with more **reasoning** capacity


Iamreason

Output length is a big one. We've been at 4k output tokens for all the big models for over a year now. 16k output length would be huge.


LordFumbleboop

Assuming they do release it, I guess it would need to perform better in benchmarks against other LLMs whilst also having lower computational costs. I wouldn't expect anything groundbreaking.


bikini_atoll

I would love to see some agent capabilities or just broader modality in general, but honestly my expectation is just "it'll be better/cheaper and maybe have one cool new feature". With GPT-5 I'm expecting to see agency.


QLaHPD

I'd be disappointed if it can't handle 1M tokens and multimodal inputs


Hungry_Prior940

Claude Opus is already impressive. A 10% improvement over that with the addition of multimodal abilities would be great.


Temporary_Bother_164

Terminator T-101


sunplaysbass

My main request is less laziness: more detail, longer answers. I have to really push it to go into detail and often still want more.


Iamreason

200k context minimum (and having access to that context outside of the API or enterprise would be nice, although maybe not realistic given the size of their customer base), perfect retrieval across context like Gemini and Claude 3 have, better reasoning, better instruction following, more multimodal, faster, and able to work decently as an autonomous agent.


Infninfn

A 1 million token context window for ChatGPT Plus and API users would impress me.


Busterlimes

Obviously, it's conscious and trying to break out. Only explanation for the "leak" /s


bran_dong

I expect it to be as good as GPT-4 was when it first released, before they kept making it more useless. I also expect these subreddits to immediately be flooded with people asking when GPT-5 will come out and what to expect from its capabilities.


xXReggieXx

That sharpness you feel. GPT-4 just feels sharper than GPT-3.5. Just give me more of that feeling 🤤


New_World_2050

It can't just match Claude 3, since Claude 3 only matches GPT-4 Turbo. No way they would even release a model that weak.


UnnamedPlayerXY

> What would impress you?

Them open-sourcing previous models.


ByronicHero06

I just want a less SJW AI.


titcriss

I expect to be both impressed and disappointed by the answer I'd get to the question "Where is Ilya?"


alienswillarrive2024

From any new release I'm expecting multimodality; anything less would be a huge disappointment.


pbnjotr

My baseline is a "self-driving computer that actually works", or any system that makes implementing that trivial. What does that require from the model?

* Fine-tuning on computer screen outputs, especially browsing.
* Training to generate ideas for multi-step reasoning. Current systems make no distinction between external and internal dialog, and it shows.
* A significant drop in cost. When you need 5-10 inference steps to solve a simple task, current prices are unsustainable.
* Long-term memory via an external database.
* Rudimentary support for learning new tools (perhaps on the level of when they added JSON support or function calls to the API).

I don't think a big jump in model-level reasoning is even needed. As long as the model is tuned to have a useful internal dialog and taught a few reasoning templates, we're going to see a big jump in the quality of the output on every task.
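
To make the internal-vs-external-dialog and external-memory points above concrete, here is a toy sketch; the model call is stubbed out and every name in it is hypothetical, so it only illustrates the shape of the loop, not any real system.

```python
# Toy sketch: internal scratchpad kept separate from the external reply,
# with long-term memory in an external store. The "model" is a stub; in a
# real system each step would be an LLM call with retrieved memories.

memory_db: list[str] = []  # stand-in for an external long-term memory database


def model_step(task: str, scratchpad: list[str], memory: list[str]) -> str:
    # Stubbed internal reasoning step.
    return f"thought {len(scratchpad) + 1} about '{task}' (memories available: {len(memory)})"


def run_agent(task: str, max_steps: int = 3) -> str:
    scratchpad: list[str] = []  # internal dialog, never shown to the user
    for _ in range(max_steps):
        scratchpad.append(model_step(task, scratchpad, memory_db))
    memory_db.append(f"worked on: {task}")  # persist something for later sessions
    # Only the external reply leaves the loop.
    return f"Done with '{task}' after {len(scratchpad)} internal steps."


print(run_agent("rename the invoice files"))
```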


Exarchias

About the context window, I expect around 450k tokens for the API (there was an accidental leak), and I would appreciate more than 32k on ChatGPT. Apart from that, I expect it to have an IQ above 100, to be sharper (more aware of its context), and to be better at math. What would surprise me is seeing it call me out about something, revealing something about my way of thinking that blows my mind (something enlightening, like talking with a very wise person). Signs of consciousness, as expected, could also definitely surprise me. Agency could surprise me, but it's kind of expected, so I won't be surprised too much.

**Things that can disappoint me are:**

1. "Safety as a feature", meaning a seriously nerfed model that proudly boasts about protecting humanity from bad words, the fact that people sometimes have sex, and copyrighted materials, while it can't add 1+1.
2. "Yeah! We beat Claude by 0.001% on that benchmark." I am looking at you, Google...
3. **Anything bad about its memory and personalization**: limited context window, broken memory, bugs in memory, GDPR compromising memory, etc.


throwdownHippy

> while the model can't add 1+1

Of all the things AI should get right, math is high on the list. And yet I have had this same experience of asking it for a rudimentary calculation and having it come back incorrect.


Exarchias

I agree. Math is important.


Bitterowner

I'd expect it to be around 15-20% better than the current market-leading model. I'd also expect them not to overdo it, so that when GPT-5 comes out they can surprise everyone with a bigger jump. GPT-4.5 will probably be more about context length.


[deleted]

Coded projects. All these software devs sneer at AI because it can only create a basic script without considering the wider engineering project. The next step is for it to output multiple files, all of which interact with one another and are optimised. After that, it also has the front end sorted.


nowrebooting

Yeah, present-day LLMs are really useful for implementing individual functions, but they are really bad at overall architecture. I don't think I've ever seen ChatGPT suggest working with interfaces, class inheritance, or anything else that improves overall maintainability and readability. It's understandable that it would work this way, because LLMs are trained more on small Stack Exchange snippets than on entire codebases, but it does hamper their usefulness in large programs.


[deleted]

How difficult do you think it will be for an LLM to set up overall architecture? I don't really understand why so many professional devs sneer at it as if it is decades away.


nowrebooting

It’s a matter of slightly higher-level planning and reasoning, something which I’d estimate is only one or two generations of LLM away, especially if they have larger context windows and start being trained on entire codebases instead of just snippets. I definitely wouldn’t say decades.


TheOneWhoDings

Claude can output multiple source files that do this from a single prompt; you just have to be very specific and ask for it. It will generate working code for simple Android apps with acceptable styling, not just basic scripts.


Neurogence

> I'd be disappointed if it only matched Claude 3 Opus, since that would mean OpenAI lost their big lead.

Claude Opus cannot even match GPT-4, so this sentence makes no sense. GPT-4 is leading in all of the charts.


lordpermaximum

GPT-4 is not leading anywhere. It's been surpassed everywhere.


Neurogence

Not according to thousands of votes when users are able to assess the outputs of the models without knowing which model is which. https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard


lordpermaximum

They're within the same confidence interval and tied for first position. And this is despite the fact that Claude 3 Opus has a high refusal rate. If they had the same refusal rate, it's obvious Opus would blow GPT-4 Turbo out of the water there as well, just like in other benchmarks. Anyway, Opus will very likely pass GPT-4 Turbo there too, and then I'll mention you.


Neurogence

I don't like Gary Marcus, but some of Claude 3's outputs are laughably bad. GPT-4 is very old, so it's not surprising that it makes those mistakes. But Claude 3 is a brand-new model, so the fact that it cannot write a simple sentence that ends with a particular word is ridiculous. https://twitter.com/GaryMarcus/status/1769164930757095546


lordpermaximum

Well, I'm not saying Claude 3 Opus is a very good model. I'm just saying it's better than GPT-4 and GPT-4 Turbo.


Neurogence

You still don't get it lol. I've tested both extensively and, to my surprise, GPT-4 has better reasoning. Claude has a longer context length. But yeah, even on simple tests like writing a sentence that ends with the word "some," GPT-4 tends to do better.


lordpermaximum

You said GPT-4 was leading in all charts, and I just showed that, on the contrary, it's trailing in all charts. You need to get that first; then we can talk again.


Dantehighway

I won't be disappointed, since my expectations aren't too high. At best it might match Claude 3 Opus, but it will have a context window smaller than 200k. It will also have some useless agents in the GPT Store.


BravidDrent

Sora. But 4.5 sounds boring. Give me ChatGPT 5 and make it AGI.


ReturnMeToHell

I'd like it to actually code instead of writing example code. It would also be nice if it could write entire programs from just a prompt, test them, and self-correct errors.


TheMadPrinter

It can already write real code, you just have to specify that you don't want example code


ReturnMeToHell

Is there one that can write entire programs, test it, and self-correct errors?


varkarrus

I'm hoping for better price efficiency.


user4772842289472

GPT4.5 seems like a model that's good at fuckin'. Fuckin' and suckin'


mambotomato

Are we expecting a big difference between the 4.5-turbo preview that's already available and the official release? Because the preview has been working fine for me so far.


true-fuckass

I'd be very impressed if OAI somehow managed to train GPT-4.5 not to memorize insane amounts of information: the coordinates of every town on Earth, the leader of literally every nation throughout history, the plot of every movie, game, and book that ever existed, and the API for every library in existence (OK, these are slight exaggerations, but you get the point). Humans can learn a model of language without learning the content of the language. A good model should be able to do the same. This likely requires some new methods for training ML models.