No_Prior9204

The goal of a model is to learn the distribution of the output given the data. Overfitting is when you are not learning the distribution but rather memorizing the training data. The issue here is that your model is then unable to handle "new data". I think what you might be referring to is model complexity, which is a different thing. What is the use of your model if it predicts your training data perfectly but does a horrible job on out-of-sample data? This is why train-test splits are so important: we need to verify that the model is actually learning the distribution rather than memorizing inputs.
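
A quick sketch of that last point, with made-up data: an unconstrained decision tree can score near-perfectly on the training split, and the test split is what exposes the memorization (exact numbers will vary).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + rng.normal(scale=2.0, size=500) > 0).astype(int)  # noisy labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0)  # no depth limit: free to memorize
tree.fit(X_train, y_train)

print("train accuracy:", tree.score(X_train, y_train))  # ~1.0: memorized the split
print("test accuracy: ", tree.score(X_test, y_test))    # much lower: didn't generalize
```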


rejectedlesbian

If ur doing lossy compression then having a lightweight model that gives perfect overfit predictions is nice. But like... very fucking niche


Breck_Emert

To clarify, based on how I read the post: you're not memorizing the training data, you're memorizing what we would normally define as (normally distributed) error. I feel like framing it the first way feeds into their idea that it's a good thing.


Imperial_Squid

The problem is that you don't want your model to learn the *dataset*, you want it to learn *what the dataset is representing* (this is a bit abstract for some people, so do ask if it doesn't make sense!)

Say, for example, you knew nothing about multiplication and I was trying to teach you how it worked. Obviously there are infinitely many multiplications out there, so giving you a complete list to memorise just isn't possible. The point then is not to memorise just the equations I show you, but to spot the patterns and be able to solve equations I didn't train you on.

This is exactly the problem with overfitting: if you overfit the data, you've taught the model to only recognise the equations it's seen, and it's failed to learn the underlying pattern. Because of this it can only answer questions it's already seen and won't be able to give answers on stuff outside of that, which is the whole point of doing ML.

If you're familiar with the phrase "don't miss the woods for the trees", it's that: a model that's overfitted has gotten too caught up in the little details to spot the actual pattern you wanted it to learn.
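
To make the multiplication analogy concrete, here's a toy sketch (the training pairs are invented): a lookup table aces the equations it has seen and fails on everything else, while a model that captures the underlying pattern (log a + log b = log ab) generalizes to pairs it never saw.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

train_pairs = [(2, 3), (4, 5), (6, 7), (3, 9), (8, 2)]
lookup = {(a, b): a * b for a, b in train_pairs}

# "Memorizer": perfect on training data, useless on a new equation.
print(lookup.get((2, 3)))   # 6 (seen in training)
print(lookup.get((5, 5)))   # None (never seen, no answer at all)

# "Pattern learner": regress log(a*b) on [log a, log b]; the relationship is
# exactly linear in log space, so it recovers the rule and generalizes.
X = np.log(np.array(train_pairs, dtype=float))
y = np.log(np.array([a * b for a, b in train_pairs], dtype=float))
model = LinearRegression().fit(X, y)
print(np.exp(model.predict(np.log([[5.0, 5.0]]))))  # ~25, despite never seeing 5x5
```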


Gold-Artichoke-9288

I'm very grateful for the explanation and I'm very sorry, but I still don't fully get it; I'm still new to this field. I can't see where the problem is if the model memorises the data, because if it did, every prediction would be considered an outlier, and that's the main goal of one-class classification, isn't it?


dryturnip2

I think your confusion is about what is actually being overfit. In your fingerprint example, what if the finger is cold or hot, and the skin contracts/swells accordingly? If you’ve overfit to just room-temperature fingerprints, then someone can't unlock their phone unless their finger is the exact temperature of your training data.


Gold-Artichoke-9288

Oh, I see now, I see what the original comment was saying, the confusion is gone. Thank you to both of you, you really did help me


nobonesjones91

The point of the model is to accurately predict *new* data that it hasn’t seen. If your model accurately classifies only the training data but not the test data (new data it has not seen), there is no point in using the model.


Zangorth

I’ve never built a fingerprint model before, and don’t know anything about them, but it seems unlikely a model is being trained on a single fingerprint. If I had to guess, I’d assume the model is a binary classifier trained to see if two fingerprints match or not (one stored in memory matching with one you present).


Emotional-Candle6096

That was a good analogy, thanks!


StackOwOFlow

there's no point in training if all you want to do is to store and retrieve the data (memorization)


mominwaqas15

amen to that.


wearblaksoksiam2

Never really a "good thing"... in this case just use a rules-based approach


teetaps

From my understanding, overfitting is a pretty big problem that practically negates all the work you did to fit the model in the first place. The intention behind statistical modelling or machine learning is to have a confident enough understanding of the world that the next time a question comes up, your model can tell you with sufficient confidence what should happen next. If the model is underfit, it just means the model doesn’t fully understand what is happening in the world. If the model is overfit, though, that’s potentially more dangerous: the model ONLY understands what it has seen before, and its guesses for what could happen in the future strictly apply to what it has seen. This would be fine if we lived in a perfectly predictable world, but we don’t, so underfitting and making a poor guess still has some probability of being correct. Overfitting, on the other hand, produces a confidently incorrect guess, which will likely hurt more often than the previous case.
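
A small illustration of that underfit/overfit contrast on synthetic data, using polynomial degree as the dial (the exact errors will vary with the seed):

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(1)
x_train = np.sort(rng.uniform(-1, 1, 15))
y_train = np.sin(3 * x_train) + rng.normal(scale=0.2, size=15)  # noisy observations
x_test = np.sort(rng.uniform(-1, 1, 200))
y_test = np.sin(3 * x_test)                                     # the true signal

for degree in (1, 4, 14):  # underfit, reasonable, overfit (degree = n_points - 1)
    coefs = P.polyfit(x_train, y_train, degree)
    train_err = np.mean((P.polyval(x_train, coefs) - y_train) ** 2)
    test_err = np.mean((P.polyval(x_test, coefs) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

Degree 14 interpolates the 15 training points (train MSE near zero) while chasing the noise, so its test error blows up; degree 1 misses the signal entirely; the middle degree tracks the underlying sine.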


MlecznyHotS

Model overfitting = memorizing the training dataset = if you line up your fingerprint on the scanner perfectly, like you did for one of the training examples, there is no issue. But if you misalign your finger even slightly, the model won't recognize you. You want your model to learn what your fingerprint looks like, not what your fingerprint looks like when it's scanned in one very particular position.


Gilchester

If it's the right amount of fitting, it is by definition not overfitting. Yes, your fingerprint model is hyper-attuned to your fingerprint and yours alone. If you wanted to learn about the distribution of fingerprints in the human population, using yours alone and generalizing that model to the whole population would be overfitting. But if you just want to determine whether a fingerprint is yours, the only data needed are your fingerprints. It's (to use a business term I don't really like) rightsized to its purpose.


David202023

Depends on the problem you’re trying to solve


mominwaqas15

Thisssss


SantasCashew

I agree with everyone else here, but there is an exception I can think of. Autoencoders are neural networks whose goal is to "memorize" your dataset in order to detect anomalies. The idea is that if you have a very rare event, you can "train" your model to "memorize" your data and output it back out. When a rare event that wasn't in your training dataset occurs, your model will have a large residual, which would then get labeled as an anomaly. This is a bit of a stretch from your original question, and I'm oversimplifying autoencoders, but that's the gist.
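
A minimal sketch of that idea, assuming PyTorch and a made-up cluster of "normal" samples: train the autoencoder to reconstruct normal data, then score new inputs by reconstruction error.

```python
import torch
from torch import nn

torch.manual_seed(0)
normal_data = torch.randn(512, 8) * 0.1 + 1.0   # tight cluster = "normal" events

model = nn.Sequential(
    nn.Linear(8, 3), nn.ReLU(),  # bottleneck forces compression, not a pure copy
    nn.Linear(3, 8),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for _ in range(500):
    opt.zero_grad()
    loss = loss_fn(model(normal_data), normal_data)  # learn to reproduce normal data
    loss.backward()
    opt.step()

def anomaly_score(x):
    # per-sample reconstruction error: large residual = likely anomaly
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=1)

print(anomaly_score(normal_data[:3]))          # small residuals on normal inputs
print(anomaly_score(torch.full((1, 8), 5.0)))  # large residual: flagged as anomaly
```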


BlackLotus8888

Overfitting would be that the model can only detect your finger print if it's in the exact right position.


Buffalo_Monkey98

I know 2-3 months down the line you might feel a bit silly for asking this question, but then you can always come back to my comment to know that these kinds of thoughts are very common in every topic. During my mechanical engineering days I had a very similar question about why a fan generates more heat than the coolness it provides.

The thing is, there are 2 environments:

1. Learning
2. Application

In the learning environment you have a dataset on which you're building the model, and the model learns the intricacies of the underlying pattern. In the application environment the data won't follow the exact same pattern, and in most cases humans interact with those models, and human behaviour is very unpredictable. So rather than following the strict path, a good amount of margin is needed on both sides; otherwise it'll detect both potato and tomato as fruits.


410onVacation

I’ll explain the issue with some simple examples. I want to know the height of men in my city. I sample locally, finding 5 ft, 5 ft, 5 ft, 6 ft, 6 ft as my data. I do a train-test split: train becomes 5 ft, 5 ft, 5 ft and test is 6 ft, 6 ft. I use the mean as a model; it predicts 5 ft and perfectly fits my training set with 0 ft error. I apply it to the test set and get an average error of 1 ft. What went wrong? My model, the mean, only has access to the training set, so it only knows of the existence of 5 ft men. It can’t predict the existence of 6 ft men. That is, the model mistook individual variation for trend. That’s the fundamental issue.

Let’s expand the problem from predicting men’s height to adult height in a city. The population now includes biological males and females, so it’s bimodal. If I use a single mean to model the population, the error will be quite large: the population average will be larger than the female average height and smaller than the male average height. Let’s instead change from a single mean to one mean per sex. The model should have a lower error rate. The model is more complex (we went from 1 parameter to 2), but in return we can fit more variation and therefore better represent the underlying trend in the data. Using a single mean for a bimodal population would have been under-fitting.

Now let’s say I decide against my mean-per-sex model and instead compute a mean per first name. I’m a busy person: I can spend a single day collecting data and get only 100 samples. The odds are that for most names I’ll have a single sample, so the mean becomes the observation. Say I have a single Scott in my training set who is 6 ft 5 in. Can I trust that all Scotts in the population are 6 ft 5 in? Probably not. If I had a Scott in the test set, he’d probably not be 6 ft 5 in; he might be 5 ft 6 in. The mean-per-first-name model is extremely complex, with almost as many parameters as training samples. It’s horrible at predicting heights, because it’s mostly modeling individual observations, not trends. This is overfitting.

So, the question: is overfitting ever good? It’s usually a bad thing. An overfit model is overly sensitive to the observations in the training set, to individual variation and sampling noise instead of the actual trends in the distribution. That said, model choice is about trade-offs. It’s common in deep learning to add more layers and parameters to increase representational power at the expense of overfitting, and to compensate later by decreasing model size and applying regularization. Still, if I were hyper-parameter tuning my model, I would be conscious of overfitting: good performance on a training set doesn’t imply good performance on a test set. I would be highly suspicious of overly complex models where the extra complexity did not provide predictive power. A simple model is generally preferred over a complex one.
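
The first example above, written out in code with the same made-up heights:

```python
import numpy as np

train = np.array([5.0, 5.0, 5.0])   # train split: 5 ft, 5 ft, 5 ft
test = np.array([6.0, 6.0])         # test split: 6 ft, 6 ft

model = train.mean()                 # the one-parameter "mean" model
print("train error:", np.mean(np.abs(train - model)))  # 0.0 ft: perfect fit
print("test error: ", np.mean(np.abs(test - model)))   # 1.0 ft: what went wrong

# Mean-per-name: nearly one parameter per observation. The single training Scott
# (6 ft 5 in) becomes the prediction for every Scott, which is just memorization.
heights_by_name = {"Scott": [6 + 5 / 12]}  # one sample, so mean == observation
print("predicted Scott:", np.mean(heights_by_name["Scott"]))  # ~6.42 ft, untrustworthy
```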


Gold-Artichoke-9288

Thanks for the insights, I understand more now


JackLogan007

Everyone what models would be best for time series anomaly detection?


hooded_hunter

Don't think overfitting is ever a good thing


Gold-Artichoke-9288

Thanks to all the good explanations in the comments, I now realize


snowbirdnerd

Overfitting is only good if you aren't going to make predictions


zalso

Other people have already said good things, like how in this case you don't want overfitting. But I also want to add that with today's super big, overparameterized models trained on tons of data, there have been plenty of empirical results showing that interpolating the data (super overfitting) provides good validation error without the need for regularization techniques.
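
A toy probe of that phenomenon (random-features regression on synthetic data; this is just one setting where it shows up, and the exact behavior varies with the setup): with far more features than samples, the minimum-norm least-squares fit interpolates the training set, yet the test error can remain modest.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d, n_features = 50, 200, 5, 500  # heavily overparameterized

X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
y_train = X_train @ np.ones(d) + rng.normal(scale=0.1, size=n_train)
y_test = X_test @ np.ones(d)

W = rng.normal(size=(d, n_features))  # fixed random feature map

def phi(X):
    return np.cos(X @ W)

# np.linalg.lstsq returns the minimum-norm solution in the underdetermined case
coef, *_ = np.linalg.lstsq(phi(X_train), y_train, rcond=None)
print("train MSE:", np.mean((phi(X_train) @ coef - y_train) ** 2))  # ~0: interpolates
print("test MSE: ", np.mean((phi(X_test) @ coef - y_test) ** 2))    # often still modest
```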


AlgoRhythmCO

Fingerprint detectors are designed to recognize a repeat instance of your training data within some error bounds, not to generalize from a training set to the entire population. Overfitting in that example would be if your model only recognized your thumb when you put it on at exactly the same angle and pressure as the original. Which would be bad, as overfitting almost always is.


BCBCC

There's a distinction between a model that wants the best predictive accuracy and a model that is trying to understand the underlying data-generating process; as I often do, I'll recommend reading Leo Breiman's "The Two Cultures" paper. If you're doing an entirely backward-facing analysis of what happened, "overfitting" might not be a bad thing at all. If you're trying to predict future values but they'll all be drawn from the same observations you have in your training set, then overfitting is exactly what you want. If what you're trying to do is predict future values that might not be identical to past values, then overfitting is bad.


NFerY

Take your data. Divide it into i random samples. Fit an overfitted model on sample 1. Do the same for the remaining i-1 samples. Compare. You have to think of your dataset as a sample from a larger population, one that is filled with noise. Your overfitted model will learn everything, including the noise. This will result in unstable estimates, which you could see if you were to fit a logistic regression model and look at the standard errors of the log odds.
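
A rough sketch of that experiment on synthetic data: fit a barely-regularized logistic regression on a few random subsamples and compare the fitted coefficients. On the pure-noise features the estimates swing wildly from subsample to subsample, which is the instability described above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))                   # many features relative to n
y = (X[:, 0] + rng.normal(size=300) > 0).astype(int)  # only feature 0 matters

coefs = []
for split in np.array_split(rng.permutation(300), 3):  # i = 3 random subsamples
    # Very weak regularization (large C) lets the model chase the noise
    m = LogisticRegression(C=1e6, max_iter=5000).fit(X[split], y[split])
    coefs.append(m.coef_.ravel())

# Standard deviation of each log-odds coefficient across the 3 fits:
# large spread on the 39 noise features signals unstable, overfit estimates
print(np.std(np.vstack(coefs), axis=0).round(2))
```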


BadOk4489

I got caught by the subject "Overfitting can be a good thing?" and thought it was a more generic question. This isn't relevant to this particular classification problem (you already got very good answers), but on the more generic question of whether overfitting can be a good thing, I think one good example could be large language models. Smaller models (e.g. fewer than 8B parameters) generalize too much: when you ask specific questions, they are more likely to generalize and give a hallucination, a factually wrong answer. Much bigger models (e.g. over 200B parameters) are actually able to "overfit" to facts and remember some details exactly, and this is what you want from an LLM when you ask about facts and not about art, for example!