Furthermore, all real-world datasets contain both useful information and useless noise. By forcing a model to represent the data with fewer parameters, we force it to learn the information, not the noise. Allowing more parameters beyond a certain point leads to overfitting: the model learns every little twist and kink in the data, signal and noise alike, and, as a result, lacks the flexibility to fit data it has not seen before. As we approach the "perfect" model, we can fit exactly all the data the model sees, and the model becomes useless when it encounters data it has not seen (a short code sketch at the end of this piece illustrates the effect). This is why the perfect model is not the best; it is like a student who has memorized all the material he was given but cannot solve any unfamiliar problem.

What does all this have to do with LLMs and Intelligence?

Even though it is a bit harder to see when language is involved, all of the above applies to LLMs equally well, at least intuitively. It turns out that LLMs' compression ability is relatively poor: they "fit" a lot of data points (roughly the whole internet) but also have a lot of parameters (trillions and counting!). The larger the LLMs we develop, the higher the fraction of the internet "stored" in the LLM parameters and, thus, the closer we get to overfitting and memorization.

Furthermore, in the hypothetical scenario where an LLM was given the whole internet to learn from, we would encounter the paradox of overfitting no longer being a problem because... there is no data the model has not seen! And with the model being large enough to retain most of this data, we would have an enormous "perfect" model of everything: a more verbose copy of the internet...?

Conclusion

LLMs are marvels of human ingenuity that can deliver significant value to us, not because they are efficient but because they are enormous. That makes them brute-force, insanely complex models with an equally enormous ability to "memorize" information, and this memorization can imitate intelligence. Then again, life itself may have sprung from enormous, complex systems through emergent behavior; couldn't real intelligence emerge from these enormous LLMs in a similar way? Well, that is a conversation for another time! To address the potential risks effectively, we must understand both the capabilities and the limitations of LLMs. That understanding is crucial to avoiding two distinct but equally harmful outcomes: excessive reliance on AI-generated information, and unwarranted fear of automation replacing humans.
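The overfitting effect described above can be seen in a few lines of code. The sketch below is not from the original article; it is a minimal illustration, assuming an invented sin-plus-noise dataset and using NumPy's polynomial fitting as a stand-in for "models" of growing size. Fitting polynomials of increasing degree to 20 noisy samples, the "perfect" degree-19 model (as many parameters as data points) drives the training error toward zero while the error on unseen data explodes.

import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a simple underlying signal: y = sin(x) + noise.
x_train = np.sort(rng.uniform(0.0, 3.0, 20))
y_train = np.sin(x_train) + rng.normal(0.0, 0.2, x_train.size)

# Held-out data from the same distribution, which the model never sees.
x_test = np.sort(rng.uniform(0.0, 3.0, 200))
y_test = np.sin(x_test) + rng.normal(0.0, 0.2, x_test.size)

for degree in (1, 3, 19):  # few parameters -> "perfect" model
    coeffs = np.polyfit(x_train, y_train, degree)  # fit the polynomial
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")

The degree-1 model underfits (high error everywhere), degree 3 captures the signal, and degree 19 memorizes the training points, oscillating wildly between them: near-zero training error, enormous test error. That is the "student who memorized everything" in roughly fifteen lines.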