Scientists Trained AI Models Over and Over With Previous AI Content; Here’s What Happened
Human-generated content is currently used to train AI models, and the outputs of these models have been impressive so far.
But what happens to the next generation of models if content generated by previous models comes to make up much of the content found online and is then used to train future models? Will these future models improve with the new data sets, or will they degrade?
Scientists have asked the same questions, and some insightful research has been conducted in this regard, for both text-based and image-based AI content.
Text-based AI content
In a study published on the pre-print arXiv server, a group of six researchers used a model called OPT-125m to generate some texts on English architecture.
These generated texts were then used to train a next-generation model. The process was repeated with each new generation trained on data from the previous generation.
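This recursive setup can be illustrated at a toy scale. The sketch below is hypothetical and nothing like the researchers' actual OPT-125m pipeline: it uses a tiny character-level Markov model that is trained on some "human" text, then retrained generation after generation on its own samples, tracking how varied each generation's output is.

```python
import random

def train(corpus, order=3):
    """Build a character-level Markov model: context -> possible next chars."""
    model = {}
    for i in range(len(corpus) - order):
        ctx = corpus[i:i + order]
        model.setdefault(ctx, []).append(corpus[i + order])
    return model

def generate(model, order, length, rng):
    """Sample text from the model, restarting at a known context on dead ends."""
    ctx = rng.choice(list(model))
    out = list(ctx)
    for _ in range(length):
        choices = model.get("".join(out[-order:]))
        if not choices:
            out.extend(rng.choice(list(model)))  # dead end: restart
            continue
        out.append(rng.choice(choices))
    return "".join(out)

# Generation 0 trains on "human" text; every later generation
# trains only on the previous generation's output.
human_text = ("the quick brown fox jumps over the lazy dog. " * 50
              + "a stitch in time saves nine. " * 50)
rng = random.Random(42)
data = human_text
diversity = []
for gen in range(10):
    model = train(data, order=3)
    data = generate(model, order=3, length=2000, rng=rng)
    diversity.append(len(set(data.split())))  # crude output-variety measure

print(diversity)
```

Rare patterns in the training text are the least likely to be sampled, so each generation tends to drop a few of them; compounded over ten generations, the outputs drift away from the original distribution, which is the intuition behind the collapse described next.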
The result was what is called model collapse: the model outputs steadily eroded, until the output of the 10th-generation model amounted to complete gibberish.