Scientists Trained AI Models Over and Over With Previous AI Content; Here’s What Happened

Onyedikachukwu Czar
5 min read · Nov 16, 2023
AI models trained with data from previous models will systematically degenerate until they produce gibberish. Image Credit: Sakar et al.

Human-generated content is currently being used to train AI models, and the outputs of these models have been impressive so far.

But what will happen to the next generation of models if we reach the point where content generated by previous models makes up much of the content found online and is then used to train future models? Will these future models evolve and improve with the new data sets, or will they degrade?

Scientists have asked the same questions, and some insightful research has been conducted in this regard, for both text-based and image-based AI content.

Text-based AI content

In a study published on the arXiv preprint server, a group of six researchers used a model called OPT-125m to generate text about English architecture.

These generated texts were then used to train a next-generation model. The process was repeated with each new generation trained on data from the previous generation.

The result was what is called model collapse: the quality of the outputs steadily eroded until, by the 10th generation, the model produced complete gibberish.
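To make the setup concrete, here is a minimal sketch of that generational loop in Python, using the Hugging Face transformers and datasets libraries. This is not the researchers' actual code: the base checkpoint matches the OPT-125m model named above, but the prompt, sample count, generation length, and training settings are illustrative assumptions.

```python
# Hypothetical sketch of generational training: each generation is
# fine-tuned only on text sampled from the previous generation.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

MODEL_NAME = "facebook/opt-125m"  # the small OPT model named in the article
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)


def sample_corpus(model, n_samples=200, prompt="English architecture"):
    """Generate a synthetic training corpus from the current generation."""
    inputs = tokenizer(prompt, return_tensors="pt")
    texts = []
    for _ in range(n_samples):
        out = model.generate(**inputs, do_sample=True, max_new_tokens=128,
                             pad_token_id=tokenizer.eos_token_id)
        texts.append(tokenizer.decode(out[0], skip_special_tokens=True))
    return texts


def train_next_generation(texts, output_dir):
    """Fine-tune a fresh copy of the base model on nothing but the
    previous generation's output."""
    ds = Dataset.from_dict({"text": texts}).map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
        batched=True, remove_columns=["text"])
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=output_dir, num_train_epochs=1,
                               per_device_train_batch_size=4,
                               report_to="none"),
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    return model


# Generation 0 starts from the original, human-trained checkpoint;
# every later generation sees only machine-generated text.
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
for generation in range(10):
    synthetic_texts = sample_corpus(model)
    model = train_next_generation(synthetic_texts, f"generation_{generation}")
```

Running a loop like this and comparing samples from each generation is the kind of setup in which the degradation described above becomes visible.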
