2 Comments
Sahar

Interesting concept. I was also thinking that as AI advances, we should "stop holding its hand," as you put it.

Andy X Andersen

"Without the initial SFT warmup stage, RL training did not achieve desirable results."

Indeed. RL can go beyond imitation, but left to its own devices it can go wild. So, adult supervision required, but not helicopter parenting. :)
