4 Comments
Andy X Andersen:

A neural net modifying its own weights is likely not going to work well for very large and very finely-tuned networks. It will also be immensely expensive.

Identifying a subset of weights that can be made tunable is likely going to be very task-dependent and won't generalize.

I think it makes more sense for an AI agent to work hard at inference time, when the solution is not known, and save the final streamlined path to the solution, the way people record their own eventual polished strategies.

That way the AI agent builds a vast trove of recipes it discovers that can later be either fetched dynamically when dealing with similar problems, or baked in a future large training run.
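The "trove of recipes" idea above could be sketched as a nearest-neighbor cache: store each polished solution keyed by an embedding of the problem, then fetch the closest recipe for a similar new problem. Everything here (the `RecipeStore` name, the toy 2-D embeddings) is illustrative, not an existing system.

```python
import numpy as np

class RecipeStore:
    """Toy cache of solved-problem 'recipes', keyed by problem embeddings."""

    def __init__(self):
        self.keys, self.recipes = [], []

    def add(self, embedding, recipe):
        # Store unit-normalized keys so dot product equals cosine similarity.
        self.keys.append(embedding / np.linalg.norm(embedding))
        self.recipes.append(recipe)

    def fetch(self, embedding):
        # Return the recipe whose key is most similar to the query.
        q = embedding / np.linalg.norm(embedding)
        sims = [float(q @ k) for k in self.keys]
        return self.recipes[int(np.argmax(sims))]

store = RecipeStore()
store.add(np.array([1.0, 0.0]), "recipe A")
store.add(np.array([0.0, 1.0]), "recipe B")
print(store.fetch(np.array([0.9, 0.1])))  # closest stored problem: "recipe A"
```

In practice the recipes could be fetched dynamically at inference time, as described, or used as training data in a later large run.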

Ben Dickson:

The idea behind Transformer^2 is to have a set of vectors that represent independent skills. For each task, one or more of those vectors is adjusted and combined with the weights during inference, like LoRA adapters.
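A minimal numpy sketch of that mechanism, assuming the singular-value-scaling formulation: decompose a weight matrix with SVD, then scale its singular values by a skill vector z at inference time. The function name and shapes are illustrative, not the paper's API.

```python
import numpy as np

def adapt_weights(W, z):
    """Adjust W by scaling its singular values with the skill vector z."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(s * z) @ Vt

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))

# An all-ones z-vector leaves the singular values, and hence W, unchanged.
W_same = adapt_weights(W, np.ones(8))
print(np.allclose(W, W_same))  # True
```

Unlike a LoRA adapter, which adds a low-rank update to W, this modulates components that already exist in the weight matrix.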

Andy X Andersen:

I am not sure the skills people have can be represented as vectors, with a different vector for each independent skill.

It looks like our skill set is a continuum, just as the problems being solved form a continuum.

SVD and linear algebra seem to be very coarse methods, especially given the high dimensionality and nonlinearity encountered in Transformers.

Ben Dickson:

Agreed. The idea is nice but I'm not sure how practical it can be at scale. In their defense, they don't claim that there is a 1:1 relation between skills and z-vectors but rather that each vector represents a component that can control some aspect of the model's behavior. The learned classifier that processes the prompts at inference time maps each request to the set of corresponding z-vectors that can influence it the most.
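That dispatch step could be sketched as follows: the classifier scores the prompt against each learned z-vector, and the scores are softmaxed into a convex mix of the vectors. The classifier scores and the toy 3-dimensional z-vectors below are stand-in assumptions, not values from the paper.

```python
import numpy as np

def mix_z_vectors(scores, z_vectors):
    """Blend z-vectors using softmaxed classifier scores as weights."""
    w = np.exp(scores - scores.max())  # stable softmax
    w /= w.sum()
    return sum(wi * zi for wi, zi in zip(w, z_vectors))

# Two hypothetical skill vectors (e.g. "math" and "code").
z_vectors = [np.array([1.2, 0.8, 1.0]), np.array([0.9, 1.1, 1.0])]
scores = np.array([2.0, 0.5])  # classifier rates the prompt as mostly "math"

z_mix = mix_z_vectors(scores, z_vectors)
print(z_mix.shape)  # (3,)
```

The blended vector then scales the model's weights for that request, so each z-vector influences the behavior in proportion to how relevant the classifier judges it.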
