Gemma Scope takes a deep look inside DeepMind's LLMs
Gemma Scope is a suite of sparse autoencoders (SAEs) trained on the activations of different layers of Gemma 2 2B and 9B
Google DeepMind has released Gemma Scope, a new set of tools that sheds light on the decision-making process of Gemma 2 models.
Gemma Scope is based on DeepMind's JumpReLU SAE architecture, released earlier in July. Sparse autoencoders (SAEs) have become a popular technique for investigating the behavior of LLMs, and major AI labs are using them as tools to explore the inner workings of their models.
SAEs are a variant of the classic autoencoder with an additional constraint that forces the model to map dense input features to a small number of active intermediate features before reproducing the original input. This sparsity makes the intermediate features easier to interpret. JumpReLU uses a special activation function with learned per-feature thresholds, striking a better balance between sparsity and reconstruction fidelity.
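The idea can be sketched in a few lines of NumPy. This is a minimal, illustrative toy, not DeepMind's implementation: the dimensions, random weights, and fixed thresholds are assumptions for demonstration (in practice the weights and thresholds are learned by minimizing reconstruction error plus a sparsity penalty).

```python
import numpy as np

def jumprelu(z, theta):
    # JumpReLU: keep a pre-activation unchanged only where it exceeds
    # its (learned) threshold theta; zero it out everywhere else.
    return np.where(z > theta, z, 0.0)

class SparseAutoencoder:
    """Toy SAE sketch: maps a dense d_model activation vector to a wider,
    mostly-zero d_sae feature vector, then reconstructs the input.
    Weights are random and thresholds fixed, for illustration only."""

    def __init__(self, d_model, d_sae, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0.0, 0.1, (d_model, d_sae))
        self.b_enc = np.zeros(d_sae)
        self.theta = np.full(d_sae, 0.5)  # learned per-feature thresholds in training
        self.W_dec = rng.normal(0.0, 0.1, (d_sae, d_model))
        self.b_dec = np.zeros(d_model)

    def encode(self, x):
        # Dense activation -> sparse feature vector.
        return jumprelu(x @ self.W_enc + self.b_enc, self.theta)

    def decode(self, f):
        # Sparse features -> reconstruction of the original activation.
        return f @ self.W_dec + self.b_dec

sae = SparseAutoencoder(d_model=8, d_sae=32)
x = np.random.default_rng(1).normal(size=8)   # stand-in for an LLM activation
f = sae.encode(x)
x_hat = sae.decode(f)
print("active features:", int((f > 0).sum()), "of", f.size)
```

Because each feature only fires when its pre-activation clears the threshold, most entries of `f` are exactly zero; interpretability work then focuses on the few features that do fire for a given input.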
Gemma Scope is a toolset that contains more than 400 SAEs, which collectively represent more than 30 million learned features from the Gemma 2 models. This will allow researchers to study how different features evolve and interact across different layers of the LLM, providing a much richer understanding of the model’s decision-making process.
DeepMind has released Gemma Scope on Hugging Face, making it publicly available for researchers to use.
The release of Gemma Scope can support research in areas such as detecting and fixing LLM jailbreaks, steering model behavior, red-teaming SAEs, and discovering interesting features of language models, such as how they learn specific tasks.
Read more about Gemma Scope on VentureBeat
Read the Gemma Scope paper (pdf)