TechTalks Newsletter

A gentle introduction to transformer networks

Ben Dickson
May 2

In recent years, the transformer model has become one of the most important advances in deep learning and deep neural networks. It is mainly used for advanced applications in natural language processing: Google uses it to enhance its search engine results, and OpenAI has used transformers to create its famous GPT-2 and GPT-3 models.

Since its debut in 2017, the transformer architecture has evolved and branched out into many different variants, expanding beyond language tasks into other areas. Transformers have been used for time series forecasting. They are the key innovation behind AlphaFold, DeepMind’s protein structure prediction model. Codex, OpenAI's source code–generation model, is based on transformers. More recently, transformers have found their way into computer vision, where they are slowly replacing convolutional neural networks (CNNs) in many complicated tasks.

Researchers are still exploring ways to improve transformers and use them in new applications.
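At the core of every transformer variant mentioned above is the same building block: scaled dot-product self-attention, where each token weighs every other token in the sequence. As a rough sketch (not the full architecture, which also includes multi-head projections, feed-forward layers, and positional encodings), the mechanism can be written in a few lines of NumPy:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq, seq) pairwise similarity
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

# Toy example: 4 tokens, each an 8-dimensional embedding
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V
```

Each row of `w` tells you how much one token "attends" to every other token; the output mixes token embeddings according to those weights. This parallel, sequence-wide mixing is what lets transformers scale so well compared to recurrent networks.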

Read a brief explainer on TechTalks about what makes transformers exciting and how they work.

For more explainers:

  • What is neural architecture search?

  • What are graph neural networks (GNN)?

  • Demystifying deep reinforcement learning

  • What are membership inference attacks?



© 2022 Ben Dickson