A dataset to automate programming tasks

Last week, IBM Research released Project CodeNet, a programming dataset of 14 million code samples. CodeNet is intended for training machine learning models that automate programming tasks.

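While machine learning is nowhere near replacing programmers, it can serve as the basis for many tools that make programmers more productive. The dataset is richly annotated: each submission carries metadata such as its programming language, whether it was accepted as a correct solution, and its CPU time and memory usage. This makes it suitable for developing many different kinds of ML models. Potential uses for CodeNet include the following: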

  • Translation between different programming languages

  • Advanced recommendation and autocomplete

  • Code optimization

  • Code generation

