Common mistakes that can ruin ML projects
Some hard-earned lessons from the trenches shipping ML products.
Working on machine learning products comes with unique challenges. Because ML systems center on data and algorithms, I’ve often seen product managers get caught up in the technical details of getting the metrics right and forget about many of the other things that can make or break a product.
ML product managers must make sure that the team takes into consideration the broader setting in which the model will be deployed. Here are a few examples of how ML products can go wrong and how to solve these challenges.
Technical considerations
Having the best-performing model will not serve you well if it doesn’t meet the technical requirements of the product’s environment.
For example, one team was working on a product that required near-real-time object recognition. They had trained a model with very high accuracy, but they had not considered where the model would run. If it ran in the cloud, they would have to account for possible network outages and the delay caused by the round trip of sending the data to the server and receiving the response. They also hadn’t considered the scaling issues of having thousands or possibly millions of people using the product. Would their servers be able to handle the load?
On the other hand, if it were going to run on an edge device such as a mobile device, would the device have the necessary resources (memory, compute, battery, etc.) to run the model? It turned out that they needed near real-time on-device inference but their model was designed to run on resource-heavy servers.
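One way to make this trade-off concrete is a simple latency-budget check. The sketch below is illustrative only: the budget, inference times, and network round-trip numbers are assumptions I made up, not measurements from the project described above.

```python
# Hypothetical latency-budget check for a near-real-time feature.
# All numbers are illustrative assumptions, not real measurements.

def meets_realtime_budget(inference_ms: float,
                          network_rtt_ms: float = 0.0,
                          budget_ms: float = 100.0) -> bool:
    """Return True if end-to-end latency fits within the real-time budget."""
    return inference_ms + network_rtt_ms <= budget_ms

# Cloud deployment: fast server-side inference, but the network
# round trip dominates and blows the budget.
cloud_ok = meets_realtime_budget(inference_ms=20, network_rtt_ms=120)   # False

# On-device deployment: slower inference on mobile hardware,
# but no network hop, so it fits.
device_ok = meets_realtime_budget(inference_ms=80, network_rtt_ms=0)    # True
```

Even a rough calculation like this, done before training, would have revealed that a resource-heavy server model could never meet an on-device real-time requirement.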
Practical considerations
Teams often focus mostly on the prediction and less on the intervention, which is arguably the more important part of your product.
For example, in one project, the team was creating a customer churn prevention system. They had trained a machine learning model that predicted with 90% accuracy which customers would churn within the next month. The sales team was then supposed to use this prediction to reach out to candidate customers and ask them to do an interview and offer them a discount.
However, after they talked to the sales team, they realized that one month’s notice was too little: the sales team needed at least two months to turn a customer around. The model was therefore not very useful, and the ML team had to restructure their dataset and retrain the model with a longer prediction horizon.
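The fix here is mostly about how the training labels are constructed. The sketch below shows the idea with a hypothetical labeling function and invented dates; the 60-day horizon stands in for the two-month lead time the sales team asked for.

```python
from datetime import date
from typing import Optional

# Illustrative sketch: relabel training examples so "churn" means
# "churned within the next two months" instead of one month, giving
# the sales team enough lead time to intervene. Dates are made up.

def churn_label(snapshot: date,
                churn_date: Optional[date],
                horizon_days: int = 60) -> int:
    """1 if the customer churned within `horizon_days` of the snapshot date."""
    if churn_date is None:
        return 0
    return int(0 <= (churn_date - snapshot).days <= horizon_days)

snap = date(2024, 1, 1)
# A customer who churns 45 days out is a positive example under the
# two-month horizon...
label_two_months = churn_label(snap, date(2024, 2, 15))                  # 1
# ...but would have been a negative example under the original
# one-month horizon, so the old model could never flag them in time.
label_one_month = churn_label(snap, date(2024, 2, 15), horizon_days=30)  # 0
```

The point is that the prediction target itself encodes an assumption about the intervention, so it has to be designed with the people who will act on it.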
Legal and regulatory considerations
Data scientists can work on projects without considering the legal implications of deploying their models. Does the model or the data used to train the model have any licensing limitations that prevent them from being used in commercial settings?
For example, in one project, the machine learning team fine-tuned a language model based on data from ShareGPT. The model performed very well, but it had to be scrapped because the terms of service of OpenAI prohibit the commercial use of models that have been trained on data collected from its models (ShareGPT data comes from ChatGPT). Always make sure you have full knowledge of the license permissions of the models and data you use in your products.
In other cases, the domain and industry in which you deploy the model will have strict requirements that the model might not meet. In another project, the machine learning team was working on a deep learning model for evaluating loans in a financial product. However, according to the regulations, if they rejected a loan, they had to be able to explain the reason to the customer. The deep learning model was not useful because it was an uninterpretable black box. The team ended up training a decision tree–based model that was not as performant as the DL model but was fully transparent and explainable.
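To make "explainable" concrete, here is a minimal sketch of a rule-based decision in the spirit of a single decision path: every rejection comes with a human-readable reason. The thresholds and feature names are invented for illustration and are not drawn from any real credit policy or from the project above.

```python
# Minimal sketch of an explainable loan decision. Every rejection
# carries a reason that can be relayed to the customer, which is
# exactly what a black-box deep net cannot provide.
# Thresholds and features are hypothetical.

def evaluate_loan(income: float, debt_ratio: float, credit_score: int):
    """Return (approved: bool, reason: str)."""
    if credit_score < 600:
        return False, "credit score below the minimum of 600"
    if debt_ratio > 0.4:
        return False, "debt-to-income ratio above 40%"
    if income < 30_000:
        return False, "annual income below the 30,000 threshold"
    return True, "meets all policy criteria"

approved, reason = evaluate_loan(income=25_000, debt_ratio=0.2,
                                 credit_score=700)
# approved is False; `reason` tells the customer exactly why.
```

A trained decision tree generalizes this idea: each prediction corresponds to a path of such threshold checks, so the rejection reason can be read directly off the tree.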
Always make sure to understand the regulatory environment of your application and industry and to communicate it to the machine learning team.
Financial considerations
One of the things that often falls through the cracks is the costs associated with using machine learning at scale. Will it be cost-effective? How does it affect your unit economy? For example, say you’re deploying an LLM-powered assistant in your e-commerce application to help users fill their shopping carts through a conversational interface. What is the cost-per-token of the model you’re using? What is the average number of tokens per conversation between the customer and the assistant? What is the average revenue gained from an AI-assisted shopping experience? Does it cover the costs of the model? If not, can you adjust your application to use a less expensive model?
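These questions boil down to back-of-the-envelope arithmetic you can do before committing to a model. The sketch below uses entirely made-up pricing and revenue numbers; plug in your own provider's rates and your own conversion metrics.

```python
# Back-of-the-envelope unit economics for an LLM-powered shopping
# assistant. All numbers are illustrative assumptions, not real
# pricing or product metrics.

def conversation_cost(tokens_per_conversation: int,
                      cost_per_1k_tokens: float) -> float:
    """Model cost of one full customer conversation, in dollars."""
    return tokens_per_conversation / 1_000 * cost_per_1k_tokens

cost = conversation_cost(tokens_per_conversation=4_000,
                         cost_per_1k_tokens=0.01)      # $0.04 per conversation

avg_incremental_revenue = 0.50   # assumed revenue uplift per assisted session
margin = avg_incremental_revenue - cost

# Positive margin here; with a model ten times more expensive
# (0.10 per 1k tokens) the same conversation would cost $0.40 and
# nearly wipe out the uplift, which is when you look at cheaper models.
```

Running this arithmetic across a few candidate models turns "can we afford it?" from a gut feeling into a line item in your unit economics.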
Timing considerations
Say your ML-powered product or feature aims to meet a new safety regulation that will become effective in three months. Will you be able to train, test, and deploy the model at scale by that time? Timing considerations can have important implications for the type of model you use. For example, time shortage might force you to consider using ready-made online services instead of training your own models (with the option of creating your own models in the future).
As the product manager, make sure to have a full understanding of the entire scope of the product and its environment. Talk to stakeholders and subject matter experts, discuss the non-technical aspects of the product with the machine learning team, and make sure everyone understands the constraints and goals.
If you want to learn more about ML product management in general, you can try the AI/ML Simulator for PM course by GoPractice. It is a fantastic, hands-on experience. If you’re interested in learning to create LLM applications, their GenAI Simulator course gives you the perfect framework to think about generative AI and what kinds of problems you can solve with it. I highly recommend both courses.
If you want to develop an ML or LLM application for your organization but don’t know where to start, contact me.