Is cosine similarity the right measure for embedding models?

Mar 21, 2024

Netflix has done some of the most relevant work in ML-based recommendation systems.

A new paper, based on internal research on recommendation systems at Netflix, highlights the limits of using cosine similarity in measuring the proximity of objects.

Cosine similarity is the dot product of two vectors after each has been normalized to unit length. Normalizing discards the vectors' magnitudes, so only their directions are compared. How much does this affect similarity results?
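To make the point about magnitudes concrete, here is a minimal sketch in plain Python. The vectors `a`, `b`, and `c` are made-up toy values, not embeddings from any real model. It shows that two vectors pointing in the same direction get a cosine similarity of 1 regardless of how different their lengths are, while the raw dot product still reflects the magnitude.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def cosine_similarity(u, v):
    """Dot product of u and v after scaling each to unit length."""
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    return dot(u, v) / (norm_u * norm_v)

# Toy vectors (illustrative values only)
a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]  # same direction as a, ten times the magnitude
c = [3.0, 2.0, 1.0]     # same magnitude as a, different direction

print("cosine(a, b):", cosine_similarity(a, b))  # magnitude difference is invisible
print("dot(a, b):   ", dot(a, b))                # magnitude difference is visible
print("cosine(a, c):", cosine_similarity(a, c))
```

Whether discarding magnitude helps or hurts depends on what the magnitude encodes in a given model, which is exactly the question the paper raises.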

The study suggests that in some applications, cosine similarity can produce arbitrary and therefore meaningless similarities, and that its results can be opaque and non-unique.
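One way such non-uniqueness can arise, as I understand the argument for matrix-factorization-style models, is that the latent dimensions can be rescaled without changing the model's predictions, while cosine similarity does change. The sketch below is an illustration under that assumption, not the paper's code: the embeddings `users` and `items` and the `scale` factors are hypothetical values chosen for the example.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Hypothetical 2-dimensional user and item embeddings
users = [[1.0, 2.0], [2.0, 1.0]]
items = [[1.0, 0.5], [0.5, 1.0]]

# Arbitrary per-dimension rescaling: multiply user dimensions by s,
# divide the matching item dimensions by s.
scale = [4.0, 1.0]
users_scaled = [[x * s for x, s in zip(u, scale)] for u in users]
items_scaled = [[x / s for x, s in zip(i, scale)] for i in items]

# The model's predictions (user-item dot products) are unchanged...
preds = [[dot(u, i) for i in items] for u in users]
preds_scaled = [[dot(u, i) for i in items_scaled] for u in users_scaled]

# ...but the item-item cosine similarity is not.
sim_before = cosine(items[0], items[1])
sim_after = cosine(items_scaled[0], items_scaled[1])

print("predictions before:", preds)
print("predictions after: ", preds_scaled)
print("item-item cosine before:", sim_before)
print("item-item cosine after: ", sim_after)
```

Both sets of embeddings describe the same predictions, so nothing in the predictions tells you which item-item similarity is the "right" one.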

The researchers caution against “blindly using cosine-similarity” and suggest several remedies to get better proximity measures from embedding models.

Read the full analysis on TechTalks.

For more on AI research:

How to customize LLMs for low-frequency topics
How to improve the throughput of LLM application servers
How to use LLMs to create custom embedding models
Can GPT-4 and GPT-4V perform abstract reasoning like humans?
