Although it has been late to the game, Apple has been making some interesting moves in the LLM space.
A recent paper by Apple researchers introduces a technique for running LLMs on edge devices with tight memory constraints.
The technique combines several novel memory management and weight storage methods to shrink a model's DRAM footprint: the full model is stored in flash memory, and only the parts needed at each step are dynamically loaded into DRAM.
However, flash throughput is at least an order of magnitude lower than DRAM's. To overcome this bottleneck, the researchers introduce several techniques that reduce inference latency by up to a factor of 25.
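The core idea of keeping weights in flash and pulling only the needed slices into DRAM can be sketched with a memory-mapped file standing in for flash. This is a minimal illustration, not Apple's implementation: the file path, matrix sizes, and the "active neuron" selection are all hypothetical, and the real paper adds further optimizations (such as predicting sparsity and bundling rows and columns) that this sketch omits.

```python
import os
import tempfile
import numpy as np

# Hypothetical setup: a weight matrix saved to disk stands in for flash storage.
ROWS, COLS = 4096, 1024
path = os.path.join(tempfile.mkdtemp(), "weights.npy")
weights = np.random.default_rng(0).standard_normal((ROWS, COLS)).astype(np.float32)
np.save(path, weights)

# Memory-map the file: no weight data enters DRAM until rows are actually read.
flash = np.load(path, mmap_mode="r")

def load_active_rows(active_indices):
    """Copy only the rows predicted to be active into DRAM."""
    return np.asarray(flash[active_indices])

# Suppose a (hypothetical) sparsity predictor says ~2% of neurons fire this step.
active = np.sort(
    np.random.default_rng(1).choice(ROWS, size=ROWS // 50, replace=False)
)
dram_slice = load_active_rows(active)

print(dram_slice.shape)                     # only the active rows
print(dram_slice.nbytes / weights.nbytes)   # small fraction of full DRAM cost
```

The payoff is that DRAM usage scales with the number of active rows per step rather than with total model size, which is exactly the trade the paper exploits, at the cost of flash read latency that its other techniques then work to hide.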
This could prove to be an important development, especially as Apple is interested in bringing AI features directly to your phone and computer.
Read more about Apple’s LLM in flash technique on TechTalks.
For more on AI research: