SpreadsheetLLM unlocks the power of LLMs for spreadsheet tasks
Large language models are not designed for spreadsheets. Microsoft's SpreadsheetLLM makes spreadsheets digestible by LLMs.
LLMs were not designed to handle spreadsheets. A spreadsheet can span thousands of rows and columns, and its formatting and formula data often push it past the token limits of even the largest LLMs.
SpreadsheetLLM, a new framework by Microsoft, enables LLMs to process spreadsheets better. It uses multiple encoding methods that compress spreadsheet data and convert it into a format suitable for LLMs.
The simplest encoding method serializes spreadsheet data into Markdown format while preserving important information such as cell addresses and formats.
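To make the idea concrete, here is a minimal Python sketch of an address-preserving serialization. The exact `|address,value|` layout and the helper names `column_letter` and `encode_sheet` are assumptions made for illustration, not the format SpreadsheetLLM actually emits, and cell formats are left out for brevity.

```python
def column_letter(index):
    """Convert a 0-based column index to a column label (0 -> A, 26 -> AA)."""
    label = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        label = chr(ord("A") + remainder) + label
    return label


def encode_sheet(rows):
    """Serialize a grid of values into Markdown-like rows of 'address,value' cells."""
    lines = []
    for row_number, row in enumerate(rows, start=1):
        cells = [
            f"{column_letter(col)}{row_number},{value}"
            for col, value in enumerate(row)
        ]
        lines.append("|" + "|".join(cells) + "|")
    return "\n".join(lines)


if __name__ == "__main__":
    print(encode_sheet([["Region", "Sales"], ["East", 120], ["West", 95]]))
```

Every cell carries its own address, so the model can refer to exact locations, but the token count grows with every cell, including empty and repeated ones. That cost is what the compression steps are meant to reduce.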
For larger spreadsheets, the researchers propose SheetCompressor, a novel encoding framework composed of three modules that leverage the structure of spreadsheets to compress their data and reduce token consumption.
1- Identify the borders of table areas within a spreadsheet and remove rows and columns that are not close to these “structural anchors.”
2- Create a dictionary where each unique cell value is stored once along with the ranges of cells that contain it.
3- Use clustering algorithms to group cells with similar formats and represent them more efficiently and with fewer tokens.
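The second module is the easiest to illustrate. The Python sketch below builds a dictionary from each distinct value to the cell ranges that contain it. The function names, the `(row, column_label)` input layout, and the choice to merge only vertical runs within a column are assumptions for this example, not details taken from the paper.

```python
from collections import defaultdict


def format_range(start, end):
    """Render a (column, row) pair, or a run between two of them, as a cell range."""
    if start == end:
        return f"{start[0]}{start[1]}"
    return f"{start[0]}{start[1]}:{end[0]}{end[1]}"


def build_inverted_index(cells):
    """Map each distinct non-empty value to the cell ranges that hold it.

    `cells` maps (row, column_label) pairs such as (2, "B") to values.
    Consecutive rows in the same column are merged into one range.
    """
    positions = defaultdict(list)
    for (row, col), value in cells.items():
        if value not in ("", None):  # empty cells are dropped entirely
            positions[value].append((col, row))

    index = {}
    for value, coords in positions.items():
        coords.sort()
        ranges = []
        start = prev = coords[0]
        for col, row in coords[1:]:
            if col == prev[0] and row == prev[1] + 1:
                prev = (col, row)  # extend the current vertical run
                continue
            ranges.append(format_range(start, prev))
            start = prev = (col, row)
        ranges.append(format_range(start, prev))
        index[value] = ranges
    return index


if __name__ == "__main__":
    sheet = {(1, "A"): "Region", (2, "A"): "East", (3, "A"): "East", (4, "A"): "", (2, "B"): "East"}
    print(build_inverted_index(sheet))
```

A value that fills hundreds of cells is written once with a handful of ranges instead of once per cell, and empty cells cost nothing.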
In addition to these compression techniques, the researchers propose Chain-of-Spreadsheet (CoS), a technique that is inspired by Chain-of-Thought (CoT) prompting. CoS first selects the parts of a spreadsheet that are relevant to the input prompt before sending them to the LLM for reasoning. This results in fewer tokens and more focused knowledge, which reduces the probability of hallucinations.
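The two-stage flow can be sketched in a few lines of Python. This is only an outline under stated assumptions: `llm` is a hypothetical callable that takes a prompt string and returns text, `read_range` is a hypothetical helper that returns the encoded cells for a range, the prompt wording is invented for illustration, and the selection step is assumed to be delegated to the model itself.

```python
def chain_of_spreadsheet(question, compressed_sheet, read_range, llm):
    """Answer a question in two steps: locate the relevant range, then reason over it.

    `llm` and `read_range` are hypothetical callables supplied by the caller.
    """
    # Stage 1: show the model the compressed sheet and ask where to look.
    locate_prompt = (
        "Here is a compressed spreadsheet:\n"
        f"{compressed_sheet}\n"
        f"Question: {question}\n"
        "Reply with only the cell range of the table needed to answer."
    )
    cell_range = llm(locate_prompt).strip()

    # Stage 2: send only the selected range, not the whole sheet.
    table_text = read_range(cell_range)
    answer_prompt = (
        f"Table {cell_range}:\n{table_text}\n"
        f"Question: {question}\n"
        "Answer using only this table."
    )
    return llm(answer_prompt)
```

The second prompt contains only the selected table, which is where the token savings and the narrower focus come from. A production version would also need to validate the range the model returns before reading it.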
Experiments show that SpreadsheetLLM results in 25x compression of prompts and increases the performance of LLMs on spreadsheet tasks considerably. One of the important advantages of compression is making spreadsheet processing available to LLMs with smaller context windows.
With so much valuable information in spreadsheets, being able to use them reliably in conversational interfaces can unlock many important applications.
Read more about SpreadsheetLLM on TechTalks.
Read the paper on arXiv.