By compressing retrieved documents into efficient embeddings, REFRAG slashes latency and memory costs without modifying the LLM architecture or degrading response quality.
REFRAG: Meta’s framework delivers a 30x speed…