Launch your own LLM API endpoint with (almost) no coding
Background image: 123RF. Foreground image generated with Forefront.AI.
While there is a lot of excitement around open-source LLMs, integrating them into products comes with many challenges. There are many models to choose from, each with its own hardware and integration requirements.
Ideally, like all product work, you should be able to iterate and explore different models as quickly as possible until you find the one that is a suitable starting point for your application. From there, you can start improving the model and the integration.
One solution that can help you get started quickly is launching your own LLM API endpoint. In my latest article, I introduce two frameworks that enable you to launch your own LLM servers with almost no coding required.
Using a web API endpoint is attractive for several reasons:
It decouples your application server from the LLM server, which requires specialized hardware.
You can easily integrate it with your app regardless of your program's language and the LLM framework.
If you're already using an LLM API such as the OpenAI API, your own API can serve as a drop-in replacement that requires few changes to the existing code.
With an API, you can change the underlying model and hardware without affecting the main application
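To make the decoupling concrete, here is a minimal sketch of a client that talks to an OpenAI-style completions endpoint using only the Python standard library. The base URL `http://localhost:8000`, the model name `facebook/opt-125m`, and the helper names `build_completion_request` and `complete` are assumptions for illustration, not something the frameworks prescribe. The point is that the application only knows a URL and a model name, so either can change without touching the rest of the code.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, api_key="EMPTY", max_tokens=64):
    """Build (but do not send) a POST request for an OpenAI-style /v1/completions endpoint.

    base_url, model, and api_key are placeholders: point them at your own
    server or at a hosted API that follows the same request format.
    """
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def complete(base_url, model, prompt):
    """Send the request and return the first completion's text.

    Assumes the response follows the OpenAI completions schema
    (a "choices" list whose items carry a "text" field).
    """
    req = build_completion_request(base_url, model, prompt)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


# Hypothetical usage, assuming a compatible server is listening locally:
# print(complete("http://localhost:8000", "facebook/opt-125m", "Hello, my name is"))
```

Swapping the underlying model or moving the server to different hardware then comes down to changing the `base_url` and `model` arguments.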
Two frameworks that are worth exploring:
vLLM: Arguably the fastest LLM serving platform (up to 14x faster than Hugging Face Transformers). It supports many open-source model families. It can launch an API server with a single command. It emulates the OpenAI API, making it easier to switch between the two.
OpenLLM: Very easy to use. Supports many models. Also launches a web API server with a single command. It also supports adapters such as LoRA, which makes it easy to launch lightweight customized versions of your open-source LLMs.
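As an illustration of the "single command" idea, the sketch below assembles the launch command for vLLM's OpenAI-compatible server as an argument list. It assumes vLLM is installed and exposes the `vllm.entrypoints.openai.api_server` module with `--model`, `--host`, and `--port` options. The exact entry point may differ between versions, so check the documentation for the release you use. The helper name `vllm_server_command` and the model `facebook/opt-125m` are placeholders.

```python
import shlex


def vllm_server_command(model, host="0.0.0.0", port=8000):
    """Return the argument list that starts vLLM's OpenAI-compatible API server.

    Assumption: the installed vLLM version provides this module entry point;
    newer releases may offer a different launcher.
    """
    return [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", model,
        "--host", host,
        "--port", str(port),
    ]


cmd = vllm_server_command("facebook/opt-125m")
# Print the command as a single shell line that can be pasted into a terminal.
print(shlex.join(cmd))

# To launch the server from Python instead (requires vLLM and suitable hardware):
# import subprocess
# subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a string makes it easy to change the model or port programmatically while you compare candidates.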
Read the full article along with usage instructions on TechTalks.
Some goodies:
GPT-3: Building Innovative NLP Products Using Large Language Models by Sandra Kublik and Shubham Saboo provides an excellent overview of writing applications with LLM APIs such as the OpenAI API. While the book focuses on GPT-3, the same principles can be applied to other commercial APIs.
Practical Natural Language Processing: A Comprehensive Guide to Building Real-World NLP Systems (multiple authors) provides an overview of the NLP landscape. This is a really important book for readers of all levels, given the current excitement around LLMs. The main takeaway is that you do not need to use the most advanced technology to solve every problem. Start with the simplest solution first. (Read my review here)
Transformers for Natural Language Processing is an excellent introduction to the technology underlying LLMs. It provides a very accessible explanation of how transformers work and how you can use different transformer architectures (BERT, T5, GPT, etc.).