Serverless Inference
Hyperbolic’s Serverless Inference is an affordable AI inference platform designed for developers who need fast, serverless deployment of open-source models. It eliminates the complexity of managing GPU infrastructure, offering one-click deployment, low-latency inference, and full privacy control with zero data retention.
Overview
Hyperbolic supports 25+ open-source text, image, vision-language, and audio models, delivering inference at a fraction of the cost of major cloud providers while remaining fully API-compatible with OpenAI and other ecosystems.

Features
Zero Data Retention All requests are stateless—your data is never stored or reused.
Base vs. Instruct Models
Base models are versatile completion engines, ideal for open-ended tasks.
Instruct models are fine-tuned for direct commands and structured outputs.
Developer Tools & API Support
Multi-Language SDKs: Generate API requests using Python, TypeScript, and cURL.
API Playground: Test models before paying, with live adjustments for temperature, max tokens, and top-p.
REST API: Access models via a Chat Completion-compatible REST API, with streaming support for token-by-token responses in chat applications
Python & TypeScript: Fully OpenAI compatible, just swap your
api_key
andbase_url
to switch to Hyperbolic's models.Gradio & HF Spaces: Deploy and interact with models using Gradio, with one-click deployment to Hugging Face Spaces for easy prototyping and shareable web interfaces.
Last updated