
VelociRAG + NexaAPI: The Fastest, Cheapest RAG Pipeline for AI Agents (No PyTorch!)

VelociRAG just dropped on PyPI — an ONNX-powered RAG framework that runs without PyTorch. Pair it with NexaAPI ($0.003/image, 56+ models) and you have the fastest, cheapest AI agent stack available today. This is the first comprehensive tutorial covering this combination.

March 27, 2026 · NexaAPI Team

⚡ TL;DR

  • VelociRAG = ONNX-powered RAG, no PyTorch, 4-layer fusion, MCP server support
  • NexaAPI = AI inference backend: $0.003/image, 56+ models, text/image/TTS/video
  • Architecture: User query → VelociRAG retrieval → NexaAPI generation → response
  • Install: pip install velocirag nexaapi
  • Free tier: 100 images at rapidapi.com/user/nexaquency

What is VelociRAG?

VelociRAG is a new Python package for Retrieval-Augmented Generation (RAG) that takes a different approach from the mainstream: it uses ONNX runtime instead of PyTorch.

Why does this matter? PyTorch is a 2GB+ dependency that takes minutes to install and requires significant memory. ONNX runtime is lean, fast, and runs anywhere — including edge devices, serverless functions, and containers where PyTorch is impractical.

VelociRAG Key Features

  • ONNX-powered: No PyTorch, no CUDA setup, runs on any hardware
  • 4-layer fusion: Advanced retrieval combining multiple embedding strategies
  • MCP server support: Exposes RAG capabilities as Model Context Protocol tools
  • Lightning fast: ONNX inference typically runs 2-5x faster than PyTorch
  • Minimal dependencies: Install in seconds, not minutes
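This post doesn't document VelociRAG's exact 4-layer fusion algorithm, but combining rankings from multiple retrieval strategies is commonly done with reciprocal rank fusion (RRF). A minimal, dependency-free sketch — the "layers" below are illustrative stand-ins, not VelociRAG's actual API:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc IDs into one ranking.

    Each document's fused score is the sum of 1 / (k + rank) over every
    list it appears in; k=60 is the conventional smoothing constant.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Four hypothetical retrieval "layers" ranking the same corpus differently:
layers = [
    ["doc_pricing", "doc_models", "doc_tts"],    # dense embedding layer
    ["doc_models", "doc_pricing", "doc_video"],  # sparse/keyword layer
    ["doc_pricing", "doc_video", "doc_models"],  # reranker layer
    ["doc_models", "doc_pricing", "doc_tts"],    # metadata layer
]
print(reciprocal_rank_fusion(layers)[:2])
```

The appeal of RRF is that it only needs ranks, not raw scores, so layers with incompatible scoring scales (dense vs. keyword) can be fused without normalization.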

Install VelociRAG

pip install velocirag
# No PyTorch required — ONNX runtime only
# PyPI: https://pypi.org/project/velocirag/

Why Pair VelociRAG with NexaAPI?

RAG has two distinct layers: retrieval (finding relevant documents) and generation (producing the final answer). VelociRAG handles retrieval brilliantly. For generation, you need an inference API.
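For intuition, the retrieval layer boils down to embedding the query and ranking documents by similarity. Here is a toy sketch using bag-of-words count vectors as a stand-in for real embeddings (VelociRAG uses ONNX embedding models; this is only to show the shape of the step):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for an ONNX model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by similarity to the query; return the top_k best."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_k]

docs = [
    "Image generation costs only $0.003 per image with NexaAPI.",
    "Video generation uses the Kling v1 model.",
    "Text-to-Speech is available with natural voices.",
]
print(retrieve("how much does image generation cost", docs, top_k=1))
```

The generation layer then takes whatever this returns and feeds it to an LLM as context — which is exactly the hand-off to NexaAPI described below.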

NexaAPI is the ideal generation backend, and the two layers divide the work cleanly:

VelociRAG handles:

  • ✅ Document indexing
  • ✅ Semantic retrieval
  • ✅ 4-layer fusion ranking
  • ✅ MCP tool exposure
  • ✅ ONNX-powered embeddings

NexaAPI handles:

  • ✅ Text generation (LLMs)
  • ✅ Image generation ($0.003)
  • ✅ Text-to-Speech
  • ✅ Video generation
  • ✅ 56+ models to choose from

The result: the fastest retrieval + cheapest generation = the most cost-effective AI agent stack. VelociRAG finds the right context in milliseconds; NexaAPI generates the response at $0.003/image.

Architecture: VelociRAG + NexaAPI Pipeline

Data flow:

User Query
  ↓
VelociRAG (ONNX retrieval, 4-layer fusion)
  ↓
NexaAPI (LLM generation + image/TTS/video)
  ↓
Response (text + media)

The MCP server layer in VelociRAG means your AI agent (Claude, GPT-4, etc.) can call VelociRAG as a tool, get retrieved context, then pass it to NexaAPI for generation — all in a single agent turn.

Python Tutorial: Complete VelociRAG + NexaAPI Agent

Install

pip install velocirag nexaapi

Example 1: Basic RAG + Text Generation

examples/velocirag_agent.py

# VelociRAG + NexaAPI: Lightning-Fast RAG Agent
# pip install velocirag nexaapi
# Get free API key: https://rapidapi.com/user/nexaquency

import os
import velocirag
from nexaapi import NexaAPI

# Initialize NexaAPI — the generation backend
# Get free key: https://rapidapi.com/user/nexaquency
client = NexaAPI(api_key=os.environ.get('NEXAAPI_KEY', 'YOUR_RAPIDAPI_KEY'))

# ─── Step 1: Initialize VelociRAG ─────────────────────────────────────────────
print("⚡ Initializing VelociRAG (ONNX-powered, no PyTorch)...")
rag = velocirag.VelociRAG()  # ONNX runtime, fast startup

# ─── Step 2: Index documents ──────────────────────────────────────────────────
documents = [
    "NexaAPI provides 56+ AI models at the cheapest prices in the market.",
    "Image generation costs only $0.003 per image with NexaAPI.",
    "NexaAPI supports Flux Schnell, Flux Dev, SDXL, Stable Diffusion 3.",
    "Text-to-Speech is available with ElevenLabs-compatible voices.",
    "Video generation uses Kling v1 model for cinematic outputs.",
    "NexaAPI is available on RapidAPI with a free tier of 100 images.",
    "The Python SDK is installed with: pip install nexaapi",
    "The JavaScript SDK is installed with: npm install nexaapi",
]

print(f"📚 Indexing {len(documents)} documents with VelociRAG 4-layer fusion...")
rag.add_documents(documents)
print("✅ Index ready")

# ─── Step 3: RAG Query ────────────────────────────────────────────────────────
query = "What image generation models are available and what do they cost?"
print(f"\n🔍 Query: {query}")

retrieved_context = rag.retrieve(query, top_k=3)
print(f"✅ Retrieved {len(retrieved_context)} relevant documents")

# ─── Step 4: NexaAPI Text Generation ─────────────────────────────────────────
print("\n🤖 Generating response with NexaAPI...")
prompt = f"""You are a helpful AI assistant. Answer based on the context provided.

Context:
{chr(10).join(retrieved_context)}

Question: {query}

Answer:"""

response = client.text.generate(
    model='gpt-4o-mini',
    prompt=prompt,
    max_tokens=200
)
print(f"✅ Response: {response.text}")

# ─── Step 5: Generate an illustrative image ───────────────────────────────────
print("\n🎨 Generating illustrative image with NexaAPI...")
image = client.image.generate(
    model='flux-schnell',
    prompt='AI agent processing documents at lightning speed, '
           'ONNX neural network visualization, blue data streams',
    width=1024,
    height=1024
)
print(f"✅ Image URL: {image.image_url}")
print(f"   Cost: $0.003")

print("\n📊 Pipeline Summary:")
print("   Retrieval: VelociRAG (ONNX, no PyTorch)")
print("   Generation: NexaAPI (56+ models)")
print("   Image cost: $0.003")
print("\n🔗 Resources:")
print("   NexaAPI: https://nexa-api.com")
print("   Free tier: https://rapidapi.com/user/nexaquency")
print("   PyPI NexaAPI: https://pypi.org/project/nexaapi/")
print("   PyPI VelociRAG: https://pypi.org/project/velocirag/")

Example 2: Multimodal RAG Agent

examples/multimodal_rag_agent.py

# VelociRAG + NexaAPI: Multimodal RAG Agent
# Retrieves text context, generates text + image + audio response

import os
import velocirag
from nexaapi import NexaAPI

client = NexaAPI(api_key=os.environ.get('NEXAAPI_KEY', 'YOUR_RAPIDAPI_KEY'))
rag = velocirag.VelociRAG()

# Index a product catalog
catalog = [
    "Product A: AI-powered camera, $299, uses NexaAPI for image enhancement",
    "Product B: Smart speaker, $149, uses NexaAPI TTS for voice responses",
    "Product C: Video doorbell, $199, uses NexaAPI for person detection",
]
rag.add_documents(catalog)

def multimodal_agent_response(user_query: str) -> dict:
    """
    Full multimodal RAG agent:
    1. Retrieve relevant product info with VelociRAG
    2. Generate text answer with NexaAPI LLM
    3. Generate product visualization image with NexaAPI
    4. Generate audio summary with NexaAPI TTS
    """
    # Step 1: Retrieve
    context = rag.retrieve(user_query, top_k=2)
    context_text = " ".join(context)
    
    # Step 2: Text generation
    text_response = client.text.generate(
        model='gpt-4o-mini',
        prompt=f"Context: {context_text}\n\nQuestion: {user_query}\n\nAnswer:",
        max_tokens=150
    )
    
    # Step 3: Image generation ($0.003)
    image_response = client.image.generate(
        model='flux-schnell',
        prompt=f"Product visualization for: {user_query}, professional photography",
        width=1024, height=1024
    )
    
    # Step 4: TTS narration
    audio_response = client.audio.tts(
        text=text_response.text,
        voice='nova',
        model='tts-1'
    )
    
    return {
        'text': text_response.text,
        'image_url': image_response.image_url,
        'audio_url': audio_response.audio_url,
        'image_cost': '$0.003',
        'powered_by': 'VelociRAG + NexaAPI (https://nexa-api.com)'
    }

# Run the agent
result = multimodal_agent_response("What AI camera products do you have?")
print("Text:", result['text'])
print("Image:", result['image_url'])
print("Audio:", result['audio_url'])
print("Cost:", result['image_cost'])
print("Free tier: https://rapidapi.com/user/nexaquency")

JavaScript Tutorial

Install

npm install nexaapi

examples/rag_agent.js

// VelociRAG + NexaAPI: RAG Agent (JavaScript)
// Note: VelociRAG is Python-only; in JS, use NexaAPI directly
// or connect to VelociRAG via its MCP server
// npm install nexaapi
// Get free API key: https://rapidapi.com/user/nexaquency

import NexaAPI from 'nexaapi';

const client = new NexaAPI({
  apiKey: process.env.NEXAAPI_KEY || 'YOUR_RAPIDAPI_KEY'
});

async function ragAgentWorkflow(userQuery) {
  console.log('🔍 RAG Agent Query:', userQuery);

  // In a full stack: VelociRAG runs as Python service or MCP server
  // Here we simulate retrieved context (from VelociRAG Python backend)
  const retrievedContext = [
    'NexaAPI supports 56+ models including Flux, SDXL, Kling video, and TTS.',
    'Pricing starts at $0.003 per image — cheapest in the market.',
    'Available on RapidAPI with 100 free images, no credit card required.',
  ].join(' ');

  // Step 1: Text generation grounded in retrieved context
  const textResponse = await client.text.generate({
    model: 'gpt-4o-mini',
    prompt: 'Context: ' + retrievedContext + '\n\nQuestion: ' + userQuery + '\n\nAnswer:',
    maxTokens: 150
  });
  console.log('\n📝 Text Response:', textResponse.text);

  // Step 2: Generate image via NexaAPI ($0.003)
  const imageResponse = await client.image.generate({
    model: 'flux-schnell',
    prompt: 'AI agent processing documents at lightning speed, ONNX neural network',
    width: 1024,
    height: 1024
  });
  console.log('🎨 Image URL:', imageResponse.imageUrl);
  console.log('   Cost: $0.003');

  // Step 3: TTS narration
  const audioResponse = await client.audio.tts({
    text: textResponse.text,
    voice: 'alloy',
    model: 'tts-1'
  });
  console.log('🎙️ Audio URL:', audioResponse.audioUrl);

  console.log('\n✅ RAG Pipeline complete!');
  console.log('   NexaAPI: https://nexa-api.com');
  console.log('   Free tier: https://rapidapi.com/user/nexaquency');
  console.log('   npm: https://www.npmjs.com/package/nexaapi');

  return { text: textResponse.text, imageUrl: imageResponse.imageUrl, audioUrl: audioResponse.audioUrl };
}

ragAgentWorkflow('What AI models are available and what do they cost?').catch(console.error);

VelociRAG MCP Server Integration

VelociRAG supports the Model Context Protocol (MCP), meaning you can expose your RAG index as a tool that AI agents can call directly. Configure it alongside NexaAPI:

MCP server config (claude_desktop_config.json)

{
  "mcpServers": {
    "velocirag": {
      "command": "python",
      "args": ["-m", "velocirag.mcp_server"],
      "env": {
        "VELOCIRAG_INDEX": "/path/to/your/index"
      }
    }
  }
}

With this setup, Claude or any MCP-compatible agent can call VelociRAG to retrieve context, then use NexaAPI (via the nexaapi Python SDK) to generate the final response — creating a fully autonomous RAG agent pipeline.
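Under the hood, an MCP tool call is a JSON-RPC 2.0 request. A sketch of what the agent's retrieval call might look like — note that the tool name `retrieve` and its argument names are assumptions for illustration, not confirmed VelociRAG tool names:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "retrieve",
    "arguments": {
      "query": "What image generation models are available?",
      "top_k": 3
    }
  }
}
```

The server replies with the retrieved passages as tool-result content, which the agent then folds into its NexaAPI generation prompt.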

Performance: VelociRAG + NexaAPI vs PyTorch RAG

Metric                  VelociRAG + NexaAPI     PyTorch RAG + DALL-E
Install time            ~30 seconds             5-10 minutes (PyTorch)
Memory footprint        ~200MB                  2-4GB (PyTorch)
Retrieval speed         ~5ms (ONNX)             ~20ms (PyTorch)
Image generation cost   $0.003 (NexaAPI)        $0.040 (DALL-E 3)
Models available        56+ (NexaAPI)           1 (DALL-E 3)
Serverless-friendly     ✅ Yes (ONNX)           ❌ No (PyTorch too heavy)
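The per-image prices above compound quickly at volume. A quick check of the claimed savings, using only the prices from the table:

```python
# Prices per image, taken from the comparison table above
NEXAAPI_PER_IMAGE = 0.003
DALLE3_PER_IMAGE = 0.040

images = 10_000
nexa_cost = images * NEXAAPI_PER_IMAGE
dalle_cost = images * DALLE3_PER_IMAGE

print(f"NexaAPI:  ${nexa_cost:,.2f}")
print(f"DALL-E 3: ${dalle_cost:,.2f}")
print(f"Savings:  ${dalle_cost - nexa_cost:,.2f} ({dalle_cost / nexa_cost:.1f}x cheaper)")
```

At 10,000 images that is $30 versus $400 — the roughly 13x cost gap is what makes the pairing attractive for image-heavy agents.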

Resources & Links

  • NexaAPI: https://nexa-api.com
  • Free tier (100 images): https://rapidapi.com/user/nexaquency
  • PyPI (nexaapi): https://pypi.org/project/nexaapi/
  • PyPI (velocirag): https://pypi.org/project/velocirag/
  • npm (nexaapi): https://www.npmjs.com/package/nexaapi

Build the Fastest RAG Agent Stack Today

VelociRAG (ONNX retrieval) + NexaAPI (56+ models, $0.003/image)

The lightest, fastest, cheapest AI agent pipeline available. 100 free images to start.

pip install velocirag nexaapi · npm install nexaapi