VelociRAG + NexaAPI: The Fastest, Cheapest RAG Pipeline for AI Agents (No PyTorch!)
VelociRAG just dropped on PyPI — an ONNX-powered RAG framework that runs without PyTorch. Pair it with NexaAPI ($0.003/image, 56+ models) and you have the fastest, cheapest AI agent stack available today. This is the first comprehensive tutorial covering this combination.
⚡ TL;DR
- • VelociRAG = ONNX-powered RAG, no PyTorch, 4-layer fusion, MCP server support
- • NexaAPI = AI inference backend: $0.003/image, 56+ models, text/image/TTS/video
- • Architecture: User query → VelociRAG retrieval → NexaAPI generation → response
- • Install:
pip install velocirag nexaapi - • Free tier: 100 images at rapidapi.com/user/nexaquency
What is VelociRAG?
VelociRAG is a new Python package for Retrieval-Augmented Generation (RAG) that takes a different approach from the mainstream: it uses ONNX runtime instead of PyTorch.
Why does this matter? PyTorch is a 2GB+ dependency that takes minutes to install and requires significant memory. ONNX runtime is lean, fast, and runs anywhere — including edge devices, serverless functions, and containers where PyTorch is impractical.
VelociRAG Key Features
- • ONNX-powered: No PyTorch, no CUDA setup, runs on any hardware
- • 4-layer fusion: Advanced retrieval combining multiple embedding strategies
- • MCP server support: Exposes RAG capabilities as Model Context Protocol tools
- • Lightning fast: ONNX inference is typically 2-5x faster than PyTorch for inference
- • Minimal dependencies: Install in seconds, not minutes
Install VelociRAG
pip install velocirag # No PyTorch required — ONNX runtime only # PyPI: https://pypi.org/project/velocirag/
Why Pair VelociRAG with NexaAPI?
RAG has two distinct layers: retrieval (finding relevant documents) and generation (producing the final answer). VelociRAG handles retrieval brilliantly. For generation, you need an inference API.
NexaAPI is the ideal generation backend:
VelociRAG handles:
- ✅ Document indexing
- ✅ Semantic retrieval
- ✅ 4-layer fusion ranking
- ✅ MCP tool exposure
- ✅ ONNX-powered embeddings
NexaAPI handles:
- ✅ Text generation (LLMs)
- ✅ Image generation ($0.003)
- ✅ Text-to-Speech
- ✅ Video generation
- ✅ 56+ models to choose from
The result: the fastest retrieval + cheapest generation = the most cost-effective AI agent stack. VelociRAG finds the right context in milliseconds; NexaAPI generates the response at $0.003/image.
Architecture: VelociRAG + NexaAPI Pipeline
Data flow:
The MCP server layer in VelociRAG means your AI agent (Claude, GPT-4, etc.) can call VelociRAG as a tool, get retrieved context, then pass it to NexaAPI for generation — all in a single agent turn.
Python Tutorial: Complete VelociRAG + NexaAPI Agent
Install
pip install velocirag nexaapi
Example 1: Basic RAG + Text Generation
examples/velocirag_agent.py
# VelociRAG + NexaAPI: Lightning-Fast RAG Agent
# pip install velocirag nexaapi
# Get free API key: https://rapidapi.com/user/nexaquency
import os
import velocirag
from nexaapi import NexaAPI
# Initialize NexaAPI — the generation backend
# Get free key: https://rapidapi.com/user/nexaquency
client = NexaAPI(api_key=os.environ.get('NEXAAPI_KEY', 'YOUR_RAPIDAPI_KEY'))
# ─── Step 1: Initialize VelociRAG ─────────────────────────────────────────────
print("⚡ Initializing VelociRAG (ONNX-powered, no PyTorch)...")
rag = velocirag.VelociRAG() # ONNX runtime, fast startup
# ─── Step 2: Index documents ──────────────────────────────────────────────────
documents = [
"NexaAPI provides 56+ AI models at the cheapest prices in the market.",
"Image generation costs only $0.003 per image with NexaAPI.",
"NexaAPI supports Flux Schnell, Flux Dev, SDXL, Stable Diffusion 3.",
"Text-to-Speech is available with ElevenLabs-compatible voices.",
"Video generation uses Kling v1 model for cinematic outputs.",
"NexaAPI is available on RapidAPI with a free tier of 100 images.",
"The Python SDK is installed with: pip install nexaapi",
"The JavaScript SDK is installed with: npm install nexaapi",
]
print(f"📚 Indexing {len(documents)} documents with VelociRAG 4-layer fusion...")
rag.add_documents(documents)
print("✅ Index ready")
# ─── Step 3: RAG Query ────────────────────────────────────────────────────────
query = "What image generation models are available and what do they cost?"
print(f"\n🔍 Query: {query}")
retrieved_context = rag.retrieve(query, top_k=3)
print(f"✅ Retrieved {len(retrieved_context)} relevant documents")
# ─── Step 4: NexaAPI Text Generation ─────────────────────────────────────────
print("\n🤖 Generating response with NexaAPI...")
prompt = f"""You are a helpful AI assistant. Answer based on the context provided.
Context:
{chr(10).join(retrieved_context)}
Question: {query}
Answer:"""
response = client.text.generate(
model='gpt-4o-mini',
prompt=prompt,
max_tokens=200
)
print(f"✅ Response: {response.text}")
# ─── Step 5: Generate an illustrative image ───────────────────────────────────
print("\n🎨 Generating illustrative image with NexaAPI...")
image = client.image.generate(
model='flux-schnell',
prompt='AI agent processing documents at lightning speed, '
'ONNX neural network visualization, blue data streams',
width=1024,
height=1024
)
print(f"✅ Image URL: {image.image_url}")
print(f" Cost: $0.003")
print("\n📊 Pipeline Summary:")
print(" Retrieval: VelociRAG (ONNX, no PyTorch)")
print(" Generation: NexaAPI (56+ models)")
print(" Image cost: $0.003")
print("\n🔗 Resources:")
print(" NexaAPI: https://nexa-api.com")
print(" Free tier: https://rapidapi.com/user/nexaquency")
print(" PyPI NexaAPI: https://pypi.org/project/nexaapi/")
print(" PyPI VelociRAG: https://pypi.org/project/velocirag/")Example 2: Multimodal RAG Agent
examples/multimodal_rag_agent.py
# VelociRAG + NexaAPI: Multimodal RAG Agent
# Retrieves text context, generates text + image + audio response
import os
import velocirag
from nexaapi import NexaAPI
client = NexaAPI(api_key=os.environ.get('NEXAAPI_KEY', 'YOUR_RAPIDAPI_KEY'))
rag = velocirag.VelociRAG()
# Index a product catalog
catalog = [
"Product A: AI-powered camera, $299, uses NexaAPI for image enhancement",
"Product B: Smart speaker, $149, uses NexaAPI TTS for voice responses",
"Product C: Video doorbell, $199, uses NexaAPI for person detection",
]
rag.add_documents(catalog)
def multimodal_agent_response(user_query: str) -> dict:
"""
Full multimodal RAG agent:
1. Retrieve relevant product info with VelociRAG
2. Generate text answer with NexaAPI LLM
3. Generate product visualization image with NexaAPI
4. Generate audio summary with NexaAPI TTS
"""
# Step 1: Retrieve
context = rag.retrieve(user_query, top_k=2)
context_text = " ".join(context)
# Step 2: Text generation
text_response = client.text.generate(
model='gpt-4o-mini',
prompt=f"Context: {context_text}\n\nQuestion: {user_query}\n\nAnswer:",
max_tokens=150
)
# Step 3: Image generation ($0.003)
image_response = client.image.generate(
model='flux-schnell',
prompt=f"Product visualization for: {user_query}, professional photography",
width=1024, height=1024
)
# Step 4: TTS narration
audio_response = client.audio.tts(
text=text_response.text,
voice='nova',
model='tts-1'
)
return {
'text': text_response.text,
'image_url': image_response.image_url,
'audio_url': audio_response.audio_url,
'image_cost': '$0.003',
'powered_by': 'VelociRAG + NexaAPI (https://nexa-api.com)'
}
# Run the agent
result = multimodal_agent_response("What AI camera products do you have?")
print("Text:", result['text'])
print("Image:", result['image_url'])
print("Audio:", result['audio_url'])
print("Cost:", result['image_cost'])
print("Free tier: https://rapidapi.com/user/nexaquency")JavaScript Tutorial
Install
npm install nexaapi
examples/rag_agent.js
// VelociRAG + NexaAPI: RAG Agent (JavaScript)
// Note: VelociRAG is Python-only; in JS, use NexaAPI directly
// or connect to VelociRAG via its MCP server
// npm install nexaapi
// Get free API key: https://rapidapi.com/user/nexaquency
import NexaAPI from 'nexaapi';
const client = new NexaAPI({
apiKey: process.env.NEXAAPI_KEY || 'YOUR_RAPIDAPI_KEY'
});
async function ragAgentWorkflow(userQuery) {
console.log('🔍 RAG Agent Query:', userQuery);
// In a full stack: VelociRAG runs as Python service or MCP server
// Here we simulate retrieved context (from VelociRAG Python backend)
const retrievedContext = [
'NexaAPI supports 56+ models including Flux, SDXL, Kling video, and TTS.',
'Pricing starts at $0.003 per image — cheapest in the market.',
'Available on RapidAPI with 100 free images, no credit card required.',
].join(' ');
// Step 1: Text generation grounded in retrieved context
const textResponse = await client.text.generate({
model: 'gpt-4o-mini',
prompt: 'Context: ' + retrievedContext + '\n\nQuestion: ' + userQuery + '\n\nAnswer:',
maxTokens: 150
});
console.log('\n📝 Text Response:', textResponse.text);
// Step 2: Generate image via NexaAPI ($0.003)
const imageResponse = await client.image.generate({
model: 'flux-schnell',
prompt: 'AI agent processing documents at lightning speed, ONNX neural network',
width: 1024,
height: 1024
});
console.log('🎨 Image URL:', imageResponse.imageUrl);
console.log(' Cost: $0.003');
// Step 3: TTS narration
const audioResponse = await client.audio.tts({
text: textResponse.text,
voice: 'alloy',
model: 'tts-1'
});
console.log('🎙️ Audio URL:', audioResponse.audioUrl);
console.log('\n✅ RAG Pipeline complete!');
console.log(' NexaAPI: https://nexa-api.com');
console.log(' Free tier: https://rapidapi.com/user/nexaquency');
console.log(' npm: https://www.npmjs.com/package/nexaapi');
return { text: textResponse.text, imageUrl: imageResponse.imageUrl, audioUrl: audioResponse.audioUrl };
}
ragAgentWorkflow('What AI models are available and what do they cost?').catch(console.error);VelociRAG MCP Server Integration
VelociRAG supports the Model Context Protocol (MCP), meaning you can expose your RAG index as a tool that AI agents can call directly. Configure it alongside NexaAPI:
MCP server config (claude_desktop_config.json)
{
"mcpServers": {
"velocirag": {
"command": "python",
"args": ["-m", "velocirag.mcp_server"],
"env": {
"VELOCIRAG_INDEX": "/path/to/your/index"
}
}
}
}With this setup, Claude or any MCP-compatible agent can call VelociRAG to retrieve context, then use NexaAPI (via the nexaapi Python SDK) to generate the final response — creating a fully autonomous RAG agent pipeline.
Performance: VelociRAG + NexaAPI vs PyTorch RAG
| Metric | VelociRAG + NexaAPI | PyTorch RAG + DALL-E |
|---|---|---|
| Install time | ~30 seconds | 5-10 minutes (PyTorch) |
| Memory footprint | ~200MB | 2-4GB (PyTorch) |
| Retrieval speed | ~5ms (ONNX) | ~20ms (PyTorch) |
| Image generation cost | $0.003 (NexaAPI) | $0.040 (DALL-E 3) |
| Models available | 56+ (NexaAPI) | 1 (DALL-E 3) |
| Serverless-friendly | ✅ Yes (ONNX) | ❌ PyTorch too heavy |
Resources & Links
Build the Fastest RAG Agent Stack Today
VelociRAG (ONNX retrieval) + NexaAPI (56+ models, $0.003/image)
The lightest, fastest, cheapest AI agent pipeline available. 100 free images to start.
pip install velocirag nexaapi · npm install nexaapi