API Reference
This document specifies the Agentic FinSearch OpenAI-compatible REST API. The API is synchronous (no streaming). All request and response bodies are JSON.
—
Connection
Base URL
The API is served by a Django backend on port 8000.
Production (Fedora droplet at 134.122.1.153, IPv4 only):
https://agenticfinsearch.org:8000
Local development:
http://localhost:8000
All endpoint paths below are relative to this base URL.
Authentication
The API uses Bearer token authentication.
Authorization: Bearer <FINGPT_API_KEY>
The API key is set via the FINGPT_API_KEY environment variable on the server.
If FINGPT_API_KEY is not set, authentication is disabled (development mode) and all requests are accepted.
When authentication is enabled, every request must include the Authorization header, except requests to the health check endpoint, which never requires authentication.
Error responses (401):
{
  "error": {
    "message": "Missing Authorization header. Use: Authorization: Bearer <api_key>",
    "type": "authentication_error"
  }
}
{
  "error": {
    "message": "Invalid API key",
    "type": "authentication_error"
  }
}
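A client can mirror the rules above by attaching the header only when a key is configured and by recognizing the documented 401 shape. The following is a minimal sketch; build_headers and is_auth_error are hypothetical helper names, not part of the API.

```python
from typing import Optional


def build_headers(api_key: Optional[str] = None) -> dict:
    """Return request headers, adding Authorization only when a key is given.

    Hypothetical client-side helper; the API itself defines no such function.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Documented header format: "Authorization: Bearer <FINGPT_API_KEY>"
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


def is_auth_error(status_code: int, body: dict) -> bool:
    """True when a response matches the documented 401 error shape."""
    error = body.get("error", {})
    return status_code == 401 and error.get("type") == "authentication_error"
```

Passing no key yields headers suitable for a development server with authentication disabled.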
Rate Limiting
Default: 600 requests per hour per client (configurable via API_RATE_LIMIT env var).
Format: <count>/<period> where period is s (second), m (minute), h (hour), or d (day).
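To make the format concrete, the sketch below parses a rate-limit string into a count and a period in seconds, then derives an even request spacing a client could use to stay under the limit. It illustrates the documented format only and is not the server's parser; the function names are hypothetical, and writing the default as 600/h is an assumption based on the format described above.

```python
from typing import Tuple

# Period suffixes documented above: s, m, h, d.
PERIOD_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_rate_limit(value: str) -> Tuple[int, int]:
    """Split '<count>/<period>' into (count, period_in_seconds)."""
    count_text, _, period = value.strip().partition("/")
    if period not in PERIOD_SECONDS:
        raise ValueError(f"unknown period {period!r}; expected one of s, m, h, d")
    count = int(count_text)  # raises ValueError for non-numeric counts
    if count <= 0:
        raise ValueError("count must be positive")
    return count, PERIOD_SECONDS[period]


def min_interval_seconds(value: str) -> float:
    """Seconds to wait between requests to spread them evenly over the period."""
    count, period = parse_rate_limit(value)
    return period / count
```

Under these assumptions, 600 requests per hour corresponds to one request every 6 seconds when spaced evenly.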
CORS
CORS restrictions only apply to browser-based requests. HTTP clients (curl, requests, httpx, Postman) are unaffected.
—
Endpoints
Health Check
Check if the backend is running. Does not require authentication.
| Method | GET |
| Path | /health/ |
| Auth | Not required |
Response (200):
{
  "status": "healthy",
  "service": "fingpt-backend",
  "timestamp": "2026-02-22T12:00:00.000000",
  "version": "0.13.3",
  "using_unified_context": true
}
Example:
curl https://agenticfinsearch.org:8000/health/
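A monitoring script might validate the health payload before running other calls. The sketch below works on an already-decoded response body; summarize_health is a hypothetical helper, and it assumes the timestamp stays in the ISO 8601 form shown in the sample response.

```python
from datetime import datetime


def summarize_health(body: dict) -> str:
    """Return a one-line summary, or raise if the backend is not healthy."""
    if body.get("status") != "healthy":
        raise RuntimeError(f"backend reported status {body.get('status')!r}")
    # Assumes an ISO 8601 timestamp such as "2026-02-22T12:00:00.000000".
    reported_at = datetime.fromisoformat(body["timestamp"])
    stamp = reported_at.isoformat(sep=" ", timespec="seconds")
    return f"{body['service']} {body['version']} healthy at {stamp}"
```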
—
List Models
Returns all available models in OpenAI-compatible format.
| Method | GET |
| Path | /v1/models |
| Auth | Required (when FINGPT_API_KEY is set) |
Response (200):
{
  "object": "list",
  "data": [
    {
      "id": "FinGPT",
      "object": "model",
      "created": 1740000000,
      "owned_by": "google",
      "permission": [],
      "root": "FinGPT",
      "parent": null
    },
    {
      "id": "FinGPT-Light",
      "object": "model",
      "created": 1740000000,
      "owned_by": "openai",
      "permission": [],
      "root": "FinGPT-Light",
      "parent": null
    },
    {
      "id": "Buffet-Agent",
      "object": "model",
      "created": 1740000000,
      "owned_by": "buffet",
      "permission": [],
      "root": "Buffet-Agent",
      "parent": null
    }
  ]
}
Response fields:
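| Field | Type | Description |
|---|---|---|
| object | string | Always "list". |
| data | array | Array of model objects. |
| data[].id | string | Model identifier. Use this value in the model field of a request. |
| data[].owned_by | string | Provider name: google, openai, or buffet. |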
Example:
curl -H "Authorization: Bearer $API_KEY" \
  https://agenticfinsearch.org:8000/v1/models
Error responses:
401: Authentication error (see Authentication).
405: Wrong HTTP method (must be GET).
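As a small illustration of consuming the list response, the sketch below groups model IDs by provider from an already-decoded body. The function name is hypothetical; the field names id and owned_by come from the sample response above.

```python
def model_ids_by_owner(models_response: dict) -> dict:
    """Group model IDs by their owned_by provider."""
    grouped = {}
    for model in models_response.get("data", []):
        grouped.setdefault(model["owned_by"], []).append(model["id"])
    return grouped
```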
—
Chat Completions
Generate a chat completion. This is the primary endpoint for interacting with the agent.
| Method | POST |
| Path | /v1/chat/completions |
| Auth | Required (when FINGPT_API_KEY is set) |
| Content-Type | application/json |
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| messages | array | Yes | Array of message objects (see Message Format below). Must contain at least one message. The last message should be the user’s current question. |
| mode | string | Yes | Agent mode. One of: thinking, research, normal. |
| model | string | No | Model ID (see Available Models). |
| url | string | No | A URL to scrape and inject as page context before generating the response. Used for site-specific analysis (e.g., analyzing a Yahoo Finance stock page). |
| search_domains | array | No | List of domain strings to scope research to (research mode only), for example bare domains like reuters.com. |
| preferred_links | array | No | List of full URLs to prioritize in research (research mode only). |
| user_timezone | string | No | IANA timezone string (e.g., America/New_York). |
| user_time | string | No | ISO 8601 timestamp of the user’s current time (e.g., 2026-02-22T10:00:00-05:00). |
|  | string | No | An opaque user identifier. When provided, the session ID is derived from it. |
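The request fields can be assembled with a small client-side builder. The sketch below is an illustration under stated assumptions: build_payload is a hypothetical helper, the field names are taken from the usage examples in this document, and dropping search_domains and preferred_links outside research mode is a client-side choice rather than documented server behavior.

```python
from typing import List, Optional

# Mode values as used in the usage examples of this document.
VALID_MODES = ("thinking", "research", "normal")


def build_payload(
    question: str,
    mode: str,
    model: Optional[str] = None,
    url: Optional[str] = None,
    search_domains: Optional[List[str]] = None,
    preferred_links: Optional[List[str]] = None,
) -> dict:
    """Build a chat completion request body with only the fields that apply."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if not question.strip():
        raise ValueError("question must not be empty")
    payload = {"mode": mode, "messages": [{"role": "user", "content": question}]}
    if model:
        payload["model"] = model
    if url:
        payload["url"] = url
    if mode == "research":
        # These two fields are documented as research mode only.
        if search_domains:
            payload["search_domains"] = list(search_domains)
        if preferred_links:
            payload["preferred_links"] = list(preferred_links)
    return payload
```

Validating the mode locally avoids a round trip that would end in a 400 response.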
Message Format
Each element of the messages array is an object:
| Field | Type | Description |
|---|---|---|
| role | string | One of system, user, or assistant. |
| content | string | The message text. |
The API processes messages in order:
system messages set the system prompt (last one wins).
user and assistant messages populate conversation history.
The last message in the array is treated as the current prompt and must be a user message for the agent to generate a response.
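The processing rules above can be sketched as a small function that splits a messages array into a system prompt, a history, and the current prompt. This is an illustration of the documented rules, not the backend's actual code; split_messages is a hypothetical name.

```python
from typing import List, Optional, Tuple


def split_messages(messages: List[dict]) -> Tuple[Optional[str], List[dict], str]:
    """Return (system_prompt, history, current_prompt) per the rules above."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    last = messages[-1]
    if last["role"] != "user":
        raise ValueError("the last message must be a user message")
    system_prompt = None
    history = []
    for message in messages[:-1]:
        if message["role"] == "system":
            system_prompt = message["content"]  # a later system message overrides
        else:
            history.append(message)  # user and assistant turns, in order
    return system_prompt, history, last["content"]
```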
Modes
| Mode | Behavior |
|---|---|
| thinking | Agentic mode. The agent uses MCP tools (SEC-EDGAR, Yahoo Finance) to gather data before responding. Best for specific financial questions. |
| research | Deep research mode. The agent decomposes the question into sub-queries, performs parallel web searches, and synthesizes a comprehensive answer. Best for broad research questions. |
| normal | Direct mode. The agent responds using its training data and any injected page context (from the url field). |
Response Body
The response follows the OpenAI chat completion format with Agentic FinSearch extensions.
{
  "id": "chatcmpl-a1b2c3d4e5f6...",
  "object": "chat.completion",
  "created": 1740000000,
  "model": "FinGPT",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The agent's response text..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 200,
    "total_tokens": 350
  },
  "sources": []
}
Response fields:
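| Field | Type | Description |
|---|---|---|
| id | string | Unique completion ID, prefixed with chatcmpl-. |
| object | string | Always "chat.completion". |
| created | integer | Unix timestamp of when the response was generated. |
| model | string | The model ID used. |
| choices | array | Always contains exactly one choice (index 0). |
| choices[0].message.role | string | Always "assistant". |
| choices[0].message.content | string | The generated response text (Markdown-formatted). |
| choices[0].finish_reason | string | Always "stop". |
| usage.prompt_tokens | integer | Approximate prompt token count from the context manager. |
| usage.completion_tokens | integer | Approximate completion tokens (len(response_text) // 4). |
| usage.total_tokens | integer | Sum of prompt_tokens and completion_tokens. |
| sources | array | Agentic FinSearch extension. List of source objects. Structure varies by mode (see below). |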
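Because the response always carries exactly one choice, a client can flatten it into the few values it usually needs. The sketch below operates on a decoded response body; extract_answer is a hypothetical helper name.

```python
def extract_answer(response: dict) -> dict:
    """Pull the answer text, finish reason, token total, and sources from a response."""
    choice = response["choices"][0]  # documented as the only choice
    usage = response.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
        "sources": response.get("sources", []),
    }
```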
Sources Format
The sources array structure depends on the mode used.
Thinking mode sources (MCP tool calls):
[
  {
    "type": "tool",
    "tool_name": "get_stock_info",
    "symbol": "AAPL",
    "call_id": "call_abc123"
  }
]
Research mode sources (web search results):
[
  {
    "url": "https://reuters.com/markets/article-xyz",
    "title": "Reuters Article Title"
  }
]
Normal mode: sources is typically an empty array [].
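Since the source objects differ by mode, a client may want one function that renders either shape as a short label. The sketch below assumes only the fields shown in the two samples above; describe_source is a hypothetical name, and tool sources without a symbol are handled as an assumption rather than documented behavior.

```python
def describe_source(source: dict) -> str:
    """Render a thinking-mode or research-mode source object as a short label."""
    if source.get("type") == "tool":
        # Thinking mode: an MCP tool call. A missing symbol is tolerated (assumption).
        return f"tool:{source['tool_name']}({source.get('symbol', '')})"
    if "url" in source:
        # Research mode: a web search result.
        title = source.get("title")
        return f"{title} <{source['url']}>" if title else source["url"]
    return "unknown source"
```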
Error Responses
All errors follow this format:
{
  "error": {
    "message": "Human-readable error description",
    "type": "error_type_string"
  }
}
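| Code | Type | Cause |
|---|---|---|
| 400 |  | Missing or invalid request field (e.g., missing mode or an empty messages array). |
| 401 | authentication_error | Missing/invalid Authorization header or API key. |
| 404 |  | Model ID does not exist (see Available Models). |
| 405 | (plain) | Wrong HTTP method (the endpoint expects POST). |
| 500 |  | Internal error. The message is generic; details are logged server-side only. |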
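A client has to handle both the JSON error envelope and the plain 405 response. The sketch below formats either into one line; describe_error is a hypothetical helper, and the plain-text body used in the usage note is an assumption, since the exact 405 body is not specified here.

```python
import json


def describe_error(status_code: int, body_text: str) -> str:
    """Format an error response, falling back to raw text for non-JSON bodies."""
    try:
        error = json.loads(body_text)["error"]
        return f"{status_code} {error['type']}: {error['message']}"
    except (ValueError, KeyError, TypeError):
        # Covers non-JSON bodies (such as a plain 405) and unexpected JSON shapes.
        return f"{status_code}: {body_text.strip() or 'no error body'}"
```

For example, describe_error(405, "Method Not Allowed") falls through to the plain-text branch, while a documented 401 body is rendered with its type and message.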
—
Available Models
| Model ID | Provider | Underlying Model | Description |
|---|---|---|---|
| FinGPT | google |  | Default model. 1M token context. No streaming. |
| FinGPT-Light | openai |  | Faster, lighter. 128k token context. |
| Buffet-Agent | buffet | Custom (Hugging Face endpoint) | Fine-tuned financial model. |
All models support both thinking (MCP) and research (deep search) modes.
—
Usage Examples
All examples use curl. Replace agenticfinsearch.org with the droplet IP/domain and $API_KEY with the actual key.
Health Check
curl https://agenticfinsearch.org:8000/health/
List Models
curl -H "Authorization: Bearer $API_KEY" \
  https://agenticfinsearch.org:8000/v1/models
Thinking Mode (MCP Tools)
Ask a specific financial question. The agent uses SEC-EDGAR and Yahoo Finance MCP tools to fetch data.
curl -X POST https://agenticfinsearch.org:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "FinGPT",
    "mode": "thinking",
    "messages": [
      {"role": "user", "content": "What is the current price and P/E ratio of AAPL?"}
    ]
  }'
Research Mode (Deep Search)
Ask a broad research question. The agent searches the web and synthesizes an answer.
curl -X POST https://agenticfinsearch.org:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "FinGPT",
    "mode": "research",
    "messages": [
      {"role": "user", "content": "What are the key risks facing the US banking sector in 2026?"}
    ],
    "search_domains": ["reuters.com", "bloomberg.com", "wsj.com"],
    "preferred_links": ["https://www.federalreserve.gov"]
  }'
Research Mode with Domain Scoping
curl -X POST https://agenticfinsearch.org:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "FinGPT-Light",
    "mode": "research",
    "messages": [
      {"role": "user", "content": "Summarize recent SEC enforcement actions in crypto."}
    ],
    "search_domains": ["sec.gov"],
    "user_timezone": "America/New_York",
    "user_time": "2026-02-22T10:00:00-05:00"
  }'
With URL Context (Page Analysis)
Inject a page’s content before asking a question about it.
curl -X POST https://agenticfinsearch.org:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "FinGPT",
    "mode": "thinking",
    "url": "https://finance.yahoo.com/quote/MSFT/",
    "messages": [
      {"role": "user", "content": "Analyze this stock page and summarize the key metrics."}
    ]
  }'
Multi-Turn Conversation
Pass full conversation history. The API is stateless — include all prior turns each time.
curl -X POST https://agenticfinsearch.org:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "FinGPT",
    "mode": "thinking",
    "messages": [
      {"role": "user", "content": "What is AAPL trading at?"},
      {"role": "assistant", "content": "Apple (AAPL) is currently trading at $195.50."},
      {"role": "user", "content": "How does that compare to its 52-week high?"}
    ]
  }'
Normal Mode (No Tools / No Search)
curl -X POST https://agenticfinsearch.org:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "FinGPT",
    "mode": "normal",
    "messages": [
      {"role": "user", "content": "Explain what a P/E ratio is."}
    ]
  }'
—
Python Benchmarking Quick Start
Below is a complete, copy-paste-ready Python script for benchmarking the API. It tests all three modes and measures response time.
"""Agentic FinSearch API Benchmark Script."""
import json
import time
from typing import Tuple

import requests

BASE_URL = "https://agenticfinsearch.org:8000"
API_KEY = "<YOUR_API_KEY>"  # omit Authorization header if auth is disabled
HEADERS = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}


def call_completions(mode: str, question: str, model: str = "FinGPT", **kwargs) -> Tuple[dict, float]:
    """Send a chat completion request and return (response_dict, elapsed_seconds)."""
    payload = {
        "model": model,
        "mode": mode,
        "messages": [{"role": "user", "content": question}],
        **kwargs,  # optional fields such as search_domains or url
    }
    # perf_counter is monotonic, so it is safer than time.time() for timing.
    start = time.perf_counter()
    resp = requests.post(
        f"{BASE_URL}/v1/chat/completions",
        headers=HEADERS,
        json=payload,
        timeout=120,
    )
    elapsed = time.perf_counter() - start
    resp.raise_for_status()
    data = resp.json()
    return data, elapsed


def test_health():
    """Verify the server is running."""
    resp = requests.get(f"{BASE_URL}/health/", timeout=10)
    assert resp.status_code == 200
    data = resp.json()
    assert data["status"] == "healthy"
    print(f"[PASS] Health check: {data['version']}")


def test_models():
    """Verify the models endpoint returns expected models."""
    resp = requests.get(f"{BASE_URL}/v1/models", headers=HEADERS, timeout=10)
    assert resp.status_code == 200
    data = resp.json()
    model_ids = [m["id"] for m in data["data"]]
    assert "FinGPT" in model_ids
    assert "FinGPT-Light" in model_ids
    print(f"[PASS] Models: {model_ids}")


def test_thinking_mode():
    """Benchmark thinking mode (MCP tools)."""
    data, elapsed = call_completions(
        mode="thinking",
        question="What is the current price of AAPL?",
    )
    content = data["choices"][0]["message"]["content"]
    sources = data["sources"]
    print(f"[PASS] Thinking mode ({elapsed:.1f}s)")
    print(f"  Response length: {len(content)} chars")
    print(f"  Sources: {json.dumps(sources, indent=2)}")
    assert len(content) > 0
    return elapsed


def test_research_mode():
    """Benchmark research mode (deep search)."""
    data, elapsed = call_completions(
        mode="research",
        question="What are analysts saying about NVIDIA earnings?",
        search_domains=["reuters.com", "cnbc.com"],
    )
    content = data["choices"][0]["message"]["content"]
    sources = data["sources"]
    print(f"[PASS] Research mode ({elapsed:.1f}s)")
    print(f"  Response length: {len(content)} chars")
    print(f"  Sources: {len(sources)} URLs")
    for s in sources[:3]:
        print(f"    - {s.get('url', s.get('title', 'N/A'))}")
    assert len(content) > 0
    return elapsed


def test_normal_mode():
    """Benchmark normal mode (no tools, no search)."""
    data, elapsed = call_completions(
        mode="normal",
        question="Explain what a dividend yield is.",
    )
    content = data["choices"][0]["message"]["content"]
    print(f"[PASS] Normal mode ({elapsed:.1f}s)")
    print(f"  Response length: {len(content)} chars")
    assert len(content) > 0
    return elapsed


def test_error_handling():
    """Verify the API returns proper errors for bad requests."""
    # Missing mode
    resp = requests.post(
        f"{BASE_URL}/v1/chat/completions",
        headers=HEADERS,
        json={"model": "FinGPT", "messages": [{"role": "user", "content": "test"}]},
        timeout=30,
    )
    assert resp.status_code == 400
    assert "mode is required" in resp.json()["error"]["message"]

    # Invalid model
    resp = requests.post(
        f"{BASE_URL}/v1/chat/completions",
        headers=HEADERS,
        json={
            "model": "nonexistent",
            "mode": "thinking",
            "messages": [{"role": "user", "content": "test"}],
        },
        timeout=30,
    )
    assert resp.status_code == 404

    # Empty messages
    resp = requests.post(
        f"{BASE_URL}/v1/chat/completions",
        headers=HEADERS,
        json={"model": "FinGPT", "mode": "thinking", "messages": []},
        timeout=30,
    )
    assert resp.status_code == 400
    print("[PASS] Error handling: all validation errors returned correctly")


if __name__ == "__main__":
    print("=" * 60)
    print("Agentic FinSearch API Benchmark")
    print("=" * 60)
    test_health()
    test_models()
    test_error_handling()

    # Run each mode once and collect its wall-clock time.
    timings = {}
    timings["thinking"] = test_thinking_mode()
    timings["research"] = test_research_mode()
    timings["normal"] = test_normal_mode()

    print("\n" + "=" * 60)
    print("Timing Summary")
    print("=" * 60)
    for mode, t in timings.items():
        print(f"  {mode:12s}: {t:.1f}s")
    print(f"  {'TOTAL':12s}: {sum(timings.values()):.1f}s")
—
Behavioral Notes
Statelessness
The API is fully stateless. Each request creates a fresh session context. To maintain conversation history, the client must send the full messages array with every request.
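One way to honor this on the client side is to keep the history locally and resend it in full on every turn. The sketch below is a minimal illustration that builds payloads without sending them; the Conversation class and its method names are hypothetical.

```python
class Conversation:
    """Client-side history holder that resends every prior turn on each request."""

    def __init__(self, mode: str, model: str = "FinGPT"):
        self.mode = mode
        self.model = model
        self.messages = []

    def next_payload(self, question: str) -> dict:
        """Append the user's question and return the full request body."""
        self.messages.append({"role": "user", "content": question})
        # Copy the list so later turns do not mutate an already-built payload.
        return {"model": self.model, "mode": self.mode, "messages": list(self.messages)}

    def record_answer(self, response: dict) -> None:
        """Store the assistant reply so the next request includes it."""
        content = response["choices"][0]["message"]["content"]
        self.messages.append({"role": "assistant", "content": content})
```

Each payload returned by next_payload can be posted to /v1/chat/completions, with the decoded response passed to record_answer before the next turn.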
Response Times
Thinking mode: 5-30 seconds (depends on number of MCP tool calls).
Research mode: 15-90 seconds (depends on search depth, number of sub-queries).
Normal mode: 2-10 seconds.
Set timeout accordingly in your HTTP client (recommended: 120 seconds).
Token Usage
The usage field provides approximate token counts. prompt_tokens comes from the context manager’s internal counter. completion_tokens is estimated as len(response_text) // 4. These are useful for relative benchmarking but are not exact billing-grade counts.
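The completion estimate described above can be reproduced on the client for comparison. This sketch applies the stated formula, len(response_text) // 4; the function names are hypothetical, and prompt_tokens is taken as an input because the context manager's counter is internal to the server.

```python
def estimate_completion_tokens(response_text: str) -> int:
    """Apply the documented estimate: one token per four characters, rounded down."""
    return len(response_text) // 4


def usage_from_estimate(prompt_tokens: int, response_text: str) -> dict:
    """Assemble a usage object in the same shape as the API response."""
    completion_tokens = estimate_completion_tokens(response_text)
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }
```

An 800-character response therefore counts as 200 completion tokens, which is why the figure suits relative comparison rather than billing.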
URL Scraping
When a url is provided, the backend scrapes it using Playwright (headless browser). The scraped content is injected into the agent’s context before response generation. This adds 2-5 seconds to the response time.
Error Safety
The API never exposes internal error details (stack traces, file paths) to clients. All 500 errors return a generic message. Full error details are logged server-side only.