AI / LLM

AI/LLM streaming module for Atmosphere. Provides @AiEndpoint, @Prompt, StreamingSession, the AiSupport SPI for auto-detected AI framework adapters, and a built-in OpenAiCompatibleClient that works with Gemini, OpenAI, Ollama, and any OpenAI-compatible API.

<dependency>
  <groupId>org.atmosphere</groupId>
  <artifactId>atmosphere-ai</artifactId>
  <version>LATEST</version> <!-- check Maven Central for latest -->
</dependency>

Atmosphere has two pluggable SPI layers. AsyncSupport adapts web containers (Jetty, Tomcat, Undertow); AiSupport adapts AI frameworks (Spring AI, LangChain4j, Google ADK, Embabel). Both follow the same design pattern and are auto-detected at startup:

| Concern        | Transport layer                          | AI layer                                             |
|----------------|------------------------------------------|------------------------------------------------------|
| SPI interface  | AsyncSupport                             | AiSupport                                            |
| What it adapts | Web containers (Jetty, Tomcat, Undertow) | AI frameworks (Spring AI, LangChain4j, ADK, Embabel) |
| Discovery      | Classpath scanning                       | ServiceLoader                                        |
| Resolution     | Best available container                 | Highest priority() among isAvailable()               |
| Initialization | init(ServletConfig)                      | configure(LlmSettings)                               |
| Core method    | service(req, res)                        | stream(AiRequest, StreamingSession)                  |
| Fallback       | BlockingIOCometSupport                   | BuiltInAiSupport (OpenAI-compatible)                 |
A minimal endpoint looks like this:

@AiEndpoint(path = "/ai/chat",
            systemPrompt = "You are a helpful assistant",
            conversationMemory = true)
public class MyChatBot {

    @Prompt
    public void onPrompt(String message, StreamingSession session) {
        session.stream(message); // auto-detects Spring AI, LangChain4j, ADK, Embabel, or built-in
    }
}

The @AiEndpoint annotation replaces the boilerplate of @ManagedService + @Ready + @Disconnect + @Message for AI streaming use cases. The @Prompt method runs on a virtual thread.

session.stream(message) auto-detects the best available AiSupport implementation via ServiceLoader — drop an adapter JAR on the classpath and it just works.

| Classpath JAR           | Auto-detected AiSupport                                  | Priority |
|-------------------------|----------------------------------------------------------|----------|
| atmosphere-ai (default) | Built-in OpenAiCompatibleClient (Gemini, OpenAI, Ollama) | 0        |
| atmosphere-spring-ai    | Spring AI ChatClient                                     | 100      |
| atmosphere-langchain4j  | LangChain4j StreamingChatLanguageModel                   | 100      |
| atmosphere-adk          | Google ADK Runner                                        | 100      |
| atmosphere-embabel      | Embabel AgentPlatform                                    | 100      |
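The resolution rule ("highest priority() among isAvailable()") can be sketched in plain Java. The AiSupportLike interface and the adapter names below are simplified stand-ins for illustration, not the real Atmosphere types; real adapters are discovered with java.util.ServiceLoader rather than passed in a list:

```java
import java.util.Comparator;
import java.util.List;

// Simplified stand-in for the AiSupport SPI; the real implementations are
// discovered via ServiceLoader.load(AiSupport.class).
interface AiSupportLike {
    boolean isAvailable(); // e.g. "is LangChain4j on the classpath?"
    int priority();        // adapters use 100, the built-in fallback uses 0
    String name();
}

public class AiSupportResolver {

    // Pick the available implementation with the highest priority; the
    // built-in OpenAI-compatible client (priority 0) acts as the fallback.
    static AiSupportLike resolve(List<AiSupportLike> discovered) {
        return discovered.stream()
                .filter(AiSupportLike::isAvailable)
                .max(Comparator.comparingInt(AiSupportLike::priority))
                .orElseThrow(() -> new IllegalStateException("no AiSupport available"));
    }

    // Fixed test double: record accessors satisfy the interface methods.
    record Fixed(String name, boolean isAvailable, int priority) implements AiSupportLike {}

    public static void main(String[] args) {
        var builtIn = new Fixed("built-in", true, 0);
        var langchain = new Fixed("atmosphere-langchain4j", true, 100);
        var spring = new Fixed("atmosphere-spring-ai", false, 100); // not on classpath
        System.out.println(resolve(List.of(builtIn, langchain, spring)).name());
    }
}
```

Because the fallback always reports isAvailable() = true, resolve never fails in practice; adding an adapter JAR simply outbids it.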

Enable multi-turn conversations with one annotation attribute:

@AiEndpoint(path = "/ai/chat",
            systemPrompt = "You are a helpful assistant",
            conversationMemory = true,
            maxHistoryMessages = 20)
public class MyChat {

    @Prompt
    public void onPrompt(String message, StreamingSession session) {
        session.stream(message);
    }
}

When conversationMemory = true, the framework:

  1. Captures each user message and the streamed assistant response (via MemoryCapturingSession)
  2. Stores them as conversation turns per AtmosphereResource
  3. Injects the full history into every subsequent AiRequest
  4. Clears the history when the resource disconnects

The default implementation is InMemoryConversationMemory (capped at maxHistoryMessages, default 20). For external storage, implement the AiConversationMemory SPI:

public interface AiConversationMemory {
    List<ChatMessage> getHistory(String conversationId);
    void addMessage(String conversationId, ChatMessage message);
    void clear(String conversationId);
    int maxMessages();
}
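A minimal capped implementation of this contract might look like the sketch below. This is an illustration of the semantics (oldest turns evicted first), not the source of InMemoryConversationMemory; the local ChatMessage record stands in for the framework's type:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Local stand-in for the framework's ChatMessage type.
record ChatMessage(String role, String content) {}

// Per-conversation history capped at maxMessages, dropping the oldest first.
public class CappedConversationMemory {
    private final Map<String, Deque<ChatMessage>> store = new ConcurrentHashMap<>();
    private final int maxMessages;

    public CappedConversationMemory(int maxMessages) { this.maxMessages = maxMessages; }

    public List<ChatMessage> getHistory(String conversationId) {
        var deque = store.get(conversationId);
        if (deque == null) return List.of();
        synchronized (deque) { return List.copyOf(deque); }
    }

    public void addMessage(String conversationId, ChatMessage message) {
        var deque = store.computeIfAbsent(conversationId, id -> new ArrayDeque<>());
        synchronized (deque) {
            deque.addLast(message);
            while (deque.size() > maxMessages) deque.removeFirst(); // evict oldest turn
        }
    }

    public void clear(String conversationId) { store.remove(conversationId); }

    public int maxMessages() { return maxMessages; }
}
```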

@AiTool — Framework-Agnostic Tool Calling


Declare tools with @AiTool and they work with any AI backend — Spring AI, LangChain4j, Google ADK. No framework-specific annotations needed.

public class AssistantTools {

    @AiTool(name = "get_weather",
            description = "Returns a weather report for a city")
    public String getWeather(
            @Param(value = "city", description = "City name to get weather for")
            String city) {
        return weatherService.lookup(city);
    }

    @AiTool(name = "convert_temperature",
            description = "Converts between Celsius and Fahrenheit")
    public String convertTemperature(
            @Param(value = "value", description = "Temperature value") double value,
            @Param(value = "from_unit", description = "'C' or 'F'") String fromUnit) {
        return "C".equalsIgnoreCase(fromUnit)
                ? String.format("%.1f°C = %.1f°F", value, value * 9.0 / 5.0 + 32)
                : String.format("%.1f°F = %.1f°C", value, (value - 32) * 5.0 / 9.0);
    }
}

@AiEndpoint(path = "/ai/chat",
            systemPrompt = "You are a helpful assistant",
            conversationMemory = true,
            tools = AssistantTools.class)
public class MyChat {

    @Prompt
    public void onPrompt(String message, StreamingSession session) {
        session.stream(message); // tools are automatically available to the LLM
    }
}
The lifecycle of a tool call:

@AiTool methods
  ↓ scan at startup
DefaultToolRegistry (global)
  ↓ selected per-endpoint via tools = {...}
AiRequest.withTools(tools)
  ↓ bridged to backend-native format
LangChain4jToolBridge / SpringAiToolBridge / AdkToolBridge
  ↓ LLM decides to call a tool
ToolExecutor.execute(args) → result fed back to LLM
  ↓
StreamingSession → WebSocket → browser
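The "scan at startup" step is ordinary annotation reflection. The sketch below uses a locally defined @ToolLike annotation purely for illustration (the real registry reads @AiTool and @Param and also extracts parameter schemas):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

// Local stand-in for @AiTool, just to show the startup scan.
@Retention(RetentionPolicy.RUNTIME)
@interface ToolLike { String name(); String description(); }

public class ToolScanSketch {

    // Index annotated methods by tool name, roughly what a global
    // tool registry does once at startup.
    static Map<String, Method> scan(Class<?> toolClass) {
        Map<String, Method> tools = new LinkedHashMap<>();
        for (Method m : toolClass.getDeclaredMethods()) {
            ToolLike t = m.getAnnotation(ToolLike.class);
            if (t != null) tools.put(t.name(), m);
        }
        return tools;
    }

    // Invoke a registered tool; this is the ToolExecutor.execute(args) step.
    static Object callTool(Map<String, Method> tools, String name, Object target, Object... args) {
        try {
            return tools.get(name).invoke(target, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("tool invocation failed: " + name, e);
        }
    }

    static class DemoTools {
        @ToolLike(name = "get_weather", description = "Returns a weather report")
        public String getWeather(String city) { return "sunny in " + city; }
    }

    public static void main(String[] args) {
        var tools = scan(DemoTools.class);
        // A bridge would now translate each entry to the backend's native
        // format (ToolSpecification, ToolCallback, BaseTool).
        System.out.println(callTool(tools, "get_weather", new DemoTools(), "Paris"));
    }
}
```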

The tool bridge layer converts @AiTool to the native format at runtime:

| Backend     | Bridge Class          | Native Format     |
|-------------|-----------------------|-------------------|
| LangChain4j | LangChain4jToolBridge | ToolSpecification |
| Spring AI   | SpringAiToolBridge    | ToolCallback      |
| Google ADK  | AdkToolBridge         | BaseTool          |

Compared with framework-native tool declarations:

|                    | @AiTool (Atmosphere)  | @Tool (LangChain4j) | FunctionCallback (Spring AI) |
|--------------------|-----------------------|---------------------|------------------------------|
| Portable           | Any backend           | LangChain4j only    | Spring AI only               |
| Parameter metadata | @Param annotation     | @P annotation       | JSON Schema                  |
| Registration       | ToolRegistry (global) | Per-service         | Per-ChatClient               |

To swap the AI backend, change only the Maven dependency — no tool code changes:

<!-- Use LangChain4j -->
<artifactId>atmosphere-langchain4j</artifactId>
<!-- Or Spring AI -->
<artifactId>atmosphere-spring-ai</artifactId>
<!-- Or Google ADK -->
<artifactId>atmosphere-adk</artifactId>

See the spring-boot-ai-tools sample.

Cross-cutting concerns (RAG, guardrails, logging) go through AiInterceptor, not subclassing:

@AiEndpoint(path = "/ai/chat", interceptors = {RagInterceptor.class, LoggingInterceptor.class})
public class MyChat { ... }

public class RagInterceptor implements AiInterceptor {

    @Override
    public AiRequest preProcess(AiRequest request, AtmosphereResource resource) {
        String context = vectorStore.search(request.message());
        return request.withMessage(context + "\n\n" + request.message());
    }
}
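Interceptor composition is a simple fold: each hook sees the output of the previous one. The sketch below assumes in-declaration-order chaining (a reasonable reading of the interceptors = {...} array, not confirmed by this page) and uses a local Req record in place of AiRequest:

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Stand-in for AiRequest: just the message, with a wither like withMessage(...).
record Req(String message) {
    Req withMessage(String m) { return new Req(m); }
}

public class InterceptorChainSketch {

    // Apply each interceptor's preProcess in order: a RAG interceptor can
    // prepend retrieved context before a logging interceptor observes the
    // final prompt that will reach the LLM.
    static Req preProcess(Req request, List<UnaryOperator<Req>> interceptors) {
        for (var interceptor : interceptors) {
            request = interceptor.apply(request);
        }
        return request;
    }

    public static void main(String[] args) {
        UnaryOperator<Req> rag = r -> r.withMessage("[retrieved context]\n\n" + r.message());
        UnaryOperator<Req> log = r -> { System.out.println("prompt: " + r.message()); return r; };
        System.out.println(preProcess(new Req("What is Atmosphere?"), List.of(rag, log)).message());
    }
}
```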

The AI module includes filters and middleware that sit between the @Prompt method and the LLM:

| Class                      | What it does                                                                     |
|----------------------------|----------------------------------------------------------------------------------|
| PiiRedactionFilter         | Buffers messages to sentence boundaries, redacts email/phone/SSN/CC              |
| ContentSafetyFilter        | Pluggable SafetyChecker SPI — block, redact, or pass                             |
| CostMeteringFilter         | Per-session/broadcaster message counting with budget enforcement                 |
| RoutingLlmClient           | Routes by content, model, cost, or latency rules                                 |
| FanOutStreamingSession     | Concurrent N-model streaming: AllResponses, FirstComplete, FastestStreamingTexts |
| StreamingTextBudgetManager | Per-user/org budgets with graceful degradation                                   |
| AiResponseCacheInspector   | Cache control for AI messages in BroadcasterCache                                |
| AiResponseCacheListener    | Aggregates per-session events instead of per-message noise                       |
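Why buffer to sentence boundaries? A PII pattern can be split across streamed chunks, so chunk-by-chunk redaction would miss it. The sketch below illustrates the idea with a simplified e-mail pattern; it is not the PiiRedactionFilter source:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Buffer streamed chunks until a sentence boundary so a pattern split across
// chunks is still caught, then redact before forwarding downstream.
public class PiiRedactionSketch {
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    private final StringBuilder buffer = new StringBuilder();
    private final List<String> forwarded = new ArrayList<>();

    // Accept a streamed chunk; flush complete sentences downstream, redacted.
    public void onChunk(String chunk) {
        buffer.append(chunk);
        int end;
        while ((end = sentenceEnd(buffer)) >= 0) {
            String sentence = buffer.substring(0, end + 1);
            buffer.delete(0, end + 1);
            forwarded.add(EMAIL.matcher(sentence).replaceAll("[REDACTED]"));
        }
    }

    // A boundary is '.', '!' or '?' followed by whitespace, so the dots
    // inside an e-mail address do not trigger a premature flush.
    private static int sentenceEnd(CharSequence s) {
        for (int i = 0; i + 1 < s.length(); i++) {
            char c = s.charAt(i);
            if ((c == '.' || c == '!' || c == '?') && Character.isWhitespace(s.charAt(i + 1))) {
                return i;
            }
        }
        return -1;
    }

    public List<String> forwarded() { return forwarded; }
}
```

The trade-off is latency: nothing is forwarded until a sentence completes, which is why this is a filter you opt into rather than default behavior.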

RoutingLlmClient supports cost-based and latency-based routing rules:

var router = RoutingLlmClient.builder(defaultClient, "gemini-2.5-flash")
        .route(RoutingRule.costBased(5.0, List.of(
                new ModelOption(openaiClient, "gpt-4o", 0.01, 200, 10),
                new ModelOption(geminiClient, "gemini-flash", 0.001, 50, 5))))
        .route(RoutingRule.latencyBased(100, List.of(
                new ModelOption(ollamaClient, "llama3.2", 0.0, 30, 3),
                new ModelOption(openaiClient, "gpt-4o-mini", 0.005, 80, 7))))
        .build();
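The ModelOption argument meanings are not documented on this page; the sketch below assumes they are (cost per 1K tokens, expected latency in ms, quality score), purely for illustration, and implements one plausible cost-based rule: among options that fit the remaining budget, pick the highest quality, falling back to the cheapest when nothing fits.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical fields; the real ModelOption's arguments may mean something else.
record Option(String model, double costPer1kTokens, int latencyMs, int quality) {}

public class CostRoutingSketch {

    // Cost-based rule sketch: filter options whose estimated cost for this
    // request fits the remaining budget, then prefer quality; if nothing
    // fits, degrade gracefully to the cheapest option.
    static Option pick(List<Option> options, double remainingBudget, int estTokens) {
        return options.stream()
                .filter(o -> o.costPer1kTokens() * estTokens / 1000.0 <= remainingBudget)
                .max(Comparator.comparingInt(Option::quality))
                .orElseGet(() -> options.stream()
                        .min(Comparator.comparingDouble(Option::costPer1kTokens))
                        .orElseThrow());
    }

    public static void main(String[] args) {
        var gpt4o = new Option("gpt-4o", 0.01, 200, 10);
        var flash = new Option("gemini-flash", 0.001, 50, 5);
        System.out.println(pick(List.of(gpt4o, flash), 5.0, 2000).model());   // generous budget
        System.out.println(pick(List.of(gpt4o, flash), 0.005, 2000).model()); // tight budget
    }
}
```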

You can bypass @AiEndpoint and use adapters directly:

Spring AI:

var session = StreamingSessions.start(resource);
springAiAdapter.stream(chatClient, prompt, session);

LangChain4j:

var session = StreamingSessions.start(resource);
model.chat(ChatMessage.userMessage(prompt),
        new AtmosphereStreamingResponseHandler(session));

Google ADK:

var session = StreamingSessions.start(resource);
adkAdapter.stream(new AdkRequest(runner, userId, sessionId, prompt), session);

Embabel:

val session = StreamingSessions.start(resource)
embabelAdapter.stream(AgentRequest("assistant") { channel ->
    agentPlatform.run(prompt, channel)
}, session)
On the client side, the React hook from atmosphere.js consumes the stream:

import { useStreaming } from 'atmosphere.js/react';

function AiChat() {
  const { fullText, isStreaming, stats, routing, send } = useStreaming({
    request: { url: '/ai/chat', transport: 'websocket' },
  });
  return (
    <div>
      <button onClick={() => send('Explain WebSockets')} disabled={isStreaming}>
        Ask
      </button>
      <p>{fullText}</p>
      {stats && <small>{stats.totalStreamingTexts} streaming texts</small>}
      {routing.model && <small>Model: {routing.model}</small>}
    </div>
  );
}
An LLM can also join a chat room as a virtual member:

var client = AiConfig.get().client();
var assistant = new LlmRoomMember("assistant", client, "gpt-5",
        "You are a helpful coding assistant");
Room room = rooms.room("dev-chat");
room.joinVirtual(assistant);
// Now when any user sends a message, the LLM responds in the same room

The client receives JSON messages over WebSocket/SSE:

  • {"type":"streaming-text","content":"Hello"} — a single streaming text
  • {"type":"progress","message":"Thinking..."} — status update
  • {"type":"complete"} — stream finished
  • {"type":"error","message":"..."} — stream failed
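A non-browser client can dispatch on these four message types with a small state machine. The sketch below uses naive regex extraction so it stays dependency-free; real code should use a JSON library (Jackson, Gson), since this version does not handle escaped quotes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal dispatcher for the four wire-message types over WebSocket/SSE.
public class WireMessageSketch {
    private static final Pattern TYPE = Pattern.compile("\"type\"\\s*:\\s*\"([^\"]+)\"");
    private static final Pattern CONTENT = Pattern.compile("\"content\"\\s*:\\s*\"([^\"]*)\"");

    final StringBuilder fullText = new StringBuilder(); // accumulated assistant reply
    final List<String> events = new ArrayList<>();      // lifecycle events seen

    public void onMessage(String json) {
        switch (extract(TYPE, json)) {
            case "streaming-text" -> fullText.append(extract(CONTENT, json));
            case "progress" -> events.add("progress");
            case "complete" -> events.add("complete");
            case "error" -> events.add("error");
            default -> events.add("unknown");
        }
    }

    private static String extract(Pattern p, String json) {
        Matcher m = p.matcher(json);
        return m.find() ? m.group(1) : "";
    }
}
```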

Configure the built-in client with environment variables:

| Variable     | Description                                       | Default          |
|--------------|---------------------------------------------------|------------------|
| LLM_MODE     | remote (cloud) or local (Ollama)                  | remote           |
| LLM_MODEL    | gemini-2.5-flash, gpt-5, o3-mini, llama3.2, …     | gemini-2.5-flash |
| LLM_API_KEY  | API key (or GEMINI_API_KEY for Gemini)            |                  |
| LLM_BASE_URL | Override endpoint (auto-detected from model name) | auto             |
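Resolving these variables with their defaults can be sketched as below. The Settings record and resolve method are illustrative, not the real LlmSettings API; taking the variable map as a parameter (pass System.getenv() in real use) keeps the logic testable:

```java
import java.util.Map;

// Resolve the built-in client's settings from environment-style variables,
// applying the defaults from the table above.
public class LlmEnvConfig {
    record Settings(String mode, String model, String apiKey, String baseUrl) {}

    static Settings resolve(Map<String, String> env) {
        String mode = env.getOrDefault("LLM_MODE", "remote");
        String model = env.getOrDefault("LLM_MODEL", "gemini-2.5-flash");
        // LLM_API_KEY wins; GEMINI_API_KEY is accepted as a Gemini fallback.
        String apiKey = env.getOrDefault("LLM_API_KEY", env.get("GEMINI_API_KEY"));
        // "auto" means the endpoint is derived from the model name.
        String baseUrl = env.getOrDefault("LLM_BASE_URL", "auto");
        return new Settings(mode, model, apiKey, baseUrl);
    }

    public static void main(String[] args) {
        System.out.println(resolve(System.getenv()));
    }
}
```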
| Class                   | Description                                                                       |
|-------------------------|-----------------------------------------------------------------------------------|
| @AiEndpoint             | Marks a class as an AI chat endpoint with a path, system prompt, and interceptors |
| @Prompt                 | Marks the method that handles user messages                                       |
| @AiTool                 | Marks a method as an AI-callable tool (framework-agnostic)                        |
| @Param                  | Describes a tool parameter’s name, description, and required flag                 |
| AiSupport               | SPI for AI framework backends (ServiceLoader-discovered)                          |
| AiRequest               | Framework-agnostic request record (message, systemPrompt, model, hints)           |
| AiInterceptor           | Pre/post processing hooks for RAG, guardrails, logging                            |
| AiConversationMemory    | SPI for conversation history storage                                              |
| StreamingSession        | Delivers streaming texts, progress updates, and metadata to the client            |
| StreamingSessions       | Factory for creating StreamingSession instances                                   |
| OpenAiCompatibleClient  | Built-in HTTP client for OpenAI-compatible APIs                                   |
| RoutingLlmClient        | Routes prompts to different LLM backends based on rules                           |
| ToolRegistry            | Global registry for @AiTool definitions                                           |
| ModelRouter             | SPI for intelligent model routing and failover                                    |
| AiGuardrail             | SPI for pre/post-LLM safety inspection                                            |
| AiMetrics               | SPI for AI observability (streaming texts, latency, cost)                         |
| ConversationPersistence | SPI for durable conversation storage (Redis, SQLite)                              |
| RetryPolicy             | Exponential backoff with circuit-breaker semantics                                |