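Overview
Streaming lets an AI model push data to the client in real time as it generates content, so the client does not have to wait for the complete response. This noticeably improves the user experience, especially when generating long text.
Enabling streaming
Set stream: true in the request body to enable streaming: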
curl
curl https://api.lingyuncx.com/v1/chat/completions \
  -H "Authorization: Bearer sk-xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "写一首诗"}],
    "stream": true
  }'
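SSE data format
Streaming responses use the Server-Sent Events (SSE) format. Each message starts with data: and carries a JSON chunk whose choices[].delta.content field holds the next piece of text: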
text
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"春"},"index":0}]}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"眠"},"index":0}]}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"不"},"index":0}]}
data: [DONE]
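To make the format concrete, the following is a minimal sketch of how a client might turn these data: lines back into text. The helper name iter_deltas is hypothetical and not part of any SDK, and the sketch assumes every payload other than [DONE] is a JSON object shaped like the sample above.

```python
import json
from typing import Iterable, Iterator

DONE_MARKER = "[DONE]"


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield each delta.content string from SSE lines, stopping at [DONE].

    Hypothetical helper for illustration; assumes the chunk shape shown
    in the sample above.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == DONE_MARKER:
            return  # end-of-stream marker, not JSON
        event = json.loads(payload)
        for choice in event.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


sample = [
    'data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"春"},"index":0}]}',
    "",
    'data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"眠"},"index":0}]}',
    "",
    'data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"不"},"index":0}]}',
    "",
    "data: [DONE]",
]

text = "".join(iter_deltas(sample))
print(text)  # 春眠不
```

Checking for [DONE] before calling json.loads matters, because the end marker is not valid JSON.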
Client examples
Python example
Python
from openai import OpenAI

# Point the official OpenAI SDK at this service's endpoint.
client = OpenAI(
    base_url="https://api.lingyuncx.com/v1",
    api_key="sk-xxxxxxxx"
)

# stream=True makes create() return an iterator of chunks.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "写一首诗"}],
    stream=True
)

# Print each piece of text as soon as it arrives.
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Node.js example
JavaScript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.lingyuncx.com/v1',
  apiKey: 'sk-xxxxxxxx'
});

const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: '写一首诗' }],
  stream: true
});

for await (const chunk of stream) {
  if (chunk.choices[0].delta.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
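Browser example (fetch with a streamed body)
The browser EventSource API only issues GET requests and cannot set an Authorization header, so this example reads the SSE stream through fetch instead.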
JavaScript
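const response = await fetch('https://api.lingyuncx.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer sk-xxxxxxxx',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: '写一首诗' }],
    stream: true
  })
});
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ''; // holds a partial line until the rest of it arrives
let finished = false;

while (!finished) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // the last element may be an incomplete line
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice(6).trim();
    if (payload === '[DONE]') { // end-of-stream marker, not JSON
      finished = true;
      break;
    }
    const data = JSON.parse(payload);
    const content = data.choices?.[0]?.delta?.content;
    if (content) {
      console.log(content);
    }
  }
}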
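Network chunks do not have to line up with SSE line boundaries, or even with character boundaries, so a hand-written reader needs to buffer. The following is a minimal Python sketch of that buffering step, using only the standard library. The function name iter_sse_lines is hypothetical, and the sketch assumes the stream is UTF-8 encoded.

```python
import codecs
from typing import Iterable, Iterator


def iter_sse_lines(chunks: Iterable[bytes]) -> Iterator[str]:
    """Reassemble complete text lines from arbitrarily split byte chunks.

    Hypothetical helper for illustration; assumes UTF-8 input.
    """
    # An incremental decoder holds back incomplete multi-byte sequences
    # until the remaining bytes arrive in a later chunk.
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for chunk in chunks:
        buffer += decoder.decode(chunk)
        # Everything before the last newline is complete; keep the rest.
        *complete, buffer = buffer.split("\n")
        for line in complete:
            yield line.rstrip("\r")
    # Flush whatever is left once the stream closes.
    buffer += decoder.decode(b"", final=True)
    if buffer:
        yield buffer.rstrip("\r")


raw = 'data: {"content":"春"}\ndata: [DONE]\n'.encode("utf-8")
cut = raw.index("春".encode("utf-8")) + 1  # split inside a 3-byte character
for line in iter_sse_lines([raw[:cut], raw[cut:]]):
    print(line)
```

The lines it yields can then be checked for the data: prefix and parsed as JSON, as in the examples above.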
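Notes
- Token counting: tokens in a streaming response are counted the same way as in a non-streaming response, so streaming incurs no extra cost
- Error handling: errors that occur during a stream are returned in a separate data: message
- End marker: the stream ends with data: [DONE]
- Timeouts: set a reasonable client-side timeout (60 seconds is recommended)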
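The error-handling and end-marker notes above can be sketched as a small consumer. This is an illustration under stated assumptions: the page does not specify what an error message looks like, so the sketch assumes a JSON object with a top-level "error" key. The names StreamError and collect_stream are hypothetical. Treating a stream that closes without data: [DONE] as a failure is a design choice of this sketch, not documented behavior.

```python
import json
from typing import Iterable


class StreamError(Exception):
    """Raised for an in-stream error or a stream that ends early (hypothetical)."""


def collect_stream(lines: Iterable[str]) -> str:
    """Join all delta.content pieces, or raise StreamError on failure."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return "".join(parts)  # normal end of stream
        event = json.loads(payload)
        if "error" in event:
            # ASSUMPTION: errors arrive as {"error": ...}; adjust this
            # check to whatever shape the service actually returns.
            raise StreamError(str(event["error"]))
        for choice in event.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    # The connection closed before the end marker arrived.
    raise StreamError("stream ended without data: [DONE]")


ok = [
    'data: {"choices":[{"delta":{"content":"春"},"index":0}]}',
    'data: {"choices":[{"delta":{"content":"眠"},"index":0}]}',
    "data: [DONE]",
]
print(collect_stream(ok))  # 春眠
```

The 60-second timeout is usually configured on the HTTP client or SDK rather than in a function like this one.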
💡 Tip
Streaming is especially well suited to chatbots, content generation, and other scenarios that need real-time feedback.