Real-time status of vLLM worker nodes
vllm_1_0:8000
vllm_2_0:8000
vllm_3_0:8000
vllm_4_0:8000
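This cluster uses the Least Connections strategy provided by HAProxy: each new request goes to the worker with the fewest active connections.
If a worker fails its health check (/health), it is automatically removed from rotation.
API endpoint: http://jerry.kaist.ac.kr/v1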
from openai import OpenAI

# Point the OpenAI client at the cluster's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="http://jerry.kaist.ac.kr/v1",
    api_key="YOUR_KEY",
)

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Hi"}],
)
print(response.choices[0].message.content)
curl http://jerry.kaist.ac.kr/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_KEY" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Hi"}]
  }'
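
To illustrate the Least Connections strategy and the health-check removal described above, here is a minimal Python sketch of the selection logic. It is not HAProxy's implementation: the Worker class, the pick_worker function, and the connection counts are hypothetical, chosen only to show how an unhealthy node is skipped and how the least-loaded remaining node is selected. Tie-breaking here simply favors the first listed worker, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    # Hypothetical model of one backend; not an HAProxy data structure.
    address: str
    active_connections: int = 0
    healthy: bool = True


def pick_worker(workers):
    """Return the healthy worker with the fewest active connections."""
    # Workers that failed their health check are out of rotation.
    candidates = [w for w in workers if w.healthy]
    if not candidates:
        raise RuntimeError("no healthy workers available")
    # min() returns the first minimum, so ties go to the earliest listed worker.
    return min(candidates, key=lambda w: w.active_connections)


# Illustrative connection counts (made up for this example).
workers = [
    Worker("vllm_1_0:8000", active_connections=3),
    Worker("vllm_2_0:8000", active_connections=1),
    Worker("vllm_3_0:8000", active_connections=0, healthy=False),
    Worker("vllm_4_0:8000", active_connections=2),
]

chosen = pick_worker(workers)
chosen.active_connections += 1  # the new request now occupies one connection
print(chosen.address)  # prints "vllm_2_0:8000"
```

In this example vllm_3_0 has the fewest connections but is skipped because it is marked unhealthy, so the request goes to vllm_2_0.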