Model deepseek-llm-7b-chat

namespace	model name	standby gpu	standby pageable	standby pinned memory	gpu count	vRam (MB)	cpu (cores)	memory (MB)	state	revision
deepseek-ai	deepseek-llm-7b-chat	Blob	Blob	Blob	1	14600	20.0	60000	Normal	226
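As a rough sanity check on the 14600 MB vRam request, the sketch below estimates the memory taken by the model weights alone. It assumes about 7 billion parameters stored as 16-bit values and treats the table's "MB" as MiB. Neither assumption comes from the table. The real footprint also includes the KV cache and the CUDA context.

```python
# Rough VRAM estimate for deepseek-llm-7b-chat (a sketch, not a measurement).
# Assumptions: ~7e9 parameters stored as 16-bit (2-byte) weights; the table's
# "vRam (MB)" value is treated here as MiB (2**20 bytes).

PARAMS = 7e9            # assumed, approximate parameter count
BYTES_PER_PARAM = 2     # assumed fp16/bf16 weights
MIB = 2 ** 20


def weight_mib(params: float, bytes_per_param: int) -> float:
    """Memory needed for the model weights alone, in MiB."""
    return params * bytes_per_param / MIB


requested_vram_mib = 14600  # "vRam (MB)" column of the model table
weights = weight_mib(PARAMS, BYTES_PER_PARAM)
headroom = requested_vram_mib - weights  # left for KV cache, activations, runtime

print(f"weights ~{weights:.0f} MiB, headroom ~{headroom:.0f} MiB")
```

Running it prints `weights ~13351 MiB, headroom ~1249 MiB`. Under these assumptions, roughly 1.2 GiB of the request remains for everything other than the weights.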

Prompt

Sample Rest Call
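A minimal sketch of such a call follows. It assumes the model is reachable through vLLM's OpenAI-compatible server and turns the `sample_query` object from the Func spec into a POST request to `v1/completions`. The spec stores `max_tokens`, `stream` and `temperature` as strings, so the sketch converts them to the number and boolean types the completions API uses. The base URL `http://localhost:8000` is a placeholder. Port 8000 is vLLM's default and matches `endpoint.port` in the spec, but the address InferX actually exposes is not given here.

```python
import json
import urllib.request

# Values copied from the Func spec's "sample_query".
SAMPLE_QUERY = {
    "path": "v1/completions",
    "prompt": "what is the integral of x^2 from 0 to 2?\nPlease reason step by step, "
              "and put your final answer within \\boxed{}.",
    "body": {
        "max_tokens": "80",
        "model": "deepseek-ai/deepseek-llm-7b-chat",
        "stream": "true",
        "temperature": "0",
    },
}


def build_request(base_url: str, query: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for an OpenAI-compatible server."""
    body = query["body"]
    payload = {
        "model": body["model"],
        "prompt": query["prompt"],
        # Convert the string-typed fields from the spec to proper JSON types.
        "max_tokens": int(body["max_tokens"]),
        "temperature": float(body["temperature"]),
        "stream": body["stream"].lower() == "true",
    }
    url = base_url.rstrip("/") + "/" + query["path"].lstrip("/")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder address (hypothetical); replace with the real serving endpoint.
req = build_request("http://localhost:8000", SAMPLE_QUERY)
print(req.full_url)
# Sending it would be urllib.request.urlopen(req); with stream=True the
# response body arrives as server-sent event chunks.
```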

Pods

tenant	namespace	pod name	state	required resources	allocated resources
public	deepseek-ai	public/deepseek-ai/deepseek-llm-7b-chat/226/1181	Standby	{'CPU': 20000, 'Mem': 60000, 'GPU': {'Type': 'Any', 'Count': 1, 'vRam': 14600}}	{'nodename': 'node2', 'CPU': 20000, 'Mem': 60000, 'GPUType': 'A4000', 'GPUs': {'vRam': 0, 'map': {}, 'slotSize': 0, 'totalSlotCnt': 0}, 'MaxContextPerGPU': 1}
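The resource cells in the Pods table are written as Python dict literals with single quotes, so `ast.literal_eval` can read them where `json.loads` would fail. The sketch below compares the two cells from the row above. The `compare_resources` helper is hypothetical and only illustrates how the fields can be read.

```python
import ast

# Cells copied from the Pods table row above.
required_cell = "{'CPU': 20000, 'Mem': 60000, 'GPU': {'Type': 'Any', 'Count': 1, 'vRam': 14600}}"
allocated_cell = (
    "{'nodename': 'node2', 'CPU': 20000, 'Mem': 60000, 'GPUType': 'A4000', "
    "'GPUs': {'vRam': 0, 'map': {}, 'slotSize': 0, 'totalSlotCnt': 0}, "
    "'MaxContextPerGPU': 1}"
)


def compare_resources(required: dict, allocated: dict) -> dict:
    """Report which required resources the current allocation covers (hypothetical helper)."""
    return {
        "cpu_ok": allocated["CPU"] >= required["CPU"],
        "mem_ok": allocated["Mem"] >= required["Mem"],
        "gpu_vram_allocated": allocated["GPUs"]["vRam"],
        "gpu_vram_required": required["GPU"]["vRam"],
    }


required = ast.literal_eval(required_cell)    # safe literal parsing, no code execution
allocated = ast.literal_eval(allocated_cell)
print(compare_resources(required, allocated))
```

For this row, CPU and memory are fully allocated while the GPU entry shows vRam 0. That is consistent with the pod sitting in the Standby state.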

Failures

tenant	namespace	model name	revision	id	exit info	state
public	deepseek-ai	deepseek-llm-7b-chat	226	751	None	log

Func

{
"image": "vllm/vllm-openai:v0.6.2",
"commands": [
"--model",
"/root/.cache/huggingface/git/deepseek-llm-7b-chat",
"--served-model-name",
"deepseek-ai/deepseek-llm-7b-chat",
"--disable-custom-all-reduce",
"--trust-remote-code",
"--enforce-eager",
"--gpu-memory-utilization",
"0.99",
"--max-model-len",
"200"
],
"envs": [
[
"LD_LIBRARY_PATH",
"/usr/local/lib/python3.12/dist-packages/nvidia/cuda_nvrtc/lib/:$LD_LIBRARY_PATH"
]
],
"mounts": [
{
"hostpath": "/home/brad/cache",
"mountpath": "/root/.cache/huggingface"
}
],
"endpoint": {
"port": 8000,
"schema": "Http",
"probe": "/health"
},
"version": 226,
"entrypoint": [],
"resources": {
"CPU": 20000,
"Mem": 60000,
"GPU": {
"Type": "Any",
"Count": 1,
"vRam": 14600
}
},
"standby": {
"gpu": "Blob",
"pageable": "Blob",
"pinned": "Blob"
},
"probe": {
"port": 80,
"schema": "Http",
"probe": "/health"
},
"sample_query": {
"apiType": "openai",
"path": "v1/completions",
"prompt": "what is the integral of x^2 from 0 to 2?\nPlease reason step by step, and put your final answer within \\boxed{}.",
"body": {
"max_tokens": "80",
"model": "deepseek-ai/deepseek-llm-7b-chat",
"stream": "true",
"temperature": "0"
}
}
}
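The `commands` array is a flat list of vLLM server flags. The sketch below uses a hypothetical `parse_flags` helper to turn the list into a dictionary. It then checks one consequence of `--max-model-len 200`. vLLM counts prompt and generated tokens against that limit, so the sample query's `max_tokens` of 80 leaves about 120 tokens for the prompt. The actual prompt length depends on the model's tokenizer, which this sketch does not load.

```python
from __future__ import annotations

# Copied from the Func spec's "commands" list.
COMMANDS = [
    "--model", "/root/.cache/huggingface/git/deepseek-llm-7b-chat",
    "--served-model-name", "deepseek-ai/deepseek-llm-7b-chat",
    "--disable-custom-all-reduce",
    "--trust-remote-code",
    "--enforce-eager",
    "--gpu-memory-utilization", "0.99",
    "--max-model-len", "200",
]


def parse_flags(args: list[str]) -> dict[str, str | bool]:
    """Map '--flag value' pairs to strings and bare '--flag' switches to True (hypothetical helper)."""
    flags: dict[str, str | bool] = {}
    i = 0
    while i < len(args):
        name = args[i].lstrip("-")
        nxt = args[i + 1] if i + 1 < len(args) else None
        if nxt is not None and not nxt.startswith("--"):
            flags[name] = nxt.strip()  # drop stray whitespace around values
            i += 2
        else:
            flags[name] = True  # switch with no value
            i += 1
    return flags


def prompt_token_budget(flags: dict, max_tokens: int) -> int:
    """Tokens left for the prompt once max_tokens are reserved for the output."""
    return int(flags["max-model-len"]) - max_tokens


flags = parse_flags(COMMANDS)
print(flags["gpu-memory-utilization"], prompt_token_budget(flags, max_tokens=80))
```

Running it prints `0.99 120`.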

InferX AI Function Platform (Lambda Function for Inference)

-- Serve tens of models in one box with ultra-fast (<2 sec) cold start (contact: support@inferx.net)
