基于vLLM-Ascend的MiniMax-M2.5模型Atlas 800I A3单机混部部署实践
作者昇腾实战派知识地图https://blog.csdn.net/Lumos_Lovegood/article/details/161601003背景概述本文档将介绍基于vLLM-Ascend的MiniMax-M2.5模型在Atlas 800I A3上的单机混部部署实践包括支持的特性、特性配置、环境信息以及性能测试典型case。基本信息软件版本设备信息组网形态总卡数数据格式0.18.0NPUAtlas 800I A3-560THBM 128GCPUKunpeng 92080核-2900MHz内存32根64G5200MHZOSOpenEuler 22.03 LTS-SP4Atlas 800I A3单机8W8A8C16服务化配置低时延/高吞吐nic_namexxxnic_namexxxlocal_ipxxxexportHCCL_OP_EXPANSION_MODEAIVexportHCCL_IF_IP$local_ipexportGLOO_SOCKET_IFNAME$nic_nameexportTP_SOCKET_IFNAME$nic_nameexportHCCL_SOCKET_IFNAME$nic_nameexportHCCL_BUFFSIZE512exportPYTORCH_NPU_ALLOC_CONFexpandable_segments:TrueexportVLLM_ASCEND_ENABLE_FUSED_MC21exportOMP_NUM_THREADS1echoperformance|tee/sys/devices/system/cpu/cpu*/cpufreq/scaling_governorsysctl-wvm.swappiness0sysctl-wkernel.numa_balancing0sysctlkernel.sched_migration_cost_ns50000exportLD_PRELOAD/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOADexportTASK_QUEUE_ENABLE1exportVLLM_ASCEND_ENABLE_FLASHCOMM11exportVLLM_ASCEND_BALANCE_SCHEDULING1exportVLLM_ASCEND_ENABLE_NZ vllm serve /mnt/share/weight/MiniMax-M2.5-w8a8-QuaRot\--served-model-nameminimax\--host0.0.0.0\--port8004\--tensor-parallel-size8\--data-parallel-size1\--enable-expert-parallel\--no-enable-prefix-caching\--async-scheduling\--max-num-seqs32\--max-model-len196608\--max-num-batched-tokens16384\--gpu-memory-utilization0.85\--trust-remote-code\--quantizationascend\--no-enable-prefix-caching\--compilation-config{cudagraph_mode: FULL_DECODE_ONLY}\--additional-config{enable_cpu_binding:true}\--speculative_config{method: eagle3, model: /mnt/share/weight/MiniMax-M2-Eagle3-1/, num_speculative_tokens: 3}典型测试用例平均输入平均输出并行策略上下文长度Prefix Cache命中率总请求数最大并发数请求频率(req/s)20482048MLADP1TP81966080512128020482048MLADP1TP8196608010025035001500MLADP1TP81966080512128035001500MLADP1TP81966080100260163841024MLADP1TP81966080120300163841024MLADP1TP81966080164032768512MLADP1TP81966080369032768512MLADP1TP81966080410测试命令参考aisbench官方测试指南。aisbench测试命令vllm-ascend社区官网特别声明以上配置均未开启Prefix Cache若实际生产环境需要使用该特性参考vLLM-Ascend社区参数指南开启–enable-prefix-cachingeagle 权重下载路径https://www.modelscope.cn/models/Eco-Tech/MiniMax-M2.7-eagle-model-short