大语言模型系列(8): Qwen2.5-omini-3B 端侧部署推理教程
本项目基于QAI_AppBuilderhttps://github.com/qualcomm/qai-appbuilder模型下载地址 (包含对应的上下文二进制文件)https://www.aidevhome.com/?id51第一部分Windows 平台使用本部分介绍如何在 Windows 环境下配置并运行 Qwen2.5-omini-3B 模型。1.1资源下载与准备下载模型文件访问网站下载对应平台的模型文件Qwen2.5-omini-3B 骁龙 X Elite 平台 (8380) 模型下载Qwen2.5-omini-3B 骁龙 X2 Elite 平台(8480) 模型下载将下载模型放置QAI AppBuilder\samples\genie\python\models目录下。下载 Genie 服务程序前往 GitHub Releases 页面下载8380GenieAPIService_Stable_QAIRT_v73.zipReleases 下载页面。8480GenieAPIService_Stable_QAIRT_v81.zipReleases 下载页面。解压文件将下载的压缩包解压至项目代码目录ai-engine-direct-helper\samples下。1.2启动服务与运行示例操作步骤打开终端进入 samples 目录分别运行服务和客户端命令。# 1. 进入目录 cd ai-engine-direct-helper\samples # 2. 启动 GenieAPI 服务 (加载配置文件) GenieAPIService_Stable_QAIRT_v73\GenieAPIService.exe -c genie\python\models\qwen2.5_omini_8380-2.42\config.json -l 成功启动会有日志 [INFO] Allocated total size 119406600 across 5 buffers [W] load successfully! use second: 10.9317 [W] Model load successfully: qwen2.5_omini_8380-2.42 [W] GenieService::setupHttpServer start [W] GenieService::setupHttpServer end [A] [OK] Genie API Service IS Running. [A] [OK] Genie API Service - http://0.0.0.0:8910 # 3. 运行客户端进行测试 (确保当前目录下有 test.png 图片) GenieAPIClient.exe --prompt what is the image descript? --img test.png --stream --model qwen2.5_omini_8380-2.42注意:运行客户端命令前请确保当前目录下存在名为test.png的测试图片文件。第二部分Android 平台使用2.1 资源下载与安装下载模型文件与 Windows 平台一致请先下载对应平台的模型Qwen2.5-omini-3B 骁龙 8 至尊版平台 (8750) 模型下载Qwen2.5-omini-3B 第五代骁龙 8 至尊版平台 (8850) 模型下载将下载模型放置/sdcard/GenieModels/目录下。下载与安装 APK访问 GitHub Releases 页面下载GenieAPIService.apk并安装至您的 Android 设备Releases 下载页面。2.2 启动应用第三部分Python 调用指南无论是在 Windows 运行GenieAPIService.exe还是在 Android 启动GenieAPIService.apk服务启动成功后都会显示一个 IP 地址和端口例如127.0.0.1:8910或手机IP。我们可以使用 Python 通过 OpenAI 兼容接口调用该服务。3.1环境准备请确保已安装openai库。pip install openai3.2 Python 调用代码创建一个 Python 脚本并将以下代码复制进去。请注意根据实际情况修改 IP 地址。import argparse import base64 import os import requests from openai import OpenAI DEFAULT_ADDR 127.0.0.1:8910 DEFAULT_API_KEY 123 SYSTEM_PROMPTS { llm: You are a helpful assistant., vl: You are a helpful assistant., omini: ( You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group. You are helpful and honest. You can perceive all types of information, including but not limited to text, images, and audio. ), } def encode_file(file_input: str) - str: if file_input.startswith((http://, https://)): print(f Downloading: {file_input}) resp requests.get(file_input, timeout30) resp.raise_for_status() return base64.b64encode(resp.content).decode(utf-8) if not os.path.exists(file_input): raise FileNotFoundError(fFile not found: {file_input}) with open(file_input, rb) as f: return base64.b64encode(f.read()).decode(utf-8) def detect_mode(args) - str: if args.mode: return args.mode if args.audio: return omini if args.image: return vl return llm def build_messages_llm(prompt: str, system: str) - list: return [ {role: system, content: system}, {role: user, content: prompt}, ] def build_messages_vl(prompt: str, image_b64: str, system: str) - list: return [ {role: system, content: system}, { role: user, content: [ {type: text, text: prompt}, { type: image_url, image_url: { url: fdata:image/jpeg;base64,{image_b64} }, }, ], }, ] def build_messages_omini( prompt: str, image_b64: str | None, audio_b64: str | None, system: str, ) - list: content {question: prompt} if image_b64: content[image] image_b64 if audio_b64: content[audio] audio_b64 return [ {role: system, content: system}, {role: user, content: content}, ] def send_request(client: OpenAI, model: str, messages: list, stream: bool, extra_body: dict): if stream: response client.chat.completions.create( modelmodel, streamTrue, messagesmessages, extra_bodyextra_body, ) print(Response: , end) for chunk in response: if chunk.choices: delta chunk.choices[0].delta.content if delta is not None: print(delta, end, flushTrue) print() else: response client.chat.completions.create( modelmodel, messagesmessages, extra_bodyextra_body, ) if response.choices: print(Response:, response.choices[0].message.content) def main(): parser argparse.ArgumentParser( descriptionGenie API Client — supports LLM / VL / Omini modes, formatter_classargparse.RawDescriptionHelpFormatter, epilog Examples: # Pure text (LLM) python demo.py --prompt What is AI? # Image understanding (VL) python demo.py --prompt Describe this image --image photo.jpg # Image Audio (Omini) python demo.py --prompt What do you see and hear? --audio speech.wav # Force Omini mode for text-only (uses embedding pipeline) python demo.py --prompt Hello --mode omini # Specify model name python demo.py --model qwen2.5_omini_8380-2.42 --prompt Hi , ) parser.add_argument(--addr, typestr, defaultDEFAULT_ADDR, helpfServer address (default: {DEFAULT_ADDR})) parser.add_argument(--model, typestr, defaultqwen2.5_omini_8380-2.42, helpModel name to use) parser.add_argument(--mode, typestr, choices[llm, vl, omini], defaultNone, helpForce mode (auto-detect if not set)) parser.add_argument(--prompt, typestr, defaultHello, helpText prompt) parser.add_argument(--image, typestr, defaultNone, helpImage path or URL) parser.add_argument(--audio, typestr, defaultNone, helpAudio WAV path or URL (triggers Omini mode)) parser.add_argument(--stream, actionstore_true, helpEnable streaming output) parser.add_argument(--system, typestr, defaultNone, helpCustom system prompt (overrides default)) parser.add_argument(--temp, typefloat, default0.7) parser.add_argument(--top_k, typeint, default40) parser.add_argument(--top_p, typefloat, default0.9) parser.add_argument(--max_tokens, typeint, default2048) args parser.parse_args() mode detect_mode(args) system args.system or SYSTEM_PROMPTS[mode] print(f Genie API Client ) print(f Server : {args.addr}) print(f Model : {args.model}) print(f Mode : {mode.upper()}) if args.image: print(f Image : {args.image}) if args.audio: print(f Audio : {args.audio}) print(f Prompt : {args.prompt}) print() client OpenAI(base_urlfhttp://{args.addr}/v1, api_keyDEFAULT_API_KEY) extra_body { temp: args.temp, top_k: args.top_k, top_p: args.top_p, size: args.max_tokens, } try: image_b64 encode_file(args.image) if args.image else None audio_b64 encode_file(args.audio) if args.audio else None except Exception as e: print(fError loading file: {e}) return if mode llm: messages build_messages_llm(args.prompt, system) elif mode vl: if not image_b64: print(Error: VL mode requires --image) return messages build_messages_vl(args.prompt, image_b64, system) elif mode omini: messages build_messages_omini(args.prompt, image_b64, audio_b64, system) try: send_request(client, args.model, messages, args.stream, extra_body) except Exception as e: print(f\nRequest failed: {e}) if __name__ __main__: main()3.3 运行脚本在命令行中运行脚本指定图片路径和可选提示词# Pure text (LLM) python demo.py --prompt What is AI? # Image understanding (VL) python demo.py --prompt Describe this image --image photo.jpg # Image Audio (Omini) python demo.py --prompt What do you see and hear? --audio speech.wav # Specify model name python demo.py --model qwen2.5_omini_8380-2.42 --prompt Hi