大语言模型系列(8): Qwen2.5-omini-3B 端侧部署推理教程-尧图建网站

本项目基于QAI_AppBuilderhttps://github.com/qualcomm/qai-appbuilder模型下载地址 (包含对应的上下文二进制文件)https://www.aidevhome.com/?id51第一部分Windows 平台使用本部分介绍如何在 Windows 环境下配置并运行 Qwen2.5-omini-3B 模型。1.1资源下载与准备下载模型文件访问网站下载对应平台的模型文件Qwen2.5-omini-3B 骁龙 X Elite 平台 (8380) 模型下载Qwen2.5-omini-3B 骁龙 X2 Elite 平台(8480) 模型下载将下载模型放置QAI AppBuilder\samples\genie\python\models目录下。下载 Genie 服务程序前往 GitHub Releases 页面下载8380GenieAPIService_Stable_QAIRT_v73.zipReleases 下载页面。8480GenieAPIService_Stable_QAIRT_v81.zipReleases 下载页面。解压文件将下载的压缩包解压至项目代码目录ai-engine-direct-helper\samples下。1.2启动服务与运行示例操作步骤打开终端进入 samples 目录分别运行服务和客户端命令。# 1. 进入目录 cd ai-engine-direct-helper\samples # 2. 启动 GenieAPI 服务 (加载配置文件) GenieAPIService_Stable_QAIRT_v73\GenieAPIService.exe -c genie\python\models\qwen2.5_omini_8380-2.42\config.json -l 成功启动会有日志 [INFO] Allocated total size 119406600 across 5 buffers [W] load successfully! use second: 10.9317 [W] Model load successfully: qwen2.5_omini_8380-2.42 [W] GenieService::setupHttpServer start [W] GenieService::setupHttpServer end [A] [OK] Genie API Service IS Running. [A] [OK] Genie API Service - http://0.0.0.0:8910 # 3. 运行客户端进行测试 (确保当前目录下有 test.png 图片) GenieAPIClient.exe --prompt what is the image descript? --img test.png --stream --model qwen2.5_omini_8380-2.42注意:运行客户端命令前请确保当前目录下存在名为test.png的测试图片文件。第二部分Android 平台使用2.1 资源下载与安装下载模型文件与 Windows 平台一致请先下载对应平台的模型Qwen2.5-omini-3B 骁龙 8 至尊版平台 (8750) 模型下载Qwen2.5-omini-3B 第五代骁龙 8 至尊版平台 (8850) 模型下载将下载模型放置/sdcard/GenieModels/目录下。下载与安装 APK访问 GitHub Releases 页面下载GenieAPIService.apk并安装至您的 Android 设备Releases 下载页面。2.2 启动应用第三部分Python 调用指南无论是在 Windows 运行GenieAPIService.exe还是在 Android 启动GenieAPIService.apk服务启动成功后都会显示一个 IP 地址和端口例如127.0.0.1:8910或手机IP。我们可以使用 Python 通过 OpenAI 兼容接口调用该服务。3.1环境准备请确保已安装openai库。pip install openai3.2 Python 调用代码创建一个 Python 脚本并将以下代码复制进去。请注意根据实际情况修改 IP 地址。import argparse import base64 import os import requests from openai import OpenAI DEFAULT_ADDR 127.0.0.1:8910 DEFAULT_API_KEY 123 SYSTEM_PROMPTS { llm: You are a helpful assistant., vl: You are a helpful assistant., omini: ( You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group. You are helpful and honest. You can perceive all types of information, including but not limited to text, images, and audio. ), } def encode_file(file_input: str) - str: if file_input.startswith((http://, https://)): print(f Downloading: {file_input}) resp requests.get(file_input, timeout30) resp.raise_for_status() return base64.b64encode(resp.content).decode(utf-8) if not os.path.exists(file_input): raise FileNotFoundError(fFile not found: {file_input}) with open(file_input, rb) as f: return base64.b64encode(f.read()).decode(utf-8) def detect_mode(args) - str: if args.mode: return args.mode if args.audio: return omini if args.image: return vl return llm def build_messages_llm(prompt: str, system: str) - list: return [ {role: system, content: system}, {role: user, content: prompt}, ] def build_messages_vl(prompt: str, image_b64: str, system: str) - list: return [ {role: system, content: system}, { role: user, content: [ {type: text, text: prompt}, { type: image_url, image_url: { url: fdata:image/jpeg;base64,{image_b64} }, }, ], }, ] def build_messages_omini( prompt: str, image_b64: str | None, audio_b64: str | None, system: str, ) - list: content {question: prompt} if image_b64: content[image] image_b64 if audio_b64: content[audio] audio_b64 return [ {role: system, content: system}, {role: user, content: content}, ] def send_request(client: OpenAI, model: str, messages: list, stream: bool, extra_body: dict): if stream: response client.chat.completions.create( modelmodel, streamTrue, messagesmessages, extra_bodyextra_body, ) print(Response: , end) for chunk in response: if chunk.choices: delta chunk.choices[0].delta.content if delta is not None: print(delta, end, flushTrue) print() else: response client.chat.completions.create( modelmodel, messagesmessages, extra_bodyextra_body, ) if response.choices: print(Response:, response.choices[0].message.content) def main(): parser argparse.ArgumentParser( descriptionGenie API Client — supports LLM / VL / Omini modes, formatter_classargparse.RawDescriptionHelpFormatter, epilog Examples: # Pure text (LLM) python demo.py --prompt What is AI? # Image understanding (VL) python demo.py --prompt Describe this image --image photo.jpg # Image Audio (Omini) python demo.py --prompt What do you see and hear? --audio speech.wav # Force Omini mode for text-only (uses embedding pipeline) python demo.py --prompt Hello --mode omini # Specify model name python demo.py --model qwen2.5_omini_8380-2.42 --prompt Hi , ) parser.add_argument(--addr, typestr, defaultDEFAULT_ADDR, helpfServer address (default: {DEFAULT_ADDR})) parser.add_argument(--model, typestr, defaultqwen2.5_omini_8380-2.42, helpModel name to use) parser.add_argument(--mode, typestr, choices[llm, vl, omini], defaultNone, helpForce mode (auto-detect if not set)) parser.add_argument(--prompt, typestr, defaultHello, helpText prompt) parser.add_argument(--image, typestr, defaultNone, helpImage path or URL) parser.add_argument(--audio, typestr, defaultNone, helpAudio WAV path or URL (triggers Omini mode)) parser.add_argument(--stream, actionstore_true, helpEnable streaming output) parser.add_argument(--system, typestr, defaultNone, helpCustom system prompt (overrides default)) parser.add_argument(--temp, typefloat, default0.7) parser.add_argument(--top_k, typeint, default40) parser.add_argument(--top_p, typefloat, default0.9) parser.add_argument(--max_tokens, typeint, default2048) args parser.parse_args() mode detect_mode(args) system args.system or SYSTEM_PROMPTS[mode] print(f Genie API Client ) print(f Server : {args.addr}) print(f Model : {args.model}) print(f Mode : {mode.upper()}) if args.image: print(f Image : {args.image}) if args.audio: print(f Audio : {args.audio}) print(f Prompt : {args.prompt}) print() client OpenAI(base_urlfhttp://{args.addr}/v1, api_keyDEFAULT_API_KEY) extra_body { temp: args.temp, top_k: args.top_k, top_p: args.top_p, size: args.max_tokens, } try: image_b64 encode_file(args.image) if args.image else None audio_b64 encode_file(args.audio) if args.audio else None except Exception as e: print(fError loading file: {e}) return if mode llm: messages build_messages_llm(args.prompt, system) elif mode vl: if not image_b64: print(Error: VL mode requires --image) return messages build_messages_vl(args.prompt, image_b64, system) elif mode omini: messages build_messages_omini(args.prompt, image_b64, audio_b64, system) try: send_request(client, args.model, messages, args.stream, extra_body) except Exception as e: print(f\nRequest failed: {e}) if __name__ __main__: main()3.3 运行脚本在命令行中运行脚本指定图片路径和可选提示词# Pure text (LLM) python demo.py --prompt What is AI? # Image understanding (VL) python demo.py --prompt Describe this image --image photo.jpg # Image Audio (Omini) python demo.py --prompt What do you see and hear? --audio speech.wav # Specify model name python demo.py --model qwen2.5_omini_8380-2.42 --prompt Hi

相关新闻

医疗预测建模：用LightGBM+SHAP构建可解释的临床决策工具

虚拟试衣系统VOGUE：参数化体态建模与版型拟合技术实践

「 简记往来」第六篇：微信登录与JWT鉴权完整实现

最新新闻

5个实用步骤：如何通过UniversalUnityDemosaics实现Unity游戏马赛克移除完整方案

Grafana 告警通知集成：钉钉、企业微信与邮件多渠道联动

告别连接烦恼：Windows一键安装苹果设备驱动终极指南

5分钟掌握Windows和Office永久激活：KMS智能激活终极指南

实战指南：R3nzSkin国服版实现英雄联盟全面视觉自定义

重视思维培养？蕃茄田适合哪些家庭的需求

日新闻

Selenium元素定位全解析：从八大方法到实战策略

BurpSuite Cluster Bomb模式深度避坑指南：从原理到实战的完整爆破策略

UnblockNeteaseMusic终极教程：3分钟解锁网易云音乐灰色歌曲的完整方案

周新闻

管理者的六个层次

华为OD机试2025C卷-座位调整[100分]（ Java _ Python3 _ C++ _ C语言 _ JsNode _ Go）实现100%通过率

CrabCode v1.0.7与v1.0.8 更新速览！

月新闻

FAE放射组学分析工具：医学影像特征探索的完整解决方案

基于Dify与DeepSeek构建私有知识库问答系统实战指南

餐饮老板必看：扫码点餐小程序3步搞定，别再让顾客干等了！

「简记往来」第六篇：微信登录与JWT鉴权完整实现