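Contents

References
1 Installation
  1.1 Create a conda environment
  1.2 Install framework dependencies
    1.2.1 Transformers framework
    1.2.2 vLLM framework
  1.3 Install the accelerator
  1.4 Install the models
    1.4.1 Install ASR
    1.4.2 Install TTS
    1.4.3 Download result
2 Basic demos
  2.1 ASR Demo
  2.2 TTS Demo
3 Installing the torch-cuda environment on Windows
  3.1 Check the CUDA version of the current GPU
    3.1.1 Command-line method
    3.1.2 GUI method
  3.2 Uninstall the previous torch
  3.3 Install torch for a specific CUDA version

References

https://github.com/QwenLM/Qwen3-TTS
https://github.com/QwenLM/Qwen3-ASR/tree/main
Qwen3-TTS fully open-sourced: a multilingual speech model with ultra-low-latency streaming synthesis (Alibaba Cloud Developer Community)

1 Installation

1.1 Create a conda environment

    conda create -n qwen3-asr python=3.12 -y
    conda activate qwen3-asr

1.2 Install framework dependencies

Both the Transformers framework and the vLLM framework are supported.

1.2.1 Transformers framework

ASR:

    pip install -U qwen-asr

TTS:

    pip install -U qwen-tts

1.2.2 vLLM framework

ASR (the quotes keep shells such as zsh from expanding the brackets):

    pip install -U "qwen-asr[vllm]"

TTS:

    pip install -U qwen-tts

1.3 Install the accelerator

flash-attn does not seem to install on a Mac, so I skipped it there.

    pip install -U flash-attn --no-build-isolation

To limit the number of parallel build jobs, which helps on local machines with many CPU cores or less than 96 GB of RAM:

    MAX_JOBS=4 pip install -U flash-attn --no-build-isolation

1.4 Install the models

Install the ModelScope downloader:

    pip install -U modelscope

1.4.1 Install ASR

    modelscope download --model Qwen/Qwen3-ASR-0.6B --local_dir ./Qwen3-ASR-0.6B

1.4.2 Install TTS

Qwen's TTS depends on SoX. Download and unpack SoX, then add it to your PATH environment variable.

Install SoX: https://sourceforge.net/projects/sox/postdownload

    modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-Base --local_dir ./Qwen3-TTS-12Hz-1.7B-Base

TTS comes in three variants:

- VoiceDesign: describe the voice timbre in text
- CustomVoice: 9 preset voices
- Base: the base model, which can clone a user's voice

1.4.3 Download result

2 Basic demos

2.1 ASR Demo

Macs do not support CUDA, so this demo runs on the CPU.

    import torch
    from qwen_asr import Qwen3ASRModel

    # Load the model from the local directory downloaded in section 1.4.1.
    model = Qwen3ASRModel.from_pretrained(
        "./Qwen3-ASR-0.6B",
        dtype=torch.bfloat16,
        device_map="cpu",
        # attn_implementation="flash_attention_2",
        max_inference_batch_size=32,
        max_new_tokens=256,
        # forced_aligner="./Qwen3-ASR-0.6B",
        # forced_aligner_kwargs=dict(
        #     dtype=torch.bfloat16,
        #     device_map="cpu",
        #     # attn_implementation="flash_attention_2",
        # ),
    )

    # Transcribe two remote audio files in one batch.
    results = model.transcribe(
        audio=[
            "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_zh.wav",
            "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav",
        ],
        language=["Chinese", "English"],  # can also be set to None for automatic language detection
        # return_time_stamps=True,
    )

    for r in results:
        print(r.language, r.text)
        # print(r.language, r.text, r.time_stamps[0])

2.2 TTS Demo

TTS uses a lot of memory and takes longer to process.

    import torch
    import soundfile as sf
    from qwen_tts import Qwen3TTSModel

    # Reference audio and its transcript, used as the voice to clone.
    ref_audio = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone.wav"
    ref_text = "Okay. Yeah. I resent you. I love you. I respect you. But you know what? You blew it! And thanks to you."

    # The path must match the directory you actually downloaded;
    # note that section 1.4.2 downloads the 1.7B-Base variant.
    model = Qwen3TTSModel.from_pretrained(
        "./Qwen3-TTS-12Hz-0.6B-Base",
        device_map="cpu",
        dtype=torch.bfloat16,
        # attn_implementation="flash_attention_2",
    )

    wavs, sr = model.generate_voice_clone(
        text="I am solving the equation: x = [-b ± √(b²-4ac)] / 2a? Nobody can — it's a disaster (◍•͈⌔•͈◍), very sad!",
        language="English",
        ref_audio=ref_audio,
        ref_text=ref_text,
    )

    # Write the first generated waveform to disk.
    sf.write("output_voice_clone.wav", wavs[0], sr)

3 Installing the torch-cuda environment on Windows

CUDA requires an NVIDIA GPU.

3.1 Check the CUDA version of the current GPU

3.1.1 Command-line method

The header of the output shows a "CUDA Version" field, which is the highest CUDA version the installed driver supports.

    nvidia-smi

3.1.2 GUI method

3.2 Uninstall the previous torch

    pip uninstall torch torchvision torchaudio -y

3.3 Install torch for a specific CUDA version

The version suffix at the end of the command must match the CUDA version found in section 3.1. My CUDA version is 12.9, so I use cu129.

    pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu129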
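
The mapping described in section 3.3, from the CUDA version shown by nvidia-smi to the suffix of the PyTorch index URL, can be sketched in a few lines of Python. This is only an illustration: the helper names cuda_version_from_smi and torch_index_url are hypothetical, the sample header string is made up for the example, and it assumes the nvidia-smi header contains a "CUDA Version: X.Y" field. Not every cuXYZ tag is necessarily published, so check that the resulting URL exists before relying on it.

```python
import re
from typing import Optional


def cuda_version_from_smi(smi_output: str) -> Optional[str]:
    """Extract 'X.Y' from the 'CUDA Version: X.Y' field of nvidia-smi output.

    Returns None when the field is missing (for example, no NVIDIA driver).
    """
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def torch_index_url(cuda_version: str) -> str:
    """Build the wheel index URL for a CUDA version, e.g. '12.9' -> .../whl/cu129.

    Assumption: the tag is 'cu' followed by the major and minor digits.
    """
    major, minor = cuda_version.strip().split(".")[:2]
    return f"https://download.pytorch.org/whl/cu{int(major)}{int(minor)}"


if __name__ == "__main__":
    # Illustrative header line, not captured from a real machine.
    sample = "| NVIDIA-SMI 000.00    Driver Version: 000.00    CUDA Version: 12.9 |"
    version = cuda_version_from_smi(sample)
    if version is not None:
        print("pip3 install torch torchvision torchaudio --index-url",
              torch_index_url(version))
```

On a real machine, the text passed to cuda_version_from_smi would be the captured output of the nvidia-smi command from section 3.1.1.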