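langchain_huggingface.HuggingFaceEmbeddings: A Complete Guide

I. Introduction

HuggingFaceEmbeddings is LangChain's standard wrapper for running HuggingFace embedding models locally. It now lives in the langchain_huggingface package; the legacy import langchain.embeddings.HuggingFaceEmbeddings is deprecated.

What it does: it loads a local or Hub-hosted text embedding model (for example all-MiniLM-L6-v2), exposes it through LangChain's unified embedding interface, and plugs directly into vector stores such as Chroma and FAISS, alongside the various TextSplitter classes.

Installing dependencies:

```bash
pip install langchain-huggingface sentence-transformers torch
```

II. Core parameters

```python
from langchain_huggingface import HuggingFaceEmbeddings

embedding = HuggingFaceEmbeddings(
    # Model name (sentence-transformers family); set it explicitly
    model_name="all-MiniLM-L6-v2",
    # Model loading arguments, passed to SentenceTransformer
    model_kwargs={
        "device": "cpu",  # "cuda" / "cpu" / "mps" (Mac)
        "trust_remote_code": True,  # only needed for models that ship custom code
    },
    # Encoding (inference) arguments
    encode_kwargs={
        # all-MiniLM already outputs normalized vectors; for unit vectors
        # the dot product equals the cosine similarity
        "normalize_embeddings": True,
        "batch_size": 32,
    },
    # The progress bar is a top-level option: the wrapper passes
    # show_progress_bar to encode() itself, so do not repeat it in encode_kwargs
    show_progress=False,
    # Local folder used to cache downloaded models
    cache_folder="./hf_model_cache",
)
```

Key parameters:

- model_name: accepts any sentence-transformers model.
  - Lightweight English: all-MiniLM-L6-v2, all-MiniLM-L12-v2
  - Lightweight Chinese: BAAI/bge-small-zh-v1.5
- model_kwargs["device"]:
  - Windows/Linux with an NVIDIA GPU: cuda
  - Apple Silicon Mac: mps
  - No GPU: cpu
- encode_kwargs["normalize_embeddings"]: all-MiniLM-L6-v2 natively outputs normalized vectors. With normalization on, the dot product of two vectors equals their cosine similarity, which is cheaper to compute.
- batch_size: the number of document chunks encoded per batch; larger batches speed up ingestion, memory permitting.

III. Core API (LangChain standard interface)

1. embed_query: embed a single query (used for retrieval)

```python
vec = embedding.embed_query("What is RecursiveCharacterTextSplitter?")
print(len(vec))  # all-MiniLM-L6-v2 → 384
```

2. embed_documents: embed a batch of document chunks (used for ingestion)

```python
chunks = [
    "all-MiniLM-L6-v2 is a sentence embedding model",
    "RecursiveCharacterTextSplitter is used for document chunking",
]
vecs = embedding.embed_documents(chunks)
print(len(vecs), len(vecs[0]))
```

IV. End-to-end example: all-MiniLM-L6-v2 + RecursiveCharacterTextSplitter + Chroma

```python
# Also requires: pip install langchain-text-splitters langchain-community chromadb
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

# 1. Initialize the embedding model
embed = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": True},
)

# 2. Token-counting function. all-MiniLM-L6-v2 truncates input beyond
#    256 word pieces by default, so chunk sizes are measured in model tokens.
#    (Use embed._client if your release keeps the model in a private attribute.)
def count_tokens(text):
    return len(embed.client.tokenizer.encode(text))

# 3. Splitter: chunk_size stays below the 256-token default limit
splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=40,
    length_function=count_tokens,
    separators=["\n\n", "\n", ". ", "! ", "? "],
)

# Raw document
raw_text = """
all-MiniLM-L6-v2 is a lightweight sentence embedding model with 6 layers, about 22M parameters, output 384 dim vector.
By default, input longer than 256 word pieces is truncated, so chunking is required for long documents.
RecursiveCharacterTextSplitter splits text by paragraph first to keep complete sentences.
Chunk overlap prevents cross-paragraph semantic loss during retrieval.
"""

# Split into chunks
text_chunks = splitter.split_text(raw_text)

# 4. Store in the vector database
#    (Chroma 0.4+ persists automatically, so no db.persist() call is needed)
db = Chroma.from_texts(
    texts=text_chunks,
    embedding=embed,
    persist_directory="./chroma_minilm_db",
)

# 5. Semantic retrieval
query = "Why do we need document chunking for MiniLM?"
res = db.similarity_search(query, k=2)
for doc in res:
    print(doc.page_content, "\n---")
```

V. embed.client: accessing the native SentenceTransformer object

Under the hood, embedding.client is a sentence_transformers.SentenceTransformer instance, so its native methods can be called directly. The attribute name depends on the installed release (some releases keep the model in the private _client attribute), so check your version before relying on it.

```python
model = embed.client

# Native encoding
vec = model.encode("test text", normalize_embeddings=True)

# Get the tokenizer
tokenizer = model.tokenizer
```

VI. Common configuration tweaks

1. GPU acceleration (CUDA)

```python
model_kwargs={"device": "cuda"}
```

2. Apple Silicon (MPS)

```python
model_kwargs={"device": "mps"}
```

3. Switching to a Chinese model (BGE)

```python
embed = HuggingFaceEmbeddings(
    model_name="BAAI/bge-small-zh-v1.5",
    encode_kwargs={"normalize_embeddings": True},
)

# Chinese separators for the splitter
separators=["\n\n", "\n", "。", "！", "？", "，"]
```

VII. Differences from native SentenceTransformer

- SentenceTransformer: the underlying model library; it is only responsible for encoding text into vectors.
- HuggingFaceEmbeddings: the LangChain wrapper layer. It provides a unified interface that works with LangChain vector stores, Retrievers, and Chains, handles batching for you, and fits into a complete LangChain RAG pipeline.

VIII. Common pitfalls

- ImportError: cannot import name 'HuggingFaceEmbeddings': the old import from langchain.embeddings import HuggingFaceEmbeddings no longer works; switch to langchain_huggingface.
- Vector dimension mismatch: all-MiniLM-L6-v2 always outputs 384 dimensions, and a single vector store cannot mix models with different dimensions.
- Slow encoding on CPU: increase batch_size, or switch to cuda/mps.
- Long text exceeding the model's token limit (256 by default for all-MiniLM-L6-v2): pair the model with RecursiveCharacterTextSplitter, count length with the model's own tokenizer, and cap chunk_size accordingly.
- Inaccurate similarity: enable normalize_embeddings=True and set a reasonable chunk_overlap.

IX. Minimal wrapper template (copy straight into your project)

```python
from langchain_huggingface import HuggingFaceEmbeddings

def get_minilm_embedding():
    return HuggingFaceEmbeddings(
        model_name="all-MiniLM-L6-v2",
        model_kwargs={"device": "cpu"},
        encode_kwargs={
            "normalize_embeddings": True,
            "batch_size": 16,
        },
    )
```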
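
The guide relies on normalize_embeddings in several places, so a small numeric check may help show why it matters: once two vectors are scaled to unit length, their plain dot product equals their cosine similarity. The sketch below uses only the standard library. The helper names (l2_normalize, dot, cosine_similarity) and the toy 3-dimensional vectors are assumptions made for illustration; they stand in for real 384-dimensional embeddings and are not part of LangChain or sentence-transformers.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (conceptually what normalization does)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product; rejects vectors of different dimensions."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity computed the long way, from raw vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical toy vectors standing in for a query and a document embedding
query_vec = [3.0, 4.0, 0.0]
doc_vec = [1.0, 2.0, 2.0]

cos = cosine_similarity(query_vec, doc_vec)
dot_normalized = dot(l2_normalize(query_vec), l2_normalize(doc_vec))

# Both routes give the same score once the vectors are normalized
print(math.isclose(cos, dot_normalized))
```

The same dot helper also illustrates the dimension pitfall: comparing a 384-dimensional vector with one produced by a model of a different size fails immediately, which is why a vector store should hold embeddings from a single model.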