Hugging Face x LangChain: A new partner package in LangChain

Published May 14, 2024

We are thrilled to announce the launch of langchain_huggingface, a partner package in LangChain jointly maintained by Hugging Face and LangChain. This new Python package is designed to bring the latest developments from the Hugging Face ecosystem into LangChain and keep the integration up to date.

From the community, for the community

All the Hugging Face-related classes in LangChain were originally contributed by the community, and while we thrived on this, over time some of them became deprecated for lack of an insider's perspective.

By becoming a partner package, we aim to reduce the time it takes to bring new features from the Hugging Face ecosystem to LangChain users.

langchain-huggingface integrates seamlessly with LangChain, providing an efficient and effective way to utilize Hugging Face models within the LangChain ecosystem. This partnership is not just about sharing technology but also about a joint commitment to maintain and continually improve this integration.

Getting Started

Getting started with langchain-huggingface is straightforward. Here’s how you can install and begin using the package:

pip install langchain-huggingface

Now that the package is installed, let's take a tour of what's inside!

The LLMs

HuggingFacePipeline

Among the tools in transformers, the Pipeline is the most versatile of the Hugging Face toolbox. Since LangChain is designed primarily to address RAG and agent use cases, the scope of the pipeline here is reduced to the following text-centric tasks: "text-generation", "text2text-generation", "summarization", and "translation".

Models can be loaded directly with the from_model_id method:

from langchain_huggingface import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="microsoft/Phi-3-mini-4k-instruct",
    task="text-generation",
    pipeline_kwargs={
        "max_new_tokens": 100,
        "top_k": 50,
        "temperature": 0.1,
    },
)
llm.invoke("Hugging Face is")

Alternatively, you can define the pipeline yourself before passing it to the class:

from langchain_huggingface import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,
    # attn_implementation="flash_attention_2", # if you have an Ampere GPU
)
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=100,
    top_k=50,
    temperature=0.1,
)
llm = HuggingFacePipeline(pipeline=pipe)
llm.invoke("Hugging Face is")

When using this class, the model is downloaded to your local cache and runs on your machine's hardware, so you may be limited by the resources available on your computer.
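If you have a GPU, you can point the pipeline at it with the device argument of from_model_id. A minimal sketch, assuming a machine with a single CUDA GPU:

from langchain_huggingface import HuggingFacePipeline

# device=0 targets the first CUDA GPU; leave it unset to stay on CPU
gpu_llm = HuggingFacePipeline.from_model_id(
    model_id="microsoft/Phi-3-mini-4k-instruct",
    task="text-generation",
    device=0,
    pipeline_kwargs={"max_new_tokens": 100},
)
gpu_llm.invoke("Hugging Face is")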

HuggingFaceEndpoint

There are also two ways to use this class: you can specify the model with the repo_id parameter, or pass the URL of a dedicated Inference Endpoint with endpoint_url. The former uses the serverless API, which is particularly beneficial to people with PRO accounts or an Enterprise Hub subscription. Still, regular users already get access to a fair number of requests by connecting with their HF token in the environment where they are executing the code.
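To authenticate, one option is to log in interactively with huggingface_hub, which stores your token locally for subsequent calls:

from huggingface_hub import login

login()  # prompts for your HF token and saves it for later requests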

from langchain_huggingface import HuggingFaceEndpoint

# Serverless API: pass the repo_id of a model on the Hub
llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    task="text-generation",
    max_new_tokens=100,
    do_sample=False,
)
llm.invoke("Hugging Face is")

# Dedicated endpoint: pass the URL of your deployed Inference Endpoint
llm = HuggingFaceEndpoint(
    endpoint_url="<endpoint_url>",
    task="text-generation",
    max_new_tokens=1024,
    do_sample=False,
)
llm.invoke("Hugging Face is")

Under the hood, this class uses the InferenceClient to be able to serve a wide variety of use cases, from the serverless API to deployed TGI instances.
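Since HuggingFaceEndpoint is a regular LangChain LLM, you can also stream tokens through the standard Runnable interface; a minimal sketch reusing the llm defined above:

# Print the completion token by token instead of waiting for the full text
for chunk in llm.stream("Hugging Face is"):
    print(chunk, end="", flush=True)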

ChatHuggingFace

Every model has its own special tokens with which it works best. Without those tokens added to your prompt, your model will greatly underperform.

When going from a list of messages to a completion prompt, most LLM tokenizers provide an attribute called chat_template that handles this formatting.
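To see what a chat template produces, here is a small sketch using transformers directly (assuming you have access to the gated Meta-Llama-3 checkpoint):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
messages = [{"role": "user", "content": "Hugging Face is"}]

# tokenize=False returns the formatted prompt string instead of token ids;
# add_generation_prompt=True appends the tokens that cue the assistant's turn
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))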

To learn more about the chat templates of different models, visit this space I made!

This class is a wrapper around the other LLMs. It takes a list of messages as input and then creates the correct completion prompt using the tokenizer.apply_chat_template method.

from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    endpoint_url="<endpoint_url>",
    task="text-generation",
    max_new_tokens=1024,
    do_sample=False,
)
llm_engine_hf = ChatHuggingFace(llm=llm)
llm_engine_hf.invoke("Hugging Face is")

The code above is equivalent to:

# with mistralai/Mistral-7B-Instruct-v0.2
llm.invoke("<s>[INST] Hugging Face is [/INST]")

# with meta-llama/Meta-Llama-3-8B-Instruct
llm.invoke("""<|begin_of_text|><|start_header_id|>user<|end_header_id|>Hugging Face is<|eot_id|><|start_header_id|>assistant<|end_header_id|>""")

The Embeddings

Hugging Face is filled with very powerful embedding models that you can directly leverage in your pipeline.

First, choose your model. A good resource for choosing an embedding model is the MTEB leaderboard.

HuggingFaceEmbeddings

This class uses sentence-transformers embeddings. It computes the embeddings locally, hence using your computer's resources.

from langchain_huggingface.embeddings import HuggingFaceEmbeddings

model_name = "mixedbread-ai/mxbai-embed-large-v1"
hf_embeddings = HuggingFaceEmbeddings(
    model_name=model_name,
)
texts = ["Hello, world!", "How are you?"]
hf_embeddings.embed_documents(texts)
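To embed a single query rather than a batch of documents, the same object exposes embed_query:

hf_embeddings.embed_query("What is Hugging Face?")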

HuggingFaceEndpointEmbeddings

HuggingFaceEndpointEmbeddings is very similar to what HuggingFaceEndpoint does for the LLM, in the sense that it also uses the InferenceClient under the hood to compute the embeddings. It can be used with models on the Hub and with TEI instances, whether they are deployed locally or online.

from langchain_huggingface.embeddings import HuggingFaceEndpointEmbeddings

hf_embeddings = HuggingFaceEndpointEmbeddings(
    model="mixedbread-ai/mxbai-embed-large-v1",
    task="feature-extraction",
    huggingfacehub_api_token="<HF_TOKEN>",
)
texts = ["Hello, world!", "How are you?"]
hf_embeddings.embed_documents(texts)
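To target a TEI instance you have deployed yourself, you should be able to pass its URL as the model instead of a Hub repo id; a sketch with a placeholder local URL:

hf_embeddings = HuggingFaceEndpointEmbeddings(
    model="http://localhost:8080",  # URL of your running TEI container (placeholder)
    task="feature-extraction",
)
hf_embeddings.embed_documents(["Hello, world!", "How are you?"])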

Conclusion

We are committed to making langchain-huggingface better by the day. We will actively monitor feedback and issues and work to address them as quickly as possible. We will also add new features and functionality, expanding the package to support an even wider range of the community's use cases. We strongly encourage you to try this package and share your opinion, as it will pave the way for the package's future.