LangChain
LangChain is a popular framework for building applications powered by large language models (LLMs). Its web document loaders, such as WebBaseLoader and AsyncHtmlLoader, can be routed through a proxy when collecting data from the web.
Setting Up HypeProxy.io with LangChain
Using WebBaseLoader with a Proxy
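import os
from langchain_community.document_loaders import WebBaseLoader
# WebBaseLoader fetches pages with the requests library, which honors the
# standard HTTP_PROXY / HTTPS_PROXY environment variables.
# Replace username, password and port with your HypeProxy.io credentials.
os.environ['HTTP_PROXY'] = 'http://username:password@fr.hypeproxy.host:port'
os.environ['HTTPS_PROXY'] = 'http://username:password@fr.hypeproxy.host:port'
# Alternatively, pass proxies={"http": ..., "https": ...} to WebBaseLoader.
loader = WebBaseLoader("https://example.com")
documents = loader.load()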
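Real passwords often contain characters such as @ or : that break the username:password@host:port format used above. The following is a minimal sketch, assuming you assemble the proxy URL yourself: build_proxy_url is a hypothetical helper, and the password p@ss:word and port 8080 are placeholder values, not HypeProxy.io defaults. It percent-encodes the credentials (requests decodes them again before sending proxy authentication), sets the same environment variables, and uses urllib.request.getproxies(), the standard-library lookup that requests relies on for environment proxies, to confirm they are picked up.

```python
import os
import urllib.request
from urllib.parse import quote, unquote, urlsplit


def build_proxy_url(username: str, password: str, host: str, port: int) -> str:
    """Hypothetical helper: return an http:// proxy URL with percent-encoded credentials."""
    # safe="" makes quote() also encode ':', '@' and '/', which would otherwise
    # be parsed as URL delimiters.
    return f"http://{quote(username, safe='')}:{quote(password, safe='')}@{host}:{port}"


# Placeholder credentials and port (assumptions, not HypeProxy.io defaults).
proxy_url = build_proxy_url("username", "p@ss:word", "fr.hypeproxy.host", 8080)
print(proxy_url)  # http://username:p%40ss%3Aword@fr.hypeproxy.host:8080

# Set both spellings: requests accepts either, but some tools (e.g. curl)
# only read the lowercase http_proxy.
for name in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
    os.environ[name] = proxy_url

# requests resolves environment proxies through urllib.request.getproxies().
resolved = urllib.request.getproxies()
print(resolved["http"] == proxy_url, resolved["https"] == proxy_url)  # True True

# Parsing the URL recovers the host, the port and the original password.
parts = urlsplit(proxy_url)
print(parts.hostname, parts.port, unquote(parts.password))
# fr.hypeproxy.host 8080 p@ss:word
```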
Using AsyncHtmlLoader with a Proxy
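from langchain_community.document_loaders import AsyncHtmlLoader
# Replace username, password and port with your HypeProxy.io credentials.
proxy_url = "http://username:password@fr.hypeproxy.host:port"
urls = ["https://example.com", "https://example.org"]
# AsyncHtmlLoader expects a proxies dict keyed by URL scheme (there is no proxy= argument).
# It fetches pages with aiohttp, so confirm with a test request that your installed
# langchain_community version actually routes the traffic through the proxy.
loader = AsyncHtmlLoader(
    urls,
    proxies={"http": proxy_url, "https": proxy_url},
)
documents = loader.load()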
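Because AsyncHtmlLoader fetches its URLs concurrently, every in-flight request leaves through the same proxy endpoint, and an unbounded burst can still trip a site's rate limits. The sketch below illustrates bounded concurrency with asyncio.Semaphore. It is not AsyncHtmlLoader's implementation: fetch_via_proxy and fetch_all are hypothetical names, the fetch is simulated so the example runs offline, and the proxy URL uses an assumed port of 8080.

```python
import asyncio


async def fetch_via_proxy(url: str, proxy_url: str) -> str:
    """Hypothetical stand-in for an HTTP GET sent through proxy_url.

    It only sleeps, so the sketch runs offline; swap in a real client call.
    """
    await asyncio.sleep(0.01)  # simulated network latency
    return f"<html>{url}</html>"


async def fetch_all(urls: list[str], proxy_url: str, max_concurrent: int = 2) -> tuple[list[str], int]:
    """Fetch all URLs with at most max_concurrent requests in flight.

    Returns the pages (in input order) and the peak number of concurrent requests.
    """
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def bounded_fetch(url: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:  # waits while max_concurrent requests are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fetch_via_proxy(url, proxy_url)
            finally:
                in_flight -= 1

    # gather() returns results in the same order as the input URLs.
    pages = await asyncio.gather(*(bounded_fetch(u) for u in urls))
    return list(pages), peak


urls = [f"https://example.com/page/{i}" for i in range(6)]
pages, peak = asyncio.run(fetch_all(urls, "http://username:password@fr.hypeproxy.host:8080"))
print(pages[0])  # <html>https://example.com/page/0</html>
print(peak)      # 2
```

Lowering max_concurrent trades throughput for a gentler request pattern through a single proxy IP.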
Tips
- Use proxies when loading data from websites that have rate limiting or geo-restrictions.
- Mobile proxies from HypeProxy.io work best for scraping social media and dynamic content.
- Combine LangChain's text splitters with proxy-loaded content for RAG pipelines.
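In a RAG pipeline, text splitters typically cut each loaded page into overlapping chunks before embedding. The standard-library sketch below shows only the chunk_size / chunk_overlap idea; split_with_overlap is a hypothetical helper and, unlike LangChain's RecursiveCharacterTextSplitter, it ignores the paragraph, line and word boundaries that splitter tries to break on first.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Hypothetical helper: fixed-size character chunks, each sharing
    chunk_overlap characters with the previous chunk."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

With LangChain itself, the equivalent step on the documents returned by either loader is RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(documents), imported from langchain_text_splitters; the sizes here are illustrative.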
