Scrapy
Scrapy is one of the most widely used Python frameworks for web scraping. It provides a complete toolkit for extracting data from websites, processing it, and storing it in your preferred format. HypeProxy.io proxies plug into Scrapy through its standard proxy support, either in project settings or in a downloader middleware.
Setting Up HypeProxy.io with Scrapy
Basic Proxy Configuration
Add the following to your settings.py:
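# settings.py
DOWNLOADER_MIDDLEWARES = {
    # Built-in proxy middleware; Scrapy enables it by default, listed here to pin its order
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}

# HypeProxy.io proxy
# Note: Scrapy has no built-in HTTP_PROXY setting. HttpProxyMiddleware reads the proxy from
# request.meta['proxy'] or from the http_proxy / https_proxy environment variables, so this
# value only takes effect once your own code (such as a custom middleware) applies it.
HTTP_PROXY = 'http://username:password@fr.hypeproxy.host:port'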
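Credentials that contain characters such as `@` or `:` can break the proxy URL unless they are percent-encoded. The helper below is a minimal sketch of building the URL safely; `build_proxy_url` is a hypothetical name, and the password and port `8080` are placeholders rather than real HypeProxy.io values.

```python
from urllib.parse import quote


def build_proxy_url(username, password, host, port, scheme="http"):
    """Return scheme://user:pass@host:port with the credentials percent-encoded."""
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"{scheme}://{user}:{secret}@{host}:{port}"


# Placeholder credentials and port, for illustration only.
print(build_proxy_url("username", "p@ss:word", "fr.hypeproxy.host", 8080))
# prints: http://username:p%40ss%3Aword@fr.hypeproxy.host:8080
```

The resulting string can be used wherever the proxy URL appears in this guide.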
Using a Custom Middleware
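# middlewares.py
class HypeProxyMiddleware:
    def process_request(self, request, spider):
        # The built-in HttpProxyMiddleware reads this meta key
        request.meta['proxy'] = 'http://username:password@fr.hypeproxy.host:port'

# Enable it in settings.py with an order lower than HttpProxyMiddleware's
# ('myproject' is a placeholder for your project package):
# DOWNLOADER_MIDDLEWARES = {'myproject.middlewares.HypeProxyMiddleware': 100}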
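To keep credentials out of the middleware source, the proxy URL can be read from project settings instead. The following is a sketch under stated assumptions: `SettingsProxyMiddleware`, the `HYPEPROXY_URL` setting, and the `myproject` package are hypothetical names chosen for illustration, not part of Scrapy or HypeProxy.io.

```python
# middlewares.py
class SettingsProxyMiddleware:
    """Attach the proxy URL from project settings to every outgoing request."""

    def __init__(self, proxy_url):
        self.proxy_url = proxy_url

    @classmethod
    def from_crawler(cls, crawler):
        # HYPEPROXY_URL is a hypothetical custom setting, not a Scrapy built-in.
        return cls(crawler.settings.get("HYPEPROXY_URL"))

    def process_request(self, request, spider=None):
        # Leave the request alone if a proxy was already chosen for it.
        if self.proxy_url and "proxy" not in request.meta:
            request.meta["proxy"] = self.proxy_url
        return None  # let Scrapy continue processing the request


# settings.py (registration sketch; "myproject" is a placeholder package name)
HYPEPROXY_URL = "http://username:password@fr.hypeproxy.host:port"
DOWNLOADER_MIDDLEWARES = {
    # Runs before the built-in HttpProxyMiddleware, whose default order is 750.
    "myproject.middlewares.SettingsProxyMiddleware": 350,
}
```

Because the middleware skips requests that already carry a `proxy` meta key, individual spiders can still override the proxy per request.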
With IP Rotation via API
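import requests

API_TOKEN = 'YOUR_API_TOKEN'
PROXY_ID = 'YOUR_PROXY_ID'

# Rotate IP before scraping
response = requests.get(
    f'https://api.hypeproxy.io/Utils/DirectRenewIp/{PROXY_ID}',
    headers={'Authorization': f'Bearer {API_TOKEN}'},
    timeout=30,  # requests applies no timeout unless one is given
)
response.raise_for_status()  # fail loudly if the rotation request was rejected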
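The rotation call can be wrapped in a small helper and triggered between batches of URLs. This is a rough sketch, not an official client: `rotate_ip` and `scrape_in_batches` are hypothetical helpers, the `fetch` callable stands in for whatever runs your crawl, and only the endpoint and header shown above are taken from this guide.

```python
API_BASE = "https://api.hypeproxy.io"  # base URL taken from the snippet above


def rotate_ip(proxy_id, api_token, http_get=None, timeout=30):
    """Call the DirectRenewIp endpoint and return the HTTP status code."""
    if http_get is None:
        import requests  # imported lazily so the helper can be exercised with a stub
        http_get = requests.get
    response = http_get(
        f"{API_BASE}/Utils/DirectRenewIp/{proxy_id}",
        headers={"Authorization": f"Bearer {api_token}"},
        timeout=timeout,
    )
    response.raise_for_status()  # surface rejected rotation requests
    return response.status_code


def scrape_in_batches(urls, batch_size, fetch, rotate):
    """Call rotate() before each batch, then fetch every URL in that batch."""
    results = []
    for start in range(0, len(urls), batch_size):
        rotate()
        for url in urls[start:start + batch_size]:
            results.append(fetch(url))
    return results
```

Since `requests` is a blocking library, a helper like this is better called between crawl runs than from inside a Scrapy callback, where it would stall the event loop.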
Tips
- Use Scrapy's built-in retry middleware to handle proxy timeouts gracefully.
- For large-scale scraping, rotate IPs using the HypeProxy.io API between batches of requests.
- Set CONCURRENT_REQUESTS to a reasonable number to avoid overwhelming the proxy.
- Add DOWNLOAD_DELAY to space out requests and appear more natural.
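The tips above map onto a handful of standard Scrapy settings. The values below are illustrative starting points only, not HypeProxy.io recommendations; tune them for your plan and target site.

```python
# settings.py (illustrative values)
RETRY_ENABLED = True      # the retry middleware is on by default; shown for clarity
RETRY_TIMES = 3           # retries per request, in addition to the first attempt
DOWNLOAD_TIMEOUT = 30     # seconds before a slow proxy response counts as a failure
CONCURRENT_REQUESTS = 8   # Scrapy's default is 16
DOWNLOAD_DELAY = 1.0      # seconds to wait between requests to the same site
```

Lower concurrency combined with a delay trades crawl speed for fewer timeouts on a single proxy.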
