Scrapy

A powerful Python web scraping framework for extracting data from websites at scale.

Scrapy is one of the most widely used Python frameworks for web scraping. It provides a complete toolkit for extracting data from websites, processing it, and storing it in your preferred format. HypeProxy.io proxies work with Scrapy's built-in proxy support, so setup takes only a few lines of configuration.

Setting Up HypeProxy.io with Scrapy

Basic Proxy Configuration

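Scrapy's built-in HttpProxyMiddleware is enabled by default. It reads the proxy from the http_proxy and https_proxy environment variables or from request.meta['proxy']; Scrapy has no HTTP_PROXY setting. To send every request through one proxy, set the environment variables, for example at the top of your settings.py:

# settings.py
import os

# Optional: HttpProxyMiddleware is already enabled by default (order 750).
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}

# HypeProxy.io proxy (replace username, password and port with your own values)
PROXY_URL = 'http://username:password@fr.hypeproxy.host:port'
os.environ.setdefault('http_proxy', PROXY_URL)
os.environ.setdefault('https_proxy', PROXY_URL)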

Using a Custom Middleware

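To set the proxy per request instead, assign request.meta['proxy'] in a downloader middleware. Register the class in DOWNLOADER_MIDDLEWARES with an order below 750 so that the built-in HttpProxyMiddleware can turn the credentials in the URL into a Proxy-Authorization header.

# middlewares.py
class HypeProxyMiddleware:
    """Route every request through the HypeProxy.io proxy."""
    PROXY_URL = 'http://username:password@fr.hypeproxy.host:port'

    def process_request(self, request, spider):
        # Returning None (implicitly) lets Scrapy continue with the request.
        request.meta['proxy'] = self.PROXY_URL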
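
A custom downloader middleware only runs once it is listed in DOWNLOADER_MIDDLEWARES. The fragment below is a minimal sketch of that registration. The package name myproject is a placeholder for your own project, and the order 350 is an illustrative choice: any value below 750, the default order of Scrapy's HttpProxyMiddleware, should let the built-in middleware process the proxy URL afterwards.

```python
# settings.py
# 'myproject' is a hypothetical package name; use your own project's module path.
DOWNLOADER_MIDDLEWARES = {
    # Lower numbers run earlier on outgoing requests, so 350 places this
    # middleware ahead of the built-in HttpProxyMiddleware (default order 750).
    'myproject.middlewares.HypeProxyMiddleware': 350,
}
```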

With IP Rotation via API

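Request a new IP from the HypeProxy.io API before starting the crawl:

import requests

API_TOKEN = 'YOUR_API_TOKEN'
PROXY_ID = 'YOUR_PROXY_ID'

# Rotate IP before scraping
response = requests.get(
    f'https://api.hypeproxy.io/Utils/DirectRenewIp/{PROXY_ID}',
    headers={'Authorization': f'Bearer {API_TOKEN}'},
    timeout=30,  # avoid hanging indefinitely if the API does not respond
)
response.raise_for_status()  # fail loudly if the rotation request was rejected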
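
To rotate between batches rather than only once, the call can be wrapped in a small counter. The following is a sketch under stated assumptions: BatchRotator, tick, and renew_ip are hypothetical names introduced here, not part of Scrapy or the HypeProxy.io API, and the standard library's urllib is used in place of requests only to keep the example dependency-free. The endpoint is the one shown above. The call blocks, so it fits best between crawl runs or in low-concurrency setups.

```python
import urllib.request

API_TOKEN = 'YOUR_API_TOKEN'
PROXY_ID = 'YOUR_PROXY_ID'


def renew_ip(timeout=30):
    """Ask the HypeProxy.io API for a new IP (blocking call)."""
    req = urllib.request.Request(
        f'https://api.hypeproxy.io/Utils/DirectRenewIp/{PROXY_ID}',
        headers={'Authorization': f'Bearer {API_TOKEN}'},
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


class BatchRotator:
    """Hypothetical helper: calls `rotate` once every `batch_size` ticks."""

    def __init__(self, batch_size, rotate=renew_ip):
        if batch_size < 1:
            raise ValueError('batch_size must be at least 1')
        self.batch_size = batch_size
        self.rotate = rotate
        self.count = 0

    def tick(self):
        """Record one request; rotate and return True when a batch completes."""
        self.count += 1
        if self.count >= self.batch_size:
            self.count = 0
            self.rotate()
            return True
        return False
```

Calling rotator.tick() once per scheduled request, for example from a spider callback, triggers a rotation after every batch_size requests. Passing a different rotate callable makes the helper easy to exercise without network access.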

Tips

  • Use Scrapy's built-in retry middleware to handle proxy timeouts gracefully.
  • For large-scale scraping, rotate IPs using the HypeProxy.io API between batches of requests.
  • Set CONCURRENT_REQUESTS to a reasonable number to avoid overwhelming the proxy.
  • Add DOWNLOAD_DELAY to space out requests and appear more natural.
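
The tips above map onto standard Scrapy settings. The values below are illustrative starting points, not recommendations from HypeProxy.io; tune them to your proxy plan and target site.

```python
# settings.py
# Retry failed requests, including proxy timeouts (RetryMiddleware is on by default).
RETRY_ENABLED = True
RETRY_TIMES = 3          # retries per request, in addition to the first attempt
DOWNLOAD_TIMEOUT = 30    # seconds before a request counts as timed out

# Keep concurrency modest so a single proxy is not overwhelmed.
CONCURRENT_REQUESTS = 8

# Space out requests; Scrapy randomizes the delay between 0.5x and 1.5x of this value.
DOWNLOAD_DELAY = 1.0
RANDOMIZE_DOWNLOAD_DELAY = True
```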
