Indeed Scraper
Services
Web Scraping
Protection Bypass
Technologies
Python, Selenium
Cloudflare Bypass
Location & Year
Remote
2023
Project details
01
Challenge
Indeed uses a multi-layered Cloudflare protection system that blocks automated requests, so the scraper had to bypass bot detection, CAPTCHA challenges, and browser fingerprint analysis.
02
Solution
Developed a custom Selenium setup based on undetected-chromedriver, with User-Agent rotation, human-behavior emulation (random pauses, scrolling, mouse movement), and a proxy pool for load distribution.
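The rotation pieces described above can be sketched in plain Python, independently of the browser driver. This is a minimal illustration, not the production code: the `ProxyPool` class, the proxy URLs, and the shortened User-Agent strings are all assumptions for the example.

```python
import random
from itertools import cycle

class ProxyPool:
    """Round-robin pool that distributes requests across proxies."""
    def __init__(self, proxies):
        self._cycle = cycle(proxies)

    def next_proxy(self):
        # Each call hands out the next proxy, wrapping around at the end
        return next(self._cycle)

# Illustrative User-Agent list; a real pool would use full UA strings
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Chrome/120.0.0.0',
]

def random_user_agent():
    """Pick a User-Agent at random for each new browser session."""
    return random.choice(USER_AGENTS)

pool = ProxyPool(['http://proxy1:8080', 'http://proxy2:8080'])
```

In a driver-based setup, the chosen User-Agent and proxy would be passed into `ChromeOptions` before each session is created.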
03
Result
A stable scraper running at 500+ jobs/hour with a 95% Cloudflare bypass success rate. The client received an automated job-market monitoring tool with CSV/Excel export.
Code Example
# Indeed Parser - Cloudflare Bypass
from undetected_chromedriver import Chrome, ChromeOptions
from bs4 import BeautifulSoup
import random, time, csv

class IndeedScraper:
    def __init__(self):
        self.options = ChromeOptions()
        self.options.add_argument('--disable-blink-features=AutomationControlled')
        self.user_agents = [
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0',
            'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Chrome/120.0.0.0'
        ]

    def bypass_cloudflare(self, driver, url):
        """Cloudflare protection bypass with human behavior emulation"""
        driver.get(url)
        time.sleep(random.uniform(3, 7))
        # Scroll emulation
        for _ in range(3):
            driver.execute_script(f"window.scrollBy(0, {random.randint(100, 300)})")
            time.sleep(random.uniform(0.5, 1.5))

    def extract_company_data(self, soup):
        """Extract company data"""
        return {
            'company': soup.select_one('.companyName').text.strip(),
            'location': soup.select_one('.companyLocation').text.strip(),
            'website': self.extract_website(soup),
            'email': self.extract_email(soup),
            'phone': self.extract_phone(soup)
        }

    def save_to_csv(self, data, filename='indeed_results.csv'):
        """Export results to CSV"""
        with open(filename, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=data[0].keys())
            writer.writeheader()
            writer.writerows(data)
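The `extract_company_data` method above will raise an `AttributeError` whenever a selector finds nothing. A self-contained sketch of the same parsing step against static HTML shows a None-safe variant; the sample markup and the `parse_company` helper are illustrative, not Indeed's actual page structure.

```python
from bs4 import BeautifulSoup

# Illustrative snippet mimicking a job card's company fields
SAMPLE_HTML = """
<div class="job_seen_beacon">
  <span class="companyName">Acme Corp</span>
  <div class="companyLocation">Austin, TX</div>
</div>
"""

def parse_company(html):
    """Extract company fields, returning None for missing selectors
    instead of raising AttributeError."""
    soup = BeautifulSoup(html, 'html.parser')

    def text_of(selector):
        node = soup.select_one(selector)
        return node.get_text(strip=True) if node else None

    return {
        'company': text_of('.companyName'),
        'location': text_of('.companyLocation'),
    }

result = parse_company(SAMPLE_HTML)
```

Guarding each selector this way keeps one malformed job card from crashing a long scraping run.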