Indeed Scraper

Services
  • Web Scraping
  • Protection Bypass

Technologies
  • Python, Selenium
  • Cloudflare Bypass

Location & Year
  • Remote, 2023

Project details

    01

    Challenge

    Indeed uses a multi-layered Cloudflare protection system that blocks automated requests. The task was to build a solution that bypasses bot detection, CAPTCHA challenges, and browser-fingerprint analysis.

    02

    Solution

    Developed a custom Selenium driver based on undetected-chromedriver, with User-Agent rotation, human-behavior emulation (random pauses, scrolling, mouse movement), and a proxy pool for load distribution.
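The proxy pool mentioned above does not appear in the code example below; a minimal round-robin sketch of how it could work (the `RotationPool` name and proxy addresses are illustrative, not from the project):

```python
import itertools
import random

class RotationPool:
    """Round-robin proxies plus random User-Agents for load distribution."""

    def __init__(self, proxies, user_agents):
        self._proxies = itertools.cycle(proxies)  # endless round-robin iterator
        self.user_agents = list(user_agents)

    def next_proxy(self):
        """Each call returns the next proxy in the pool, wrapping around."""
        return next(self._proxies)

    def random_user_agent(self):
        """Pick a User-Agent at random for the next browser session."""
        return random.choice(self.user_agents)


# Hypothetical proxy addresses, for illustration only
pool = RotationPool(
    proxies=['http://proxy-1:8080', 'http://proxy-2:8080'],
    user_agents=[
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Chrome/120.0.0.0',
    ],
)
```

Each new driver session would then take `pool.next_proxy()` for its `--proxy-server` argument and `pool.random_user_agent()` for its User-Agent header.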

    03

    Result

    A stable scraper that processes 500+ jobs/hour with a 95% Cloudflare bypass success rate. The client received an automated job-market monitoring tool with CSV/Excel export.

    Code Example
    # Indeed Parser - Cloudflare Bypass
    from undetected_chromedriver import Chrome, ChromeOptions
    from bs4 import BeautifulSoup
    import random, time, csv

    class IndeedScraper:
        def __init__(self):
            self.options = ChromeOptions()
            self.options.add_argument('--disable-blink-features=AutomationControlled')
            self.user_agents = [
                'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0',
                'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Chrome/120.0.0.0'
            ]

        def start_driver(self):
            """Launch undetected Chrome with a randomly chosen User-Agent"""
            self.options.add_argument(f'user-agent={random.choice(self.user_agents)}')
            return Chrome(options=self.options)

        def bypass_cloudflare(self, driver, url):
            """Cloudflare protection bypass with human behavior emulation"""
            driver.get(url)
            time.sleep(random.uniform(3, 7))  # wait out the challenge page

            # Scroll emulation
            for _ in range(3):
                driver.execute_script(f"window.scrollBy(0, {random.randint(100, 300)})")
                time.sleep(random.uniform(0.5, 1.5))

        @staticmethod
        def _text(soup, selector):
            """Stripped text for a selector, or '' when the node is missing"""
            node = soup.select_one(selector)
            return node.get_text(strip=True) if node else ''

        def extract_company_data(self, soup):
            """Extract company data (website/email/phone helpers omitted here)"""
            return {
                'company': self._text(soup, '.companyName'),
                'location': self._text(soup, '.companyLocation'),
                'website': self.extract_website(soup),
                'email': self.extract_email(soup),
                'phone': self.extract_phone(soup)
            }

        def save_to_csv(self, data, filename='indeed_results.csv'):
            """Export results to CSV"""
            if not data:  # avoid IndexError on an empty run
                return
            with open(filename, 'w', newline='', encoding='utf-8') as f:
                writer = csv.DictWriter(f, fieldnames=data[0].keys())
                writer.writeheader()
                writer.writerows(data)
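The `extract_email` and `extract_phone` helpers are referenced in the class above but not shown; a minimal regex-based sketch operating on plain page text (the patterns are illustrative assumptions, not the project's actual implementation):

```python
import re

# Loose illustrative patterns; a production scraper would need stricter ones
EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+\.[\w.-]+')
PHONE_RE = re.compile(r'\+?\d[\d\s().-]{7,}\d')

def extract_email(text):
    """First e-mail address found in the text, or '' if none."""
    match = EMAIL_RE.search(text)
    return match.group(0) if match else ''

def extract_phone(text):
    """First phone-like digit run found in the text, or '' if none."""
    match = PHONE_RE.search(text)
    return match.group(0) if match else ''
```

In the class above, the `soup` argument would first be flattened with `soup.get_text()` before applying these patterns.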