Selenium headless: How to bypass Cloudflare detection using Selenium

This article discusses how to bypass Cloudflare detection when running headless Chrome through Selenium in Python.

Solution 1

Using Google Chrome v96.0 (the latest version at the time of writing), if you retrieve the user agent:

  • For the google-chrome browser the following user-agent is in use:

    Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36
    
  • Whereas for the google-chrome-headless browser the following user-agent is in use:

    Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/96.0.4664.110 Safari/537.36
    

In the majority of cases, the additional Headless string in the user agent is detected as a sign of a bot, and Cloudflare blocks access to the website.
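The difference between the two strings can be checked, and rewritten, with plain string handling. The following is a minimal sketch and not part of the original answer: the helper names `looks_headless` and `as_regular_chrome` are hypothetical, and changing the user agent alone may not be enough to pass Cloudflare's checks.

```python
# Token that headless Chrome puts in its user agent (see the strings above).
HEADLESS_TOKEN = "HeadlessChrome"


def looks_headless(user_agent: str) -> bool:
    """Return True if the user-agent string advertises headless Chrome."""
    return HEADLESS_TOKEN in user_agent


def as_regular_chrome(user_agent: str) -> str:
    """Replace the HeadlessChrome token with the regular Chrome token."""
    return user_agent.replace(HEADLESS_TOKEN, "Chrome")


headless_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/96.0.4664.110 Safari/537.36"
)
regular_ua = as_regular_chrome(headless_ua)

print(looks_headless(headless_ua))  # True
print(looks_headless(regular_ua))   # False
print(regular_ua)
```

One way to apply the result, assuming you build the options yourself, is to pass it to Chrome through the `--user-agent` switch, for example `options.add_argument(f"--user-agent={regular_ua}")`, before the driver is created.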


Solution

There are different approaches to evade Cloudflare detection even when using Chrome in headless mode. Some of the efficient approaches are as follows:

  • An efficient solution would be to use undetected-chromedriver to initialize the Chrome Browsing Context. undetected-chromedriver is an optimized Selenium ChromeDriver patch which is designed not to trigger anti-bot services like Distill Network / Imperva / DataDome / Botprotect.io. It automatically downloads the driver binary and patches it.

    • Code Block:

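      import undetected_chromedriver as uc

      # Use the ChromeOptions class shipped with undetected-chromedriver.
      options = uc.ChromeOptions()
      # Run Chrome without a visible window. The options.headless setter
      # was removed from recent Selenium releases, so pass the switch directly.
      options.add_argument("--headless")
      options.add_argument("start-maximized")
      # These options hide common automation markers. Some
      # undetected-chromedriver versions reject them; remove them if Chrome
      # fails to start with an "unrecognized chrome option" error.
      options.add_experimental_option("excludeSwitches", ["enable-automation"])
      options.add_experimental_option('useAutomationExtension', False)
      # uc.Chrome() downloads and patches a matching driver binary.
      driver = uc.Chrome(options=options)
      driver.get('https://bet365.com')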
      

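You can find a couple of relevant detailed discussions in:

    • Selenium app redirect to Cloudflare page when hosted on Heroku
    • Is there any possible ways to bypass cloudflare security checks?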

  • The most efficient solution would be to use Selenium Stealth to initialize the Chrome Browsing Context. selenium-stealth is a Python package that tries to make Selenium with Python harder to detect.

    • Code Block:

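      from selenium import webdriver
      from selenium.webdriver.chrome.service import Service
      from selenium_stealth import stealth

      options = webdriver.ChromeOptions()
      options.add_argument("start-maximized")
      options.add_argument("--headless")
      # Hide common automation markers.
      options.add_experimental_option("excludeSwitches", ["enable-automation"])
      options.add_experimental_option('useAutomationExtension', False)
      # Recent Selenium releases no longer accept executable_path;
      # the driver location is passed through a Service object instead.
      service = Service(r"C:\path\to\chromedriver.exe")
      driver = webdriver.Chrome(service=service, options=options)

      # Override the browser properties that commonly give away automation.
      stealth(driver,
              languages=["en-US", "en"],
              vendor="Google Inc.",
              platform="Win32",
              webgl_vendor="Intel Inc.",
              renderer="Intel Iris OpenGL Engine",
              fix_hairline=True,
              )

      # Test page that reports which bot checks the browser passes.
      driver.get("https://bot.sannysoft.com/")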
      

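You can find a couple of relevant detailed discussions in:

    • Can a website detect when you are using Selenium with chromedriver?
    • How to automate login to a site which is detecting my attempts to login using selenium-stealth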

Original author of this content: undetected Selenium

Solution 2

Cloudflare's IUAM (I'm Under Attack Mode) protection is used primarily to mitigate DDoS attacks and, as a consequence, it also protects sites from automated bot exploitation. No matter what you use on the client side, the Cloudflare server is fingerprinting you.
After that, it sends the client the cf_clearance cookie, which allows you to connect for the next 15 minutes.
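To see whether a session currently holds that cookie, the cookie list can be inspected. The following is a minimal sketch and not part of the original answer: it assumes cookies shaped like the dictionaries returned by Selenium's `driver.get_cookies()` (with `name`, `value`, and an optional `expiry` in epoch seconds), the helper names are hypothetical, and the sample values are made up.

```python
import time
from typing import Optional


def find_cf_clearance(cookies: list) -> Optional[dict]:
    """Return the cf_clearance cookie from a list of cookie dicts, if any."""
    for cookie in cookies:
        if cookie.get("name") == "cf_clearance":
            return cookie
    return None


def is_unexpired(cookie: dict, now: Optional[float] = None) -> bool:
    """True if the cookie has no expiry or its expiry lies in the future."""
    now = time.time() if now is None else now
    expiry = cookie.get("expiry")
    return expiry is None or expiry > now


# Made-up data in the shape of driver.get_cookies() output.
sample_cookies = [
    {"name": "session", "value": "abc"},
    {"name": "cf_clearance", "value": "example-token", "expiry": 1_700_000_900},
]

clearance = find_cf_clearance(sample_cookies)
print(clearance is not None)                       # True
print(is_unexpired(clearance, now=1_700_000_000))  # True (900 s, i.e. 15 min, left)
print(is_unexpired(clearance, now=1_700_001_000))  # False
```

With a live browser, `sample_cookies` would be replaced by `driver.get_cookies()` after the page has loaded.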

Original author of this content: Franz Kurt

Conclusion

That is all for this tutorial. We hope it helped you. Thank you.

ittutorial team

I am an Information Technology engineer. I have completed my MCA and have more than four years of experience. I am a web developer with knowledge of multiple back-end platforms such as PHP, Node.js, and Python, and front-end JavaScript frameworks such as Angular, React, and Vue.