More documentation for how to use with requests/lxml #97

lukeprofits · 2023-07-31T07:17:09Z

Selenium is awesome, but I am trying to use this with requests and lxml. It seems like it is solving things properly, but I am having trouble submitting the solution. Could you add some example usage to the readme?

This is what I am doing right now using requests/lxml:

import random
import requests
from lxml import html
from fake_useragent import UserAgent
import csv
import time
import os
from amazoncaptcha import AmazonCaptcha


amazon_captcha_xpath = '//h4[contains(text(), "Enter the characters you see below")]'
captcha_image_xpath = '//div[@class="a-row a-text-center"]/img/@src'


def get_link(url, session=None, user_agent=None, proxy=None):
    """
    Fetches the HTML content from the provided URL.
    Returns a parsed lxml HTML tree that can be used with XPath.
    """
    ua = UserAgent()
    headers = {'User-Agent': ua.google if not user_agent else user_agent}
    proxies = {'http': proxy, 'https': proxy} if proxy else {}

    if session is None:
        session = requests.Session()

    response = session.get(url, headers=headers, proxies=proxies)
    tree = html.fromstring(response.content)

    return tree, session


# code that does stuff assuming there is no captcha. Leaving it out because it's long and probably not helpful.

if tree.xpath(amazon_captcha_xpath):
    bot_check = True
    print(html.tostring(tree).decode())
    print('[ Captcha Detected! ]')

    captcha_image_link = tree.xpath(captcha_image_xpath)[0]
    print(captcha_image_link)

    solution = AmazonCaptcha.fromlink(captcha_image_link).solve()
    print(f'Solution is: {solution}')

    print('Pausing to seem human...')
    time.sleep(random.randrange(3, 15))

 
    print('Submitting solution')

    # THIS IS THE PART TO SUBMIT IT THAT DOES NOT SEEM TO WORK
    
    amzn = tree.xpath('//input[@name="amzn"]/@value')[0]
    amzn_r = tree.xpath('//input[@name="amzn-r"]/@value')[0]

    data = {
        'amzn': amzn,
        'amzn-r': amzn_r,
        'field-keywords': solution
    }

    response = session.post('https://www.amazon.com/errors/validateCaptcha', data=data)

    # check response
    print(response.status_code)   # always comes back as 503
    #print(response.text)
    #input('PAUSED')

3ldar · 2023-08-01T07:40:53Z

It is not in Python, but I will share my Node.js implementation of how to solve the Amazon captcha:

const amzn = $("form input[type=hidden]").val();
let amazonPass: string;
if (options.baseUrl.includes('?')) {
    const [base, query] = options.baseUrl.split('?');

    amazonPass = `${base}/errors/validateCaptcha?amzn=${amzn}&amzn-r=/&field-keywords=${captcha}&${query}`;
} else {
    amazonPass = `${options.baseUrl}/errors/validateCaptcha?amzn=${amzn}&amzn-r=/&field-keywords=${captcha}`;
}

const response = await gotScraping({
    url: amazonPass,
    cookieJar: options.cookieJar,
    followRedirect: true,
    headers: {
        "referer": options.baseUrl
    },
    // @ts-ignore
    proxyUrl: options.proxyUrl,
    sessionToken: options.sessionToken,
    throwHttpErrors: false,
});

gotScraping is a requests-like library. The key point is that this is a GET request, not a POST: the form values go in the query string, and the request needs a Referer header and a follow-up URL to redirect to once the captcha is solved.
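For anyone on requests, a rough Python translation of the description above might look like the sketch below. It is untested against Amazon and nothing in this thread confirms that it clears the 503. The function name `submit_captcha_solution` and the `page_url` parameter are made up for illustration, and resolving the endpoint against the site root with `urljoin` is an assumption. `session` is assumed to be the same `requests.Session` that fetched the captcha page, so its cookies are reused.

```python
from urllib.parse import urljoin


def submit_captcha_solution(session, page_url, amzn, amzn_r, solution):
    """Send the solved captcha as a GET request (hypothetical helper).

    session  -- assumed to be the requests.Session that loaded the captcha page
    page_url -- the URL that returned the captcha; sent as the Referer
    amzn, amzn_r -- values of the hidden form inputs on the captcha page
    solution -- the text returned by AmazonCaptcha(...).solve()
    """
    # Endpoint path as it appears in the snippets in this thread,
    # resolved against the site root of page_url (an assumption).
    validate_url = urljoin(page_url, '/errors/validateCaptcha')

    # The form fields go in the query string instead of a POST body.
    params = {
        'amzn': amzn,
        'amzn-r': amzn_r,
        'field-keywords': solution,
    }
    headers = {'Referer': page_url}

    # requests follows redirects on GET by default; it is spelled out here
    # because following the redirect is part of the described flow.
    return session.get(validate_url, params=params, headers=headers,
                       allow_redirects=True)
```

With the variables from the original post, the call would be something like `response = submit_captcha_solution(session, url, amzn, amzn_r, solution)`, where `url` is the page that showed the captcha.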

stale · 2023-09-16T23:09:40Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

lukeprofits added the enhancement New feature or request label Jul 31, 2023

lukeprofits assigned a-maliarov Jul 31, 2023

stale bot added the wontfix This will not be worked on label Sep 16, 2023
