More documentation for how to use with requests/lxml #97

Open
lukeprofits opened this issue Jul 31, 2023 · 2 comments
Labels: enhancement (New feature or request), wontfix (This will not be worked on)

Comments

lukeprofits commented Jul 31, 2023

Selenium is awesome, but I am trying to use this with requests and lxml instead. The captchas seem to be solved correctly, but I am having trouble submitting the solution. Could you add some example usage to the README?

This is what I am doing right now using requests/lxml:

```python
import random
import requests
from lxml import html
from fake_useragent import UserAgent
import csv
import time
import os
from amazoncaptcha import AmazonCaptcha


amazon_captcha_xpath = '//h4[contains(text(), "Enter the characters you see below")]'
captcha_image_xpath = '//div[@class="a-row a-text-center"]/img/@src'


def get_link(url, session=None, user_agent=None, proxy=None):
    """
    Fetches the HTML content from the provided URL.
    Returns a parsed lxml HTML tree that can be used with XPath, plus the session used.
    """
    ua = UserAgent()
    headers = {'User-Agent': ua.google if not user_agent else user_agent}
    proxies = {'http': proxy, 'https': proxy} if proxy else {}

    if session is None:
        session = requests.Session()

    response = session.get(url, headers=headers, proxies=proxies)
    tree = html.fromstring(response.content)

    return tree, session


# Code that does stuff assuming there is no captcha. Leaving it out because it's long and probably not helpful.
# (`tree` and `session` below come from a get_link() call in that code.)

if tree.xpath(amazon_captcha_xpath):
    bot_check = True
    print(html.tostring(tree).decode())
    print('[ Captcha Detected! ]')

    captcha_image_link = tree.xpath(captcha_image_xpath)[0]
    print(captcha_image_link)

    solution = AmazonCaptcha.fromlink(captcha_image_link).solve()
    print(f'Solution is: {solution}')

    print('Pausing to seem human...')
    time.sleep(random.randrange(3, 15))

    print('Submitting solution')

    # THIS IS THE PART TO SUBMIT IT THAT DOES NOT SEEM TO WORK

    amzn = tree.xpath('//input[@name="amzn"]/@value')[0]
    amzn_r = tree.xpath('//input[@name="amzn-r"]/@value')[0]

    data = {
        'amzn': amzn,
        'amzn-r': amzn_r,
        'field-keywords': solution
    }

    # Note: the headers passed to session.get() in get_link() are per-request and are
    # not stored on the session, so this POST goes out without the custom User-Agent.
    response = session.post('https://www.amazon.com/errors/validateCaptcha', data=data)

    # check response
    print(response.status_code)   # always comes back as 503
    # print(response.text)
    # input('PAUSED')
```
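A possible alternative for the submission step, offered only as a sketch: instead of hard-coding a POST, read the method, action and hidden fields from the captcha `<form>` itself using lxml's form helpers (`HtmlElement.forms`, `FormElement.method`, `FormElement.action`, `FormElement.form_values()`), then send the request the way the form declares it, with a `Referer` header. Assumptions: the captcha page contains a form whose action includes `validateCaptcha`, the answer field is `field-keywords` as in the code above, and `read_captcha_form` / `submit_captcha_form` are hypothetical helper names, not part of amazoncaptcha, lxml or requests.

```python
from urllib.parse import urljoin


def read_captcha_form(tree, page_url, solution):
    """Return (method, url, fields) for the captcha form on a parsed page.

    `tree` is the lxml.html element from html.fromstring(); lxml exposes each
    <form> through tree.forms as a FormElement with .method, .action and
    .form_values(). Hypothetical helper, not a library API.
    """
    forms = list(tree.forms)
    if not forms:
        raise ValueError("no <form> found on the captcha page")
    # Prefer the form that targets validateCaptcha; otherwise use the first form.
    form = next((f for f in forms if "validateCaptcha" in (f.action or "")), forms[0])

    fields = dict(form.form_values())    # hidden inputs such as amzn and amzn-r
    fields["field-keywords"] = solution  # answer field name taken from the question's code
    url = urljoin(page_url, form.action or "")  # action is usually a relative path
    return form.method, url, fields      # lxml upper-cases method, defaulting to "GET"


def submit_captcha_form(session, tree, page_url, solution):
    """Send the form the way it declares itself (GET or POST), with a Referer header.

    `session` is assumed to be a requests.Session that fetched the captcha page.
    """
    method, url, fields = read_captcha_form(tree, page_url, solution)
    headers = {"Referer": page_url}
    if method == "GET":
        return session.request(method, url, params=fields, headers=headers)
    return session.request(method, url, data=fields, headers=headers)
```

In the question's code this would replace the hard-coded POST with something like `response = submit_captcha_form(session, tree, url, solution)`, where `url` (hypothetical here) is the page that showed the captcha.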
lukeprofits added the enhancement (New feature or request) label on Jul 31, 2023
3ldar commented Aug 1, 2023

It is not in Python, but here is my Node.js implementation for resolving an Amazon captcha:

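```typescript
// `$` is a jQuery-style selector over the captcha page, and `captcha` is the solved text.
// The first hidden input in the form is the `amzn` token.
const amzn = String($("form input[type=hidden]").val() ?? "");
let amazonPass: string;
// encodeURIComponent keeps characters such as `+` or `/` in the token and answer intact.
if (options.baseUrl.includes('?')) {
    const [base, query] = options.baseUrl.split('?');
    amazonPass = `${base}/errors/validateCaptcha?amzn=${encodeURIComponent(amzn)}&amzn-r=/&field-keywords=${encodeURIComponent(captcha)}&${query}`;
} else {
    amazonPass = `${options.baseUrl}/errors/validateCaptcha?amzn=${encodeURIComponent(amzn)}&amzn-r=/&field-keywords=${encodeURIComponent(captcha)}`;
}

// Submitted as a GET (not a POST) with a referer header, following the redirect back.
const response = await gotScraping({
    url: amazonPass,
    cookieJar: options.cookieJar,
    followRedirect: true,
    headers: {
        "referer": options.baseUrl
    },
    // @ts-ignore
    proxyUrl: options.proxyUrl,
    sessionToken: options.sessionToken,
    throwHttpErrors: false,
});
```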

gotScraping is a request-like HTTP library. The key thing is that the submission is a GET request, not a POST; it also requires a referer header and a follow-up URL (`amzn-r`) to redirect to once the captcha is resolved.
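A rough Python translation of the Node.js snippet, as a sketch under assumptions: build the `/errors/validateCaptcha` URL with `amzn`, `amzn-r` and `field-keywords` as query parameters, then send it as a GET with a `Referer` header and follow redirects. It assumes `session` is a `requests.Session` carrying the cookies from the captcha page and that `base_url` is the site root (e.g. `https://www.amazon.com`); `build_validate_captcha_url` and `submit_captcha` are hypothetical names. `urlencode` percent-escapes every value, so a token containing `+` is not turned into a space on the server side.

```python
from urllib.parse import urlencode


def build_validate_captcha_url(base_url, amzn, solution, amzn_r="/"):
    """Build the GET URL for /errors/validateCaptcha, mirroring the Node.js snippet.

    If base_url already has a query string, it is appended after the captcha
    parameters, as the original snippet does. Hypothetical helper.
    """
    base, sep, query = base_url.partition("?")
    params = urlencode({"amzn": amzn, "amzn-r": amzn_r, "field-keywords": solution})
    url = f"{base}/errors/validateCaptcha?{params}"
    return f"{url}&{query}" if sep and query else url


def submit_captcha(session, base_url, amzn, solution, amzn_r="/"):
    """Submit the solved captcha as a GET request with a Referer header.

    `session` is assumed to be a requests.Session that already holds the cookies
    from the page that showed the captcha.
    """
    url = build_validate_captcha_url(base_url, amzn, solution, amzn_r)
    return session.get(url, headers={"Referer": base_url}, allow_redirects=True)
```

With the variables from the question this would be roughly `response = submit_captcha(session, 'https://www.amazon.com', amzn, solution, amzn_r)`.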

stale bot commented Sep 16, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix (This will not be worked on) label on Sep 16, 2023
3 participants