Is it legal to scrape Amazon product pages?

In the US, the Ninth Circuit held in hiQ Labs v. LinkedIn (9th Cir. 2022) that scraping publicly accessible pages likely does not violate the Computer Fraud and Abuse Act. It can still breach Amazon's Conditions of Use, particularly if you scrape while logged in or run automated tools that the terms prohibit. The pragmatic safe path is unauthenticated requests against public product pages, at a volume that does not burden Amazon's servers.

Can Amazon detect web scraping?

Yes. Amazon uses IP reputation, request fingerprinting (TLS, HTTP/2 frames, header order), JavaScript behavioural signals, and per-account/per-IP rate limits. A plain Python requests call from a datacenter IP is detected within 1-3 requests. Residential IPs, slow pacing, and rotating fingerprints extend that — but you are still in an arms race.

Does Amazon have an official price API?

The Amazon Product Advertising API (PA API 5.0) exposes price data, but requires an active Amazon Associates account with qualifying sales, has a baseline rate limit of 1 request per second (scaled by your earnings), and does not return price history. For competitor analysis, deal monitoring, or non-affiliate use cases, scraping is the only practical option.

How often can I scrape Amazon without getting blocked?

From a single residential IP with realistic headers and 3-5 seconds between requests, expect 5-30 successful requests before a CAPTCHA or 503. From a datacenter IP, often the first request is already a /errors/validateCaptcha redirect. A managed scraping API with rotating residential proxies removes that ceiling — your real limit becomes how the API itself is metered.

Can I scrape Amazon prices with Selenium or Playwright?

Yes, but a plain Chromedriver or Playwright session is still fingerprinted (WebGL renderer, missing CDP-detection patches, mouse-event entropy). Stealth plugins help against basic detection, not against the behavioural layer. Headless browsers are useful for JS-rendered pages but do not solve the bot-detection problem on their own.


How to Scrape Amazon Product Prices with Python

May 12, 2026 · 10 min read

Whether you are building a deal tracker, a competitor monitor, or an ecommerce analytics tool, scraping Amazon prices in Python looks simple until Amazon starts blocking requests. This guide gives you both halves of the problem: a short requests + BeautifulSoup script that works on a clean residential IP, and a drop-in API call that returns clean JSON without managing proxies or selectors. Both code samples actually run.

Why Scraping Amazon Is Harder Than It Looks

Three things make Amazon one of the harder consumer sites to scrape reliably.

Aggressive bot detection. Amazon has been investing in anti-scraping for over a decade. Their detection stack combines IP reputation (datacenter ranges are flagged immediately), TLS and HTTP/2 fingerprinting (curl and stock requests have a distinctive ClientHello), and behavioural signals from the rendered page. The free tier of "just send a GET request" is essentially closed.

Frequent HTML changes. Amazon A/B-tests product pages constantly. The price selector span.a-price > span.a-offscreen works on most desktop pages today; the same product can render as span.priceToPay in a different variant. Any selector you write has a shelf life.

No usable official API. The Product Advertising API is tied to an active Amazon Associates account, has a baseline rate limit of 1 request per second (scaled by revenue), and exposes only a curated subset of catalog data: no historical pricing, no Buy Box loser data, no review text in bulk.

The result: a "weekend project" Amazon scraper takes a week to build and breaks every few weeks in production.

The DIY Approach (with real code)

This script works against unprotected product pages from a clean residential IP. Use it to understand what you are actually getting back before deciding whether to outsource.

import requests
from bs4 import BeautifulSoup

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/127.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}


def scrape_amazon_product(url: str) -> dict | None:
    r = requests.get(url, headers=HEADERS, timeout=15)
    if r.status_code != 200:
        return None
    if "validateCaptcha" in r.url or "Robot Check" in r.text:
        return None

    soup = BeautifulSoup(r.text, "html.parser")
    title = soup.select_one("#productTitle")
    price = soup.select_one("span.a-price > span.a-offscreen")
    rating = soup.select_one("span.a-icon-alt")

    if not title or not price:
        return None

    return {
        "title": title.get_text(strip=True),
        "price": price.get_text(strip=True),
        "rating": rating.get_text(strip=True) if rating else None,
    }


if __name__ == "__main__":
    print(scrape_amazon_product("https://www.amazon.com/dp/B09V3KXJPB"))
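
The script returns the price as display text, for example "$1,299.00". If you plan to store or compare prices, a numeric value is easier to work with. Here is a minimal sketch, assuming US-style formatting (comma as thousands separator, dot as decimal point). parse_price is a hypothetical helper and not part of the script above; other locales such as amazon.de format prices differently and would need their own handling.

```python
import re
from decimal import Decimal
from typing import Optional


def parse_price(text: str) -> Optional[Decimal]:
    """Convert display text like "$1,299.00" to Decimal("1299.00").

    Assumes US-style formatting. Returns None when no number is found.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    # Drop thousands separators before handing the digits to Decimal.
    return Decimal(match.group(0).replace(",", ""))
```

Decimal is used instead of float so that repeated comparisons and sums do not pick up binary rounding errors.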

Real Limitations

From a clean residential IP, expect 5 to 30 successful responses before Amazon returns /errors/validateCaptcha or HTTP 503.
From a datacenter IP, your first request is usually the redirect.
Two requests from the same IP within one second are a strong bot signal, so pace yourself.
The selectors above worked at publish time. Validate them; they will eventually shift.

What "blocked" actually looks like — the HTML you get back when Amazon decides you're a bot:

<title>Amazon.com</title>
<form action="/errors/validateCaptcha" method="get">
  <img src="https://images-na.ssl-images-amazon.com/captcha/.../Captcha_xyz.jpg" />
  <label>Type the characters you see in this image:</label>
  <input type="text" name="field-keywords" />
  <input type="submit" value="Continue shopping" />
</form>

Your parser returns None, your monitoring dashboard goes blank, and you don't know whether the product is genuinely unavailable or you are just blocked.
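
One way to tell the two cases apart is to classify each response before parsing it. The sketch below is a starting point under stated assumptions, not documented Amazon behaviour: classify_response is a hypothetical helper, the block markers are the ones shown in this post, and treating HTTP 404 as "listing removed" is a guess you should check against real responses.

```python
def classify_response(status_code: int, final_url: str, html: str) -> str:
    """Return "blocked", "not_found", "error", or "ok" for a fetched page."""
    # Block signals described in this post: 503, the CAPTCHA redirect,
    # the CAPTCHA form action, and the "Robot Check" page text.
    if status_code == 503:
        return "blocked"
    if "validateCaptcha" in final_url or "validateCaptcha" in html:
        return "blocked"
    if "Robot Check" in html:
        return "blocked"
    # Assumption: a removed listing answers with 404.
    if status_code == 404:
        return "not_found"
    if status_code != 200:
        return "error"
    return "ok"
```

With requests, you would call it as classify_response(r.status_code, r.url, r.text). An "ok" result followed by a missing title or price then points at a selector problem rather than a block.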

Scaling Beyond a One-File Scraper

A single script works for a handful of products and a couple of runs a day. The moment you want hourly checks on 50 ASINs or a persistent price history, you hit problems that have nothing to do with parsing HTML.

Proxy rotation. Residential proxy pools require connection pooling, a sticky-session vs rotating-session strategy, and retry logic when an upstream proxy dies mid-request. Bright Data, Smartproxy, and Oxylabs are the usual vendors; expect setup work even after you sign up.

Retry queues. When 5% of requests fail and 15% return CAPTCHAs, you need a queue that retries with exponential backoff, gives up after N attempts, and writes the failed URLs somewhere you can reconcile later. Celery, RQ, or BullMQ all work.
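
Before reaching for a full queue, the retry logic itself can be sketched in a few lines. This is a minimal in-process version, assuming a fetch function that returns None on failure or CAPTCHA (as scrape_amazon_product does). fetch_with_backoff and its parameters are hypothetical names, and the default delays are illustrative rather than tuned values.

```python
import random
import time
from typing import Callable, Optional


def fetch_with_backoff(
    fetch: Callable[[str], Optional[dict]],
    url: str,
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch(url) until it returns a result or attempts run out.

    The wait doubles after each failed attempt, with random jitter added
    so that parallel workers do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        result = fetch(url)
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    # Giving up: the caller should record the URL for later reconciliation.
    return None
```

The sleep parameter is injectable so the function can be tested without waiting. In a Celery or RQ setup, the same doubling schedule would move into the task's retry configuration.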

Selector maintenance. Track changes to your CSS selectors over time. A weekly job that fetches a known-stable product and asserts each selector still returns a non-empty string is the cheapest early-warning you can build.
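
That weekly check could look roughly like this. It is a sketch, not a finished monitoring job: find_broken_selectors is a hypothetical helper, and select_text stands in for whatever parser you use, so the check itself has no parsing dependency.

```python
from typing import Callable, Dict, List, Optional


def find_broken_selectors(
    html: str,
    selectors: Dict[str, str],
    select_text: Callable[[str, str], Optional[str]],
) -> List[str]:
    """Return the names of selectors that yield no text for this page."""
    broken = []
    for name, css in selectors.items():
        text = select_text(html, css)
        if text is None or not text.strip():
            broken.append(name)
    return broken


# Selectors taken from the DIY script earlier in this post.
SELECTORS = {
    "title": "#productTitle",
    "price": "span.a-price > span.a-offscreen",
    "rating": "span.a-icon-alt",
}
```

With BeautifulSoup, select_text can be a small wrapper that calls soup.select_one(css) and returns get_text(strip=True), or None when nothing matches. Alert when the returned list is non-empty for a product you know is live.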

CAPTCHA handling. If you handle CAPTCHAs yourself, you will end up routing them to a solving service like 2captcha or DeathByCaptcha. A cheaper but more brittle alternative is to detect the CAPTCHA, drop the proxy, and retry from a different IP.

Historical storage. Postgres with an (asin, observed_at) index is fine for tens of millions of rows. TimescaleDB can compress older data automatically once a compression policy is set. Avoid schemas that store the full HTML: keep only the parsed fields in the database, and save the raw HTML to S3 if you need the artifact.
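
To make the schema concrete, here is a small sketch that uses Python's built-in sqlite3 as a stand-in for Postgres, so it runs without a server. The table name, the price_cents column, and the helper functions are all assumptions made for illustration. Storing integer cents is one design choice among several (a NUMERIC column in Postgres works equally well).

```python
import sqlite3
from typing import Optional


def open_history_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the price history table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS price_history (
            asin        TEXT NOT NULL,
            observed_at TEXT NOT NULL,   -- ISO 8601 UTC timestamp
            price_cents INTEGER NOT NULL,
            PRIMARY KEY (asin, observed_at)
        )
        """
    )
    return conn


def record_price(conn: sqlite3.Connection, asin: str,
                 observed_at: str, price_cents: int) -> None:
    """Append one observation; an exact duplicate is silently skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO price_history VALUES (?, ?, ?)",
        (asin, observed_at, price_cents),
    )
    conn.commit()


def latest_price(conn: sqlite3.Connection, asin: str) -> Optional[int]:
    """Return the most recent price in cents, or None if never observed."""
    row = conn.execute(
        "SELECT price_cents FROM price_history "
        "WHERE asin = ? ORDER BY observed_at DESC LIMIT 1",
        (asin,),
    ).fetchone()
    return row[0] if row else None
```

The composite primary key doubles as the (asin, observed_at) index, so the "latest price for this ASIN" query stays an index lookup as the table grows.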

Monitoring jobs. Once you are scraping on schedule, your operational concerns shift from "did the script run" to "did each ASIN get a fresh data point in the last N hours, and if not, why." Build the dashboard before you build more scrapers.
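
The "fresh data point in the last N hours" question reduces to a small function once you know the latest observation time per ASIN. A minimal sketch follows, assuming you can load that mapping from your storage layer. find_stale_asins is a hypothetical name and the timestamps are expected to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Optional


def find_stale_asins(
    tracked: Iterable[str],
    last_seen: Dict[str, datetime],
    max_age_hours: float,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return tracked ASINs with no observation newer than max_age_hours.

    ASINs that have never been observed count as stale.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=max_age_hours)
    return [
        asin for asin in tracked
        if asin not in last_seen or last_seen[asin] < cutoff
    ]
```

Run it after each scheduled batch and alert on a non-empty result. Pairing the list with the last failure reason per ASIN answers the "and if not, why" half of the question.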

A managed scraping API removes most of this. LogPose's smart endpoint accepts any Amazon URL — product page, search results, or category — and auto-detects what to extract, handling proxy rotation and anti-bot bypass on its end. The flow is asynchronous: submit a job, poll until the result is ready (typically 4-12 seconds for a product page).

import os
import time
import requests

API_KEY = os.environ["LOGPOSE_API_KEY"]
BASE = "https://api.logposervices.com/api/v1"
HEADERS = {"X-API-Key": API_KEY}


def scrape_amazon(url: str, timeout_seconds: int = 90) -> dict:
    submit = requests.get(
        f"{BASE}/ecommerce/amazon/smart",
        params={"url": url, "pages": 1},
        headers=HEADERS,
        timeout=30,
    )
    submit.raise_for_status()
    job_id = submit.json()["job_id"]

    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        status = requests.get(
            f"{BASE}/jobs/{job_id}", headers=HEADERS, timeout=15,
        ).json()
        if status["status"] == "completed":
            break
        if status["status"] == "failed":
            raise RuntimeError(f"scrape failed: {status.get('error')}")
        time.sleep(2)
    else:
        raise TimeoutError(f"job {job_id} did not finish in {timeout_seconds}s")

    return requests.get(
        f"{BASE}/jobs/{job_id}/result", headers=HEADERS, timeout=15,
    ).json()


if __name__ == "__main__":
    print(scrape_amazon("https://www.amazon.com/dp/B09V3KXJPB"))
    # Bare ASIN also works:
    # print(scrape_amazon("B09V3KXJPB"))

The result is a structured object with title, price, rating, review_count, availability, images, and the raw asin — typed fields, not strings you need to regex.

Building an Amazon Price Tracker in Python

For a real Amazon price tracker, you want history, not snapshots. Two patterns work:

Roll your own cron + storage. Schedule scrape_amazon(url) to run every six hours via cron or a job scheduler. Append the price to a Postgres table with (asin, observed_at, price) columns. Plot from there.

Use a monitor that stores history for you. The LogPose monitor endpoint runs scrapes on a schedule, stores each result, and can fire a webhook or email when a condition triggers — useful for price-drop alerts:

requests.post(
    f"{BASE}/monitors",
    headers=HEADERS,
    json={
        "url": "https://www.amazon.com/dp/B09V3KXJPB",
        "name": "MacBook Air price watch",
        "metric": "price",
        "condition": "drops_below",
        "threshold": 999.00,
        "check_interval_hours": 6,
        "notify_channels": ["email"],
    },
    timeout=30,  # requests has no default timeout and can otherwise hang
).raise_for_status()

Pull the price history later with GET /api/v1/monitors/{monitor_id}/history and pipe it into Chart.js, Pandas, or a spreadsheet.

Scraping Amazon ASINs Directly

The Amazon smart endpoint accepts a bare 10-character ASIN and expands it to the canonical /dp/<ASIN> URL on the server side:

data = scrape_amazon("B09V3KXJPB")                              # bare ASIN
data = scrape_amazon("https://www.amazon.com/dp/B09V3KXJPB")    # full URL
# Both behave identically.

For a CSV of ASINs, loop sequentially with a small sleep, or use the bulk endpoint (covered in Extract Amazon ASIN data in bulk).
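
The sequential loop can be sketched as follows. It assumes a CSV with a header row and an "asin" column (a hypothetical layout), and it treats an ASIN as ten uppercase letters or digits, which is an assumption about the format rather than an official rule. The scrape argument is any function that takes an ASIN, such as scrape_amazon from earlier.

```python
import csv
import re
import time
from typing import Callable, Dict, Iterable, Iterator

# Assumption: ASINs are exactly 10 uppercase letters or digits.
ASIN_RE = re.compile(r"^[A-Z0-9]{10}$")


def read_asins(csv_lines: Iterable[str], column: str = "asin") -> Iterator[str]:
    """Yield valid, de-duplicated ASINs from CSV lines with a header row."""
    seen = set()
    for row in csv.DictReader(csv_lines):
        asin = (row.get(column) or "").strip().upper()
        if ASIN_RE.match(asin) and asin not in seen:
            seen.add(asin)
            yield asin


def scrape_all(
    asins: Iterable[str],
    scrape: Callable[[str], dict],
    pause_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, dict]:
    """Scrape each ASIN in turn, recording failures instead of stopping."""
    results = {}
    for asin in asins:
        try:
            results[asin] = scrape(asin)
        except Exception as exc:  # one bad ASIN should not end the batch
            results[asin] = {"error": str(exc)}
        sleep(pause_seconds)
    return results
```

Typical use would be opening the file with open("asins.csv", newline="") and passing the file object to read_asins, then handing the result to scrape_all(asins, scrape_amazon).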

How to Scrape Amazon Without Getting Blocked

If you stay DIY, four things make a measurable difference:

Realistic browser headers. Match a current Chrome version. Accept-Language, Accept-Encoding, and Accept all need to be present and ordered like a real browser would send them.
Slow down. Three to ten seconds between requests against the same IP. Amazon's per-IP rate limit is opaque but generous to slow clients.
Rotate residential IPs. Datacenter IPs are flagged. Residential pools cost more, but their cleaner IP reputation gets you past the first detection layer.
Watch for the redirect. If r.url contains /errors/validateCaptcha, the proxy is burned for that ASIN. Retry from a different IP.
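
The "slow down" advice is simple to apply with a small generator that spaces out requests. This is a sketch under one assumption of mine: that a random interval inside the three-to-ten-second window looks less mechanical than a fixed sleep. paced is a hypothetical helper name.

```python
import random
import time
from typing import Callable, Iterable, Iterator


def paced(
    urls: Iterable[str],
    min_seconds: float = 3.0,
    max_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield each URL, sleeping a random interval between consecutive ones."""
    first = True
    for url in urls:
        if not first:
            sleep(random.uniform(min_seconds, max_seconds))
        first = False
        yield url
```

Looping with for url in paced(urls) and calling scrape_amazon_product(url) inside keeps the pacing out of the scraping function itself. When rotating proxies, keep one pacer per IP rather than one global pacer.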

There is no setup that lets you scrape Amazon at 100 req/sec from one source. That is a deliberate part of their infrastructure design, not a bug you can work around.

Common Mistakes

Single-product pages do not paginate. The pages parameter is meaningful for search and category pages (1-10). On a /dp/ page, pages=1 is the only valid value.
Locale matters. The endpoint validates that the host is amazon.*. UK (amazon.co.uk), Germany (amazon.de), and other locales work; cross-locale price comparison is a strong use case.
Bare ASIN input is fine. params={"url": "B09V3KXJPB"} is canonicalized to https://www.amazon.com/dp/B09V3KXJPB.
Poll politely. Two seconds between status checks is a reasonable default.
Edge timeout is 100 seconds. Cloudflare sits in front of api.logposervices.com; jobs that stall past 90 seconds should be treated as failed.

The Amazon Scraping Landscape

If you are evaluating tools, the honest options break down roughly as:

DIY with a residential proxy pool (Bright Data, Smartproxy, Oxylabs): most control, most operational cost. Best when you have an existing scraping team.
General-purpose scraping APIs (ScraperAPI, ScrapingBee): a proxy and rendering layer. You still write the parser. Good for varied targets.
Specialized Amazon trackers (Keepa, Jungle Scout, Helium 10): pre-built dashboards for sellers and researchers, with high-quality data but a locked-in product surface.
LogPose: a single API across 11 platforms with structured JSON output and a built-in monitor system. Most useful when Amazon is one of several sites you need.

None of these are strictly better than the others. Pick the one whose tradeoffs match your timeline.

Get Started

Sign up at logposervices.com and generate an API key from Tool → API Keys.
export LOGPOSE_API_KEY=lp_xxxxxxx
Run the snippet above against any Amazon product URL or ASIN.

For the monitoring flow, see the Amazon price tracker guide. For an overview of when a managed API makes sense vs DIY across multiple sites, read the web scraping API guide.

External: Python requests, BeautifulSoup docs, hiQ Labs v. LinkedIn.
