How to Scrape Amazon Product Prices with Python
Whether you are building a deal tracker, a competitor monitor, or an ecommerce analytics tool, scraping Amazon prices in Python looks simple until Amazon starts blocking requests. This guide covers both sides of the problem: a 30-line requests + BeautifulSoup script that works from a clean residential IP, and a drop-in API call that returns clean JSON without you managing proxies or selectors. Both code samples are runnable.
Why Scraping Amazon Is Harder Than It Looks
Three things make Amazon one of the harder consumer sites to scrape reliably.
Aggressive bot detection. Amazon has been investing in anti-scraping for over a decade. Their detection stack combines IP reputation (datacenter ranges are flagged immediately), TLS and HTTP/2 fingerprinting (curl and the stock Python requests library have a distinctive ClientHello), and behavioural signals from the rendered page. The days when you could just send a GET request are essentially over.
Frequent HTML changes. Amazon A/B-tests product pages constantly. The price selector span.a-price > span.a-offscreen works on most desktop pages today; the same product can render as span.priceToPay in a different variant. Any selector you write has a shelf life.
No usable official API. The Product Advertising API is tied to an active Amazon Associates account, has a rate limit of one request per second (scaled by revenue), and exposes a curated subset of catalog data: no historical pricing, no Buy Box loser data, no review text in bulk.
The result: a "weekend project" Amazon scraper takes a week to build and breaks every few weeks in production.
The DIY Approach (with real code)
This script works against unprotected product pages from a clean residential IP. Use it to understand what you are actually getting back before deciding whether to outsource.
import requests
from bs4 import BeautifulSoup

# Desktop Chrome headers; update the version string as Chrome releases move on.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/127.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}


def scrape_amazon_product(url: str) -> dict | None:
    """Fetch one product page; return title/price/rating, or None if blocked or unparsable."""
    r = requests.get(url, headers=HEADERS, timeout=15)
    if r.status_code != 200:
        return None  # 503 and other errors usually mean throttling
    # CAPTCHA interstitial: either a redirect or a page that contains the form
    blocked = (
        "validateCaptcha" in r.url
        or "validateCaptcha" in r.text
        or "Robot Check" in r.text
    )
    if blocked:
        return None

    soup = BeautifulSoup(r.text, "html.parser")
    title = soup.select_one("#productTitle")
    price = soup.select_one("span.a-price > span.a-offscreen")
    rating = soup.select_one("span.a-icon-alt")
    if not title or not price:
        return None  # layout variant, or the product has no price

    return {
        "title": title.get_text(strip=True),
        "price": price.get_text(strip=True),
        "rating": rating.get_text(strip=True) if rating else None,
    }


if __name__ == "__main__":
    print(scrape_amazon_product("https://www.amazon.com/dp/B09V3KXJPB"))
Real Limitations
- From a clean residential IP, expect 5 to 30 successful responses before Amazon returns /errors/validateCaptcha or HTTP 503.
- From a datacenter IP, your first request is usually the redirect.
- Two consecutive requests from the same IP within a second are a strong bot signal; pace yourself.
- The selectors above worked at publish time. Validate them; they will eventually shift.
What "blocked" actually looks like — the HTML you get back when Amazon decides you're a bot:
<title>Amazon.com</title>
<form action="/errors/validateCaptcha" method="get">
<img src="https://images-na.ssl-images-amazon.com/captcha/.../Captcha_xyz.jpg" />
<label>Type the characters you see in this image:</label>
<input type="text" name="field-keywords" />
<input type="submit" value="Continue shopping" />
</form>
Your parser returns None, your monitoring dashboard goes blank, and you don't know whether the product is genuinely unavailable or you are just blocked.
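That ambiguity is cheap to reduce if each fetch records why it failed instead of collapsing everything to None. The sketch below is illustrative: Outcome and classify are hypothetical names, the inputs are values you already have from requests (r.status_code, r.url, r.text) plus the parser's result, and the "Currently unavailable" check assumes English-language page text, which can vary by locale and page layout.

```python
from enum import Enum
from typing import Optional


class Outcome(str, Enum):
    OK = "ok"
    CAPTCHA = "captcha"            # Amazon served the validateCaptcha interstitial
    THROTTLED = "throttled"        # 429/503: slow down or switch IP
    NOT_FOUND = "not_found"        # 404: ASIN removed or mistyped
    HTTP_ERROR = "http_error"      # any other non-200 status
    UNAVAILABLE = "unavailable"    # page loaded, product has no offer
    PARSE_FAILED = "parse_failed"  # page loaded, selectors found nothing


def classify(status_code: int, final_url: str, html: str, parsed: Optional[dict]) -> Outcome:
    """Map one raw fetch to a reason code instead of a bare None."""
    if "validateCaptcha" in final_url or "/errors/validateCaptcha" in html:
        return Outcome.CAPTCHA
    if status_code in (429, 503):
        return Outcome.THROTTLED
    if status_code == 404:
        return Outcome.NOT_FOUND
    if status_code != 200:
        return Outcome.HTTP_ERROR
    if parsed is not None:
        return Outcome.OK
    # Assumption: an out-of-stock page carries this phrase in its availability block.
    if "Currently unavailable" in html:
        return Outcome.UNAVAILABLE
    return Outcome.PARSE_FAILED
```

Storing the outcome next to each ASIN lets a dashboard show "blocked" and "out of stock" as different states rather than the same blank cell.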
Scaling Beyond a One-File Scraper
A single script works for a handful of products and a couple of runs a day. The moment you want hourly checks on 50 ASINs or a persistent price history, you hit problems that have nothing to do with parsing HTML.
Proxy rotation. Residential proxy pools require connection pooling, a sticky-session vs rotating-session strategy, and retry logic when an upstream proxy dies mid-request. Bright Data, Smartproxy, and Oxylabs are the usual vendors; expect setup work even after you sign up.
Retry queues. When 5% of requests fail and 15% return CAPTCHAs, you need a queue that retries with exponential backoff, gives up after N attempts, and writes the failed URLs somewhere you can reconcile later. Celery and RQ are the usual Python choices; BullMQ does the same job if part of your stack runs on Node.js.
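In production the retry policy lives in your queue's configuration, but the policy itself fits in a few lines. This is a minimal in-process sketch, not Celery or RQ code: fetch_with_retries is a hypothetical helper, fetch is any callable that returns a dict on success and None when blocked (the DIY scrape_amazon_product fits), and the 5-second base delay is an arbitrary starting point.

```python
import random
import time
from typing import Callable, Optional


def fetch_with_retries(
    fetch: Callable[[str], Optional[dict]],
    url: str,
    failed: list[str],
    max_attempts: int = 4,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Retry fetch(url) with exponential backoff; record url in failed if every attempt misses."""
    for attempt in range(max_attempts):
        result = fetch(url)
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            # 5 s, 10 s, 20 s, ... plus up to 1 s of jitter so workers don't retry in lockstep
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    failed.append(url)  # reconcile later, e.g. persist to a table or file
    return None
```

Injecting sleep keeps the function testable and lets a worker substitute its own scheduler.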
Selector maintenance. Track changes to your CSS selectors over time. A weekly job that fetches a known-stable product and asserts that each selector still returns a non-empty string is the cheapest early-warning system you can build.
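The core of that weekly check is a function that runs every selector against a fresh page and reports the ones that came back empty. A sketch, assuming BeautifulSoup as in the DIY script; broken_selectors and the SELECTORS dict are illustrative names, and the selectors are the ones that script uses.

```python
from bs4 import BeautifulSoup

# Selectors from the DIY script; keep this dict in sync with your parser.
SELECTORS = {
    "title": "#productTitle",
    "price": "span.a-price > span.a-offscreen",
    "rating": "span.a-icon-alt",
}


def broken_selectors(html: str, selectors: dict[str, str] = SELECTORS) -> list[str]:
    """Return the names of selectors that match nothing or only whitespace."""
    soup = BeautifulSoup(html, "html.parser")
    broken = []
    for name, css in selectors.items():
        node = soup.select_one(css)
        if node is None or not node.get_text(strip=True):
            broken.append(name)
    return broken
```

Point it at r.text from the known-stable product and alert when the list is non-empty. An out-of-stock product also trips the price selector, so pick one that is reliably in stock.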
CAPTCHA handling. If you handle CAPTCHAs yourself, you will route them to a solving service like 2captcha or DeathByCaptcha. A cheaper but more brittle alternative is to detect the CAPTCHA, drop the proxy, and retry from a different IP.
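The detect-and-drop strategy can be sketched with a small proxy pool. Everything here is hypothetical: ProxyPool and fetch_via_pool are illustrative names, the proxy URLs are placeholders in the http://user:pass@host:port form that requests accepts in its proxies argument, and get is injected so you can pass requests.get in production and a fake in tests. A 503 or a validateCaptcha marker burns the proxy for the rest of the run; any other non-200 status stops retrying, since it is not the proxy's fault.

```python
import itertools
from typing import Callable, Optional

CAPTCHA_MARKER = "validateCaptcha"


class ProxyPool:
    """Round-robin over proxy URLs, skipping any that have been burned."""

    def __init__(self, proxies: list[str]) -> None:
        self._proxies = list(proxies)
        self._burned: set[str] = set()
        self._cycle = itertools.cycle(self._proxies)

    def pick(self) -> Optional[str]:
        # Check each proxy at most once per call; None means every proxy is burned.
        for _ in range(len(self._proxies)):
            proxy = next(self._cycle)
            if proxy not in self._burned:
                return proxy
        return None

    def burn(self, proxy: str) -> None:
        self._burned.add(proxy)


def fetch_via_pool(
    url: str,
    pool: ProxyPool,
    get: Callable,
    headers: Optional[dict] = None,
    max_tries: int = 3,
) -> Optional[str]:
    """Return page HTML, switching proxies whenever one hits a CAPTCHA or 503."""
    for _ in range(max_tries):
        proxy = pool.pick()
        if proxy is None:
            return None
        r = get(url, headers=headers, proxies={"http": proxy, "https": proxy}, timeout=15)
        blocked = r.status_code == 503 or CAPTCHA_MARKER in r.url or CAPTCHA_MARKER in r.text
        if blocked:
            pool.burn(proxy)  # this IP is flagged; try the next one
            continue
        return r.text if r.status_code == 200 else None
    return None
```

In production this would be called as fetch_via_pool(url, ProxyPool(proxy_urls), requests.get, headers=HEADERS).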
Historical storage. Postgres with an (asin, observed_at) index is fine for tens of millions of rows, and TimescaleDB can compress older data automatically. Avoid schemas that store the full HTML: keep only the parsed fields, and save the scraped HTML to S3 if you need the raw artifact.
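A schema along those lines, sketched with Python's built-in sqlite3 so it runs without a database server. Treat it as a stand-in for Postgres under stated assumptions: the table and column names are illustrative, prices are stored as integer cents to avoid floating-point rounding, and only an S3 key is stored for the raw page. In Postgres you would use timestamptz for observed_at; the composite index is the same.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS price_observations (
    asin        TEXT    NOT NULL,
    observed_at TEXT    NOT NULL,  -- ISO-8601 UTC, e.g. 2024-05-01T06:00:00Z
    price_cents INTEGER,           -- NULL when no price was shown
    currency    TEXT    NOT NULL DEFAULT 'USD',
    html_s3_key TEXT               -- optional pointer to the raw page in S3
);
CREATE INDEX IF NOT EXISTS idx_obs_asin_time
    ON price_observations (asin, observed_at);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def latest_price(conn: sqlite3.Connection, asin: str) -> Optional[tuple]:
    """Most recent (observed_at, price_cents) for asin; the composite index covers this lookup."""
    return conn.execute(
        "SELECT observed_at, price_cents FROM price_observations "
        "WHERE asin = ? ORDER BY observed_at DESC LIMIT 1",
        (asin,),
    ).fetchone()
```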
Monitoring jobs. Once you are scraping on schedule, your operational concerns shift from "did the script run" to "did each ASIN get a fresh data point in the last N hours, and if not, why." Build the dashboard before you build more scrapers.
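The "fresh data point in the last N hours" question reduces to comparing each tracked ASIN's most recent observation with a cutoff. A sketch with a hypothetical stale_asins helper; last_seen would come from a query such as SELECT asin, max(observed_at) ... GROUP BY asin against your observations table, converted to timezone-aware UTC datetimes.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Mapping, Optional


def stale_asins(
    tracked: Iterable[str],
    last_seen: Mapping[str, datetime],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> list[str]:
    """Return ASINs never observed, or whose newest observation is older than max_age."""
    now = now or datetime.now(timezone.utc)
    stale = []
    for asin in tracked:
        seen = last_seen.get(asin)  # expected to be a timezone-aware UTC datetime
        if seen is None or now - seen > max_age:
            stale.append(asin)
    return stale
```

Including never-seen ASINs matters: a product that has failed since the day it was added otherwise never shows up on the dashboard.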
A managed scraping API removes most of this. LogPose's smart endpoint accepts any Amazon URL — product page, search results, or category — and auto-detects what to extract, handling proxy rotation and anti-bot bypass on its end. The flow is asynchronous: submit a job, poll until the result is ready (typically 4-12 seconds for a product page).
import os
import time

import requests

API_KEY = os.environ["LOGPOSE_API_KEY"]
BASE = "https://api.logposervices.com/api/v1"
HEADERS = {"X-API-Key": API_KEY}


def scrape_amazon(url: str, timeout_seconds: int = 90) -> dict:
    # 1. Submit the job; the response carries a job_id to poll.
    submit = requests.get(
        f"{BASE}/ecommerce/amazon/smart",
        params={"url": url, "pages": 1},
        headers=HEADERS,
        timeout=30,
    )
    submit.raise_for_status()
    job_id = submit.json()["job_id"]

    # 2. Poll every 2 seconds until the job completes, fails, or times out.
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        resp = requests.get(f"{BASE}/jobs/{job_id}", headers=HEADERS, timeout=15)
        resp.raise_for_status()
        status = resp.json()
        if status["status"] == "completed":
            break
        if status["status"] == "failed":
            raise RuntimeError(f"scrape failed: {status.get('error')}")
        time.sleep(2)
    else:
        # while/else: runs only when the loop ends without hitting `break`
        raise TimeoutError(f"job {job_id} did not finish in {timeout_seconds}s")

    # 3. Fetch the structured result.
    result = requests.get(f"{BASE}/jobs/{job_id}/result", headers=HEADERS, timeout=15)
    result.raise_for_status()
    return result.json()


if __name__ == "__main__":
    print(scrape_amazon("https://www.amazon.com/dp/B09V3KXJPB"))
    # Bare ASIN also works:
    # print(scrape_amazon("B09V3KXJPB"))
The result is a structured object with title, price, rating, review_count, availability, images, and the raw asin — typed fields, not strings you need to regex.
Building an Amazon Price Tracker in Python
For a real Amazon price tracker, you want history, not snapshots. Two patterns work:
Roll your own cron + storage. Schedule scrape_amazon(url) to run every six hours via cron or a job scheduler. Append the price to a Postgres table with (asin, observed_at, price) columns. Plot from there.
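The DIY scraper returns the price as display text such as "$999.00", so it needs converting before it goes into a numeric column. A sketch of that step; parse_price_cents is a hypothetical helper that assumes US-style formatting (comma thousands separator, dot decimal point), so locales like amazon.de that write prices as 999,00 € need their own parser.

```python
import re
from decimal import Decimal
from typing import Optional

# Digits with optional comma thousands separators and an optional decimal part.
_PRICE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price_cents(text: str) -> Optional[int]:
    """'$1,099.99' -> 109999; None if the text holds no number. US formatting assumed."""
    match = _PRICE.search(text)
    if match is None:
        return None
    amount = Decimal(match.group().replace(",", ""))
    return int(amount * 100)  # Decimal avoids float artifacts such as 0.29 * 100 -> 28.999...
```

For the schedule, a crontab entry beginning with 0 */6 * * * runs your tracking script at minute 0 of every sixth hour.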
Use a monitor that stores history for you. The LogPose monitor endpoint runs scrapes on a schedule, stores each result, and can fire a webhook or email when a condition triggers — useful for price-drop alerts:
requests.post(
    f"{BASE}/monitors",
    headers=HEADERS,
    json={
        "url": "https://www.amazon.com/dp/B09V3KXJPB",
        "name": "MacBook Air price watch",
        "metric": "price",
        "condition": "drops_below",
        "threshold": 999.00,
        "check_interval_hours": 6,
        "notify_channels": ["email"],
    },
).raise_for_status()
Pull the price history later with GET /api/v1/monitors/{monitor_id}/history and pipe it into Chart.js, Pandas, or a spreadsheet.
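If you take the roll-your-own route instead, a drops_below condition can be approximated over a stored price series. This is a sketch of one reasonable semantics, not a description of how LogPose evaluates its conditions: crossed_below and alert_points are hypothetical helpers that fire when the price moves from at or above the threshold to below it (or on the first observation if it is already below), so a price that stays low does not re-alert on every check.

```python
from typing import Iterable, Optional


def crossed_below(previous: Optional[float], current: Optional[float], threshold: float) -> bool:
    """True when current is below threshold and previous was not (or is unknown)."""
    if current is None:  # failed scrape: no signal either way
        return False
    return current < threshold and (previous is None or previous >= threshold)


def alert_points(prices: Iterable[Optional[float]], threshold: float) -> list[int]:
    """Indexes in a price history where a drops_below alert would fire."""
    alerts = []
    previous = None
    for i, price in enumerate(prices):
        if crossed_below(previous, price, threshold):
            alerts.append(i)
        if price is not None:  # skip gaps so a failed scrape doesn't reset the state
            previous = price
    return alerts
```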
Scraping Amazon ASINs Directly
The Amazon smart endpoint accepts a bare 10-character ASIN and expands it to the canonical /dp/<ASIN> URL on the server side:
data = scrape_amazon("B09V3KXJPB") # bare ASIN
data = scrape_amazon("https://www.amazon.com/dp/B09V3KXJPB") # full URL
# Both behave identically.
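Canonicalization happens on the server, but normalizing input on your side is still useful for deduplicating a list before you submit it. A client-side sketch with hypothetical helpers: to_asin accepts a bare ASIN or a URL containing /dp/<ASIN> or /gp/product/<ASIN>, and the regexes assume the usual 10-character uppercase alphanumeric ASIN format. Any 10-character alphanumeric string passes the bare-ASIN check, so this validates shape, not existence.

```python
import re
from typing import Optional

# Assumed ASIN shape: exactly 10 uppercase letters or digits.
BARE_ASIN = re.compile(r"[A-Z0-9]{10}")
# /dp/<ASIN> or /gp/product/<ASIN>, followed by /, ?, #, or the end of the string.
ASIN_IN_URL = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?#]|$)")


def to_asin(value: str) -> Optional[str]:
    """Return the ASIN from a bare ASIN or a product URL, else None."""
    value = value.strip()
    if BARE_ASIN.fullmatch(value.upper()):
        return value.upper()
    match = ASIN_IN_URL.search(value)
    return match.group(1) if match else None


def canonical_url(asin: str, host: str = "www.amazon.com") -> str:
    """Build the /dp/<ASIN> form described above."""
    return f"https://{host}/dp/{asin}"
```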
For a CSV of ASINs, loop sequentially with a small sleep, or use the bulk endpoint (covered in Extract Amazon ASIN data in bulk).
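The sequential version is short. In this sketch, read_asins and scrape_all are hypothetical helpers, the CSV is assumed to have a header row with an asin column, and scrape is whatever per-item function you use (scrape_amazon fits). Failures are recorded per ASIN rather than aborting the whole run.

```python
import csv
import time
from typing import Callable, Iterable


def read_asins(lines: Iterable[str]) -> list[str]:
    """Read the 'asin' column from CSV text; an open file object works as lines."""
    asins = []
    for row in csv.DictReader(lines):
        asin = (row.get("asin") or "").strip()
        if asin:
            asins.append(asin)
    return asins


def scrape_all(
    asins: Iterable[str],
    scrape: Callable[[str], dict],
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, dict]:
    """Scrape ASINs one at a time, pausing between calls and recording errors."""
    results: dict[str, dict] = {}
    for i, asin in enumerate(asins):
        if i:
            sleep(delay_seconds)  # small gap between submissions
        try:
            results[asin] = scrape(asin)
        except Exception as exc:  # e.g. RuntimeError or TimeoutError from scrape_amazon
            results[asin] = {"error": str(exc)}
    return results


# Usage (the file name is a placeholder):
# with open("asins.csv", newline="") as f:
#     results = scrape_all(read_asins(f), scrape_amazon)
```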
How to Scrape Amazon Without Getting Blocked
If you stay DIY, four things make a measurable difference:
- Realistic browser headers. Match a current Chrome version. Accept-Language, Accept-Encoding, and Accept all need to be present and ordered like a real browser would send them.
- Slow down. Three to ten seconds between requests against the same IP. Amazon's per-IP rate limit is opaque but generous to slow clients.
- Rotate residential IPs. Datacenter IPs are flagged. Residential pools have a reputation premium but bypass the first detection layer.
- Watch for the redirect. If r.url contains /errors/validateCaptcha, the proxy is burned for that ASIN. Retry from a different IP.
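The "slow down" rule is easiest to enforce in one place, keyed by exit IP. A sketch with a hypothetical PerIpPacer class: before each request it waits until a randomly chosen gap of 3 to 10 seconds (the range suggested above) has passed since the last request through the same IP. clock and sleep are injectable so the behavior can be tested without waiting; the class is not thread-safe as written.

```python
import random
import time
from typing import Callable


class PerIpPacer:
    """Enforce a random min_gap..max_gap second pause between requests per exit IP."""

    def __init__(
        self,
        min_gap: float = 3.0,
        max_gap: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_gap = min_gap
        self.max_gap = max_gap
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: dict[str, float] = {}

    def wait(self, ip: str) -> None:
        """Block until ip may be used again, then schedule its next slot."""
        now = self._clock()
        ready_at = self._next_allowed.get(ip, now)
        if ready_at > now:
            self._sleep(ready_at - now)
            now = ready_at
        self._next_allowed[ip] = now + random.uniform(self.min_gap, self.max_gap)
```

Call pacer.wait(proxy) immediately before each requests.get that goes through that proxy; different IPs never wait on each other.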
There is no setup that lets you scrape Amazon at 100 req/sec from one source. That is a deliberate part of their infrastructure design, not a bug you can work around.
Common Mistakes
- Single-product pages do not paginate. The pages parameter is meaningful for search and category pages (1-10). On a /dp/ page, pages=1 is the only valid value.
- Locale matters. The endpoint validates that the host is amazon.*. UK (amazon.co.uk), Germany (amazon.de), and other locales work; cross-locale price comparison is a strong use case.
- Bare ASIN input is fine. params={"url": "B09V3KXJPB"} is canonicalized to https://www.amazon.com/dp/B09V3KXJPB.
- Poll politely. Two seconds between status checks is a reasonable default.
- Edge timeout is 100 seconds. Cloudflare sits in front of api.logposervices.com; jobs that stall past 90 seconds should be treated as failed.
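To catch bad input before it costs an API call, a client-side host check can mirror the endpoint's locale validation. A sketch, assuming a strict allowlist rather than a naive amazon.* prefix match (which would also accept a host like amazon.example.net); AMAZON_DOMAINS is a partial, illustrative list, and bare ASINs are not URLs, so they need to be routed separately.

```python
from urllib.parse import urlsplit

# Partial, illustrative allowlist of Amazon retail domains; extend it for your locales.
AMAZON_DOMAINS = {
    "amazon.com", "amazon.co.uk", "amazon.de", "amazon.fr", "amazon.it",
    "amazon.es", "amazon.ca", "amazon.co.jp", "amazon.in", "amazon.com.au",
}


def is_amazon_url(url: str) -> bool:
    """True if url points at one of the allowlisted Amazon hosts."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return host in AMAZON_DOMAINS
```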
The Amazon Scraping Landscape
If you are evaluating tools, the honest options break down roughly as:
- DIY with a residential proxy pool (Bright Data, Smartproxy, Oxylabs) — most control, most operational cost. Best when you have an existing scraping team.
- General-purpose scraping APIs (ScraperAPI, ScrapingBee) — proxy + render layer. You still write the parser. Good for varied targets.
- Specialized Amazon trackers (Keepa, JungleScout, Helium 10) — pre-built dashboards for sellers and researchers; high-quality data, but a locked-in product surface.
- LogPose — single API across 11 platforms with structured JSON output and a built-in monitor system. Most useful when Amazon is one of several sites you need.
None of these are strictly better than the others. Pick the one whose tradeoffs match your timeline.
Get Started
- Sign up at logposervices.com and generate an API key from Tool → API Keys.
- Set the key as an environment variable: export LOGPOSE_API_KEY=lp_xxxxxxx
- Run the snippet above against any Amazon product URL or ASIN.
For the monitoring flow, see the Amazon price tracker guide. For an overview of when a managed API makes sense vs DIY across multiple sites, read the web scraping API guide.