How to Get Amazon Product Reviews via API
You ship a product on Amazon and want every new 1-star review in your Slack within 30 minutes. Or you are a competitor researcher building a sentiment dashboard. Either way, Amazon's official Product Advertising API gives you the review count and the average star rating, but not the review text. The actual reviews live on the public web pages, so you have to scrape them yourself.
This guide covers the DIY path (requests + BeautifulSoup against /product-reviews/<ASIN>/) and the managed API path. Both end-to-end code samples actually run.
Why Review Scraping Is Trickier Than Product Scraping
Four differences from a regular /dp/ page:
Aggressive bot detection on review pages. Reviews carry some of Amazon's most commercially valuable data, so detection on these URLs is stricter than on standard product pages. Datacenter IPs are blocked outright; residential IPs need slow pacing.
Pagination caps. Amazon shows roughly 10 pages of reviews publicly (~100 reviews). A popular product can have tens of thousands of reviews, but only about 100 are exposed. Keep this in mind when you architect a "scrape all reviews" workflow: you cannot build one.
Mixed metadata signals. Each review has metadata: verified-purchase badge, Amazon Vine label, helpful-vote count, reviewer name (often truncated), and sometimes reviewer location. Extracting all of these reliably from changing HTML is more work than scraping a single price field.
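To make that metadata usable, the text labels usually need normalizing into numbers. Below is a minimal sketch that assumes English-locale strings such as "4.0 out of 5 stars" and "1,234 people found this helpful". It also assumes Vine cards contain the phrase "Vine Customer Review". The function names are hypothetical, and the exact wording varies by locale and A/B variant.

```python
import re
from typing import Optional


def parse_rating(label: Optional[str]) -> Optional[float]:
    """Turn a label such as "4.0 out of 5 stars" into 4.0."""
    if not label:
        return None
    # First number in the label; also accept "4,0" as used on some EU locales.
    match = re.search(r"\d+(?:[.,]\d+)?", label)
    return float(match.group(0).replace(",", ".")) if match else None


def parse_helpful_votes(label: Optional[str]) -> int:
    """Turn "1,234 people found this helpful" into 1234; a missing label means 0."""
    if not label:
        return 0
    if label.strip().lower().startswith("one person"):
        return 1
    match = re.search(r"\d[\d,]*", label)
    return int(match.group(0).replace(",", "")) if match else 0


def is_vine_review(card_text: str) -> bool:
    """Heuristic (assumption): Vine cards contain the phrase "Vine Customer Review"."""
    return "vine customer review" in card_text.lower()
```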
Locale-specific URLs. The reviews URL pattern is /product-reviews/<ASIN> on each amazon.* domain. Reviews on amazon.com are independent of reviews on amazon.de, so scrape each locale separately.
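Making the locale an explicit parameter makes the per-domain split hard to forget. Here is a minimal sketch. The REVIEW_DOMAINS mapping and its short locale keys are illustrative assumptions, so add the marketplaces you actually sell in.

```python
from urllib.parse import urlencode

# Illustrative locale -> marketplace mapping; extend it as needed.
REVIEW_DOMAINS = {
    "us": "www.amazon.com",
    "de": "www.amazon.de",
    "uk": "www.amazon.co.uk",
    "jp": "www.amazon.co.jp",
}


def review_url(asin: str, locale: str, page: int = 1, sort: str = "recent") -> str:
    """Build the /product-reviews/<ASIN>/ URL for one locale."""
    domain = REVIEW_DOMAINS[locale]  # KeyError for unknown locales is intentional
    query = urlencode({"pageNumber": page, "sortBy": sort})
    return f"https://{domain}/product-reviews/{asin}/?{query}"
```

Storing the same locale key on every scraped row keeps amazon.com and amazon.de reviews separable downstream.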
The DIY Approach
import requests
from bs4 import BeautifulSoup

# Desktop Chrome headers; the default python-requests User-Agent is an easy bot signal.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/127.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}


def scrape_reviews(asin: str, page: int = 1, sort: str = "recent") -> list[dict]:
    """Fetch one page of amazon.com reviews for an ASIN; return [] when blocked."""
    url = (
        f"https://www.amazon.com/product-reviews/{asin}/"
        f"?pageNumber={page}&sortBy={sort}"
    )
    r = requests.get(url, headers=HEADERS, timeout=15)
    # The CAPTCHA page is usually served at the original URL with a form pointing
    # at /errors/validateCaptcha, so check the body as well as the final URL.
    if r.status_code != 200 or "validateCaptcha" in r.url or "validateCaptcha" in r.text:
        return []
    soup = BeautifulSoup(r.text, "html.parser")
    out = []
    # Each review card is a data-hook="review" element whose id is the review ID.
    for el in soup.select("[data-hook='review']"):
        title_el = el.select_one("[data-hook='review-title'] span")
        body_el = el.select_one("[data-hook='review-body'] span")
        rating_el = el.select_one("[data-hook='review-star-rating'] span.a-icon-alt")
        date_el = el.select_one("[data-hook='review-date']")
        verified = bool(el.select_one("[data-hook='avp-badge']"))
        out.append({
            "id": el.get("id"),
            "title": title_el.get_text(strip=True) if title_el else None,
            "body": body_el.get_text(strip=True) if body_el else None,
            # Raw label text, e.g. "4.0 out of 5 stars".
            "rating": rating_el.get_text(strip=True) if rating_el else None,
            "date": date_el.get_text(strip=True) if date_el else None,
            "verified_purchase": verified,
        })
    return out


if __name__ == "__main__":
    # Pages 1-10: Amazon does not expose reviews beyond page 10.
    for page in range(1, 11):
        for review in scrape_reviews("B09V3KXJPB", page=page):
            print(review)
Real Limitations
- ~10 page cap. Amazon hides reviews beyond page 10. You get ~100 reviews per product, not the full history.
- Sort matters. The default sort is "Top reviews" (Amazon's curation). Add sortBy=recent for chronological order, which is necessary for monitoring new reviews.
- CAPTCHA threshold. Heavier than on product pages. Expect to be challenged after 3-10 successful pages.
- HTML rotation. Amazon ships different review-card markup in different A/B variants. The data-hook attributes above are the most stable, but not invariant.
- Locale. Scraping amazon.com reviews tells you nothing about amazon.de, so repeat per locale.
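Given the CAPTCHA threshold and the page cap, a paging loop should pace itself and stop at the first empty page instead of blindly requesting all ten. This is a sketch under stated assumptions. fetch_page is a hypothetical callable, for example lambda p: scrape_reviews(asin, page=p). The 3-8 second jitter is a guess to tune, not a documented safe rate.

```python
import random
import time
from typing import Callable


def fetch_all_pages(
    fetch_page: Callable[[int], list[dict]],
    max_pages: int = 10,
    min_delay: float = 3.0,
    max_delay: float = 8.0,
) -> list[dict]:
    """Walk pages 1..max_pages with jittered pauses, stopping at the first empty page.

    An empty page means either the end of the list or a block, so continuing
    would only burn requests (and possibly the IP).
    """
    reviews: list[dict] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break
        reviews.extend(batch)
        if page < max_pages:
            # Random delay so requests do not arrive on a fixed rhythm.
            time.sleep(random.uniform(min_delay, max_delay))
    return reviews
```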
A typical failure when blocked looks like a generic error page that comes back HTTP 200 OK:
<title>Sorry! Something went wrong on our end.</title>
<img src="https://images-na.ssl-images-amazon.com/images/G/01/error/title._TTD_.png" />
Your scraper sees zero reviews and keeps going as if everything is fine, silently corrupting your dataset.
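Checking the HTML for known block markers turns that silent failure into a loud one. Below is a minimal sketch. BlockedError and assert_review_page are hypothetical names. The markers are simply the strings from the CAPTCHA check and the error page above, so treat them as heuristics that Amazon can change.

```python
class BlockedError(RuntimeError):
    """Amazon served a CAPTCHA or error page instead of reviews."""


# Strings from the CAPTCHA check and the error page shown above (heuristics).
BLOCK_MARKERS = (
    "validateCaptcha",
    "Sorry! Something went wrong",
)


def assert_review_page(html: str) -> None:
    """Raise BlockedError if the HTML looks like a block page rather than reviews."""
    for marker in BLOCK_MARKERS:
        if marker in html:
            raise BlockedError(f"block page detected (marker {marker!r})")
```

Call it on r.text before parsing. Then let the exception stop the run or trigger a retry from a different IP, instead of recording an empty page.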
Scaling Beyond a One-File Scraper
If you need reviews from more than a handful of products on a regular schedule, you will add:
Sort-aware pagination. Fetch both top and recent sorts to maximize coverage within the 100-review cap.
Dedupe by review ID. Each Amazon review has a stable data-hook="review" element with an id attribute. Use that to dedupe across runs and across sorts.
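Merging sorts and diffing against stored IDs can happen in one pass. This minimal sketch assumes each review dict carries the id key produced by scrape_reviews above. merge_new_reviews is a hypothetical helper. It is also an assumption that "helpful" is the sortBy value for Top reviews in Amazon's current URL scheme.

```python
from typing import Iterable


def merge_new_reviews(batches: Iterable[list[dict]], seen_ids: set[str]) -> list[dict]:
    """Merge review lists from several sorts, keeping only IDs not seen before.

    batches might be the pages from sort="recent" and sort="helpful"; seen_ids
    comes from your store and is updated in place, so each review is returned once.
    """
    fresh: list[dict] = []
    for batch in batches:
        for review in batch:
            rid = review.get("id")
            if not rid or rid in seen_ids:
                continue  # missing ID, or already stored / already merged
            seen_ids.add(rid)
            fresh.append(review)
    return fresh
```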
Sentiment + topic extraction. Pipe review bodies into a small NLP pipeline (spaCy, transformers, or a hosted classifier). Track sentiment per product per week, not raw review counts.
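Once each review has a score, the weekly rollup is a plain group-by. This sketch assumes each row is a dict with asin, posted_at (a datetime.date), and score (a float from whichever classifier you choose). weekly_sentiment is a hypothetical helper, and the scoring step itself is out of scope.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekly_sentiment(reviews: list[dict]) -> dict[tuple[str, int, int], float]:
    """Average score per (asin, ISO year, ISO week)."""
    buckets: dict[tuple[str, int, int], list[float]] = defaultdict(list)
    for r in reviews:
        posted: date = r["posted_at"]
        iso_year, iso_week, _ = posted.isocalendar()
        buckets[(r["asin"], iso_year, iso_week)].append(r["score"])
    return {key: mean(scores) for key, scores in buckets.items()}
```

ISO weeks avoid splitting a week across a year boundary, which matters for week-over-week comparisons in early January.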
Storage schema. A reviews table with (asin, review_id, body, rating, verified_purchase, vine, posted_at) is enough. Index on (asin, posted_at desc) for the "latest reviews" query.
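That schema maps directly onto SQLite. Here is a minimal sketch using the standard-library sqlite3 module. The (asin, review_id) primary key, the INSERT OR IGNORE dedupe, and storing posted_at as ISO-8601 text are design assumptions, not requirements.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS reviews (
    asin              TEXT NOT NULL,
    review_id         TEXT NOT NULL,
    body              TEXT,
    rating            REAL,
    verified_purchase INTEGER NOT NULL DEFAULT 0,
    vine              INTEGER NOT NULL DEFAULT 0,
    posted_at         TEXT NOT NULL,  -- ISO-8601 date, so text order = time order
    PRIMARY KEY (asin, review_id)
);
CREATE INDEX IF NOT EXISTS idx_reviews_latest
    ON reviews (asin, posted_at DESC);
"""


def open_store(path: str = "reviews.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_reviews(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Insert dicts keyed by column name; rows whose (asin, review_id) exists are skipped."""
    conn.executemany(
        """INSERT OR IGNORE INTO reviews
               (asin, review_id, body, rating, verified_purchase, vine, posted_at)
           VALUES (:asin, :review_id, :body, :rating, :verified_purchase, :vine, :posted_at)""",
        rows,
    )
    conn.commit()


def latest_reviews(conn: sqlite3.Connection, asin: str, limit: int = 20) -> list[tuple]:
    """The "latest reviews" query that the (asin, posted_at DESC) index serves."""
    return conn.execute(
        "SELECT review_id, rating, posted_at FROM reviews "
        "WHERE asin = ? ORDER BY posted_at DESC LIMIT ?",
        (asin, limit),
    ).fetchall()
```

Because of INSERT OR IGNORE, re-saving a review you already have is a no-op, which pairs naturally with deduping by review ID.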
Cross-locale joins. If you sell on amazon.com and amazon.de, scrape each separately and tag the rows with the locale. Sentiment and helpfulness signals do not transfer between locales cleanly.
A managed API handles the proxy and CAPTCHA layer for you. The LogPose smart endpoint accepts any Amazon URL — product, reviews, search, or category — and returns structured data:
import os
import time
import requests

API_KEY = os.environ["LOGPOSE_API_KEY"]
BASE = "https://api.logposervices.com/api/v1"
HEADERS = {"X-API-Key": API_KEY}


def scrape(path: str, max_wait: float = 300, **params) -> dict:
    """Submit a scrape job, poll until it finishes, and return the result JSON."""
    submit = requests.get(f"{BASE}/{path}", params=params, headers=HEADERS, timeout=30)
    submit.raise_for_status()
    job_id = submit.json()["job_id"]

    # Poll every 2 s; give up on the client side after max_wait seconds.
    deadline = time.monotonic() + max_wait
    while True:
        s = requests.get(f"{BASE}/jobs/{job_id}", headers=HEADERS, timeout=15).json()
        if s["status"] in ("completed", "failed"):
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"job {job_id} not finished after {max_wait}s")
        time.sleep(2)
    if s["status"] != "completed":
        raise RuntimeError(s.get("error"))
    result = requests.get(f"{BASE}/jobs/{job_id}/result", headers=HEADERS, timeout=15)
    result.raise_for_status()
    return result.json()


asin = "B09V3KXJPB"
data = scrape(
    "ecommerce/amazon/smart",
    url=f"https://www.amazon.com/product-reviews/{asin}/?sortBy=recent",
    pages=10,
)
print(f"{len(data.get('reviews', []))} reviews extracted")
The pages parameter accepts 1-10 (the publicly visible page cap). For watching new reviews on a long list of ASINs, the bulk endpoint accepts an array of review URLs and returns a parent bulk_id you can poll.
Common Mistakes
- Scraping the product page instead of the reviews page. Product pages show ~8 curated reviews; the /product-reviews/ URL shows the paginated list (up to the ~10-page cap).
- Forgetting sortBy=recent. The default sort is curated, so you will miss new reviews if you do not switch to chronological order.
- Treating verified and unverified reviews the same. For brand sentiment, weight verified purchases higher.
- Re-scraping all pages every run. Once you have seen a review ID, you have seen it. Diff against your store.
- Ignoring Amazon Vine reviews. Vine reviewers receive free products rather than money, and their reviews skew positive. Filter or weight accordingly.
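Weighting verified and Vine reviews can be as simple as a per-review multiplier. The weights below are illustrative placeholders, not recommended values. The sketch also assumes rating is already a float; the DIY scraper above returns the raw label text, so convert it first.

```python
from typing import Optional

# Illustrative placeholder weights; calibrate them against your own data.
VERIFIED_WEIGHT = 1.0
UNVERIFIED_WEIGHT = 0.5
VINE_MULTIPLIER = 0.5


def review_weight(review: dict) -> float:
    """Verified purchases count fully; unverified and Vine reviews count less."""
    weight = VERIFIED_WEIGHT if review.get("verified_purchase") else UNVERIFIED_WEIGHT
    if review.get("vine"):
        weight *= VINE_MULTIPLIER
    return weight


def weighted_rating(reviews: list[dict]) -> Optional[float]:
    """Weighted mean star rating, or None when there is nothing to average."""
    total = sum(review_weight(r) for r in reviews)
    if total == 0:
        return None
    return sum(review_weight(r) * r["rating"] for r in reviews) / total
```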
The Landscape
For Amazon-specific review extraction, the realistic options are:
- DIY: full control, full maintenance burden. Sustainable for one product line.
- ScraperAPI / Bright Data: solve proxies and rendering; you still write the review parser.
- Apify: has community Amazon-review actors; quality and freshness vary.
- Keepa: historical price and rating data with some review metrics, but not full review text.
- LogPose: structured review JSON from the smart endpoint, with the same key working for product, search, and category pages.
If you only need review counts over time, the Product Advertising API is the cheapest route. For text, you are scraping.
Get Started
- Sign up at logposervices.com and generate an API key from Tool → API Keys.
- Set the key in your shell:
export LOGPOSE_API_KEY=lp_xxxxxxx
- Run the snippet above against any ASIN's /product-reviews/ URL.
Related: scrape Amazon product prices in Python, bulk ASIN extraction, web scraping API guide.
External: Amazon Product Advertising API, BeautifulSoup docs.