How to Get Amazon Product Reviews via API

You ship a product on Amazon and want every new 1-star review in your Slack within 30 minutes. Or you are a competitor researcher building a sentiment dashboard. Either way, Amazon's official API gives you the review count and the average star rating, but not the review text. The actual reviews live on the public web pages, and you scrape them yourself.

This guide covers the DIY path (requests + BeautifulSoup against /product-reviews/<ASIN>/) and the managed API path. Both code samples actually run.

Why Review Scraping Is Trickier Than Product Scraping

Four differences from a regular /dp/ page:

Aggressive bot detection on review pages. Reviews are where Amazon's most commercially valuable signal lives, so detection on these URLs is even more aggressive than on standard product pages. Datacenter IPs are blocked outright; residential IPs need slow pacing.

Pagination caps. Amazon shows roughly 10 pages of reviews publicly (~100 reviews). A popular product can have tens of thousands of reviews, but only about 100 are reachable through pagination. Plan around that from the start: a "scrape all reviews" workflow is not possible.

Mixed metadata signals. Each review has metadata: verified-purchase badge, Amazon Vine label, helpful-vote count, reviewer name (often truncated), and sometimes reviewer location. Extracting all of these reliably from changing HTML is more work than scraping a single price field.

Locale-specific URLs. The reviews URL pattern is /product-reviews/<ASIN> on each amazon.* domain. Reviews on amazon.com are independent from reviews on amazon.de — scrape each locale separately.
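
The per-locale URL pattern can be captured in a small helper. This is a minimal sketch: the DOMAINS table and both function names are illustrative assumptions, and the pageNumber and sortBy query parameters are the ones used in the scraper later in this guide.

```python
from urllib.parse import urlencode

# Hypothetical locale table: extend it with the marketplaces you actually track.
DOMAINS = {
    "us": "www.amazon.com",
    "de": "www.amazon.de",
    "uk": "www.amazon.co.uk",
}


def review_url(asin: str, locale: str = "us", page: int = 1, sort: str = "recent") -> str:
    """Build the /product-reviews/ URL for one ASIN on one marketplace."""
    domain = DOMAINS[locale]  # a KeyError on an unknown locale is intentional
    query = urlencode({"pageNumber": page, "sortBy": sort})
    return f"https://{domain}/product-reviews/{asin}/?{query}"


def review_urls_all_locales(asin: str, page: int = 1) -> dict[str, str]:
    """One URL per configured locale, keyed by locale tag."""
    return {loc: review_url(asin, loc, page) for loc in DOMAINS}
```

Keying the result by locale tag makes it easy to carry that tag through to storage, since reviews from different marketplaces should not be mixed.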

The DIY Approach

import requests
from bs4 import BeautifulSoup

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/127.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}


def scrape_reviews(asin: str, page: int = 1, sort: str = "recent") -> list[dict]:
    url = (
        f"https://www.amazon.com/product-reviews/{asin}/"
        f"?pageNumber={page}&sortBy={sort}"
    )
    r = requests.get(url, headers=HEADERS, timeout=15)
    if r.status_code != 200 or "validateCaptcha" in r.url:
        return []

    soup = BeautifulSoup(r.text, "html.parser")
    out = []
    for el in soup.select("[data-hook='review']"):
        title_el = el.select_one("[data-hook='review-title'] span")
        body_el = el.select_one("[data-hook='review-body'] span")
        rating_el = el.select_one("[data-hook='review-star-rating'] span.a-icon-alt")
        date_el = el.select_one("[data-hook='review-date']")
        verified = bool(el.select_one("[data-hook='avp-badge']"))
        out.append({
            "id": el.get("id"),
            "title": title_el.get_text(strip=True) if title_el else None,
            "body": body_el.get_text(strip=True) if body_el else None,
            "rating": rating_el.get_text(strip=True) if rating_el else None,
            "date": date_el.get_text(strip=True) if date_el else None,
            "verified_purchase": verified,
        })
    return out


if __name__ == "__main__":
    for page in range(1, 11):
        for r in scrape_reviews("B09V3KXJPB", page=page):
            print(r)
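
The scraper above returns the rating and date as raw display strings. Before storing them you will probably want a number and a real date. The sketch below assumes the en-US display formats "5.0 out of 5 stars" and "Reviewed in the United States on March 3, 2024"; those formats are an assumption about the current amazon.com markup and will differ on other locales. It also relies on English month names, which is the default for strptime unless the process locale has been changed.

```python
import re
from datetime import date, datetime
from typing import Optional

# Assumed en-US formats; other marketplaces need their own patterns.
RATING_RE = re.compile(r"(\d+(?:\.\d+)?) out of 5")
DATE_RE = re.compile(r"on (\w+ \d{1,2}, \d{4})$")


def parse_rating(text: Optional[str]) -> Optional[float]:
    """Turn '5.0 out of 5 stars' into 5.0; return None if the text does not match."""
    if not text:
        return None
    m = RATING_RE.search(text)
    return float(m.group(1)) if m else None


def parse_review_date(text: Optional[str]) -> Optional[date]:
    """Turn 'Reviewed in ... on March 3, 2024' into a date; None if unparseable."""
    if not text:
        return None
    m = DATE_RE.search(text.strip())
    if not m:
        return None
    try:
        return datetime.strptime(m.group(1), "%B %d, %Y").date()
    except ValueError:  # matched the shape but not a real month name or day
        return None
```

Returning None instead of raising keeps one odd review card from aborting a whole page, at the cost of needing a null check downstream.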

Real Limitations

  • ~10 page cap. Amazon hides reviews beyond page 10. You get ~100 reviews per product, not the full history.
  • Sort matters. The default sort is "Top reviews" (Amazon's curation). Add sortBy=recent for chronological order — necessary for monitoring new reviews.
  • CAPTCHA threshold. Heavier than on product pages. Expect to be challenged after 3-10 successful pages.
  • HTML rotation. Amazon ships different review-card markup in different A/B variants. The data-hook attributes above are the most stable, but not invariant.
  • Locale. Scraping amazon.com reviews tells you nothing about amazon.de — repeat per locale.
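
Given the CAPTCHA threshold, one common mitigation is to space page requests out with a randomized delay. The generator below is a sketch of that idea only: the name paced_pages and the default delay values are placeholders, not measured thresholds, and the sleep function is injectable so the loop can be tested without waiting.

```python
import random
import time
from typing import Callable, Iterator


def paced_pages(
    first: int = 1,
    last: int = 10,
    min_delay: float = 8.0,   # placeholder value, tune for your own traffic
    max_delay: float = 20.0,  # placeholder value, tune for your own traffic
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[int]:
    """Yield page numbers first..last, pausing a random interval between them."""
    for page in range(first, last + 1):
        if page > first:
            # Jitter the delay so requests do not arrive on a fixed cadence.
            sleep(random.uniform(min_delay, max_delay))
        yield page
```

In the scraper's main loop this would replace range(1, 11), with a break as soon as a page comes back empty.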

A typical failure when blocked looks like a generic error page that comes back HTTP 200 OK:

<title>Sorry! Something went wrong on our end.</title>
<img src="https://images-na.ssl-images-amazon.com/images/G/01/error/title._TTD_.png" />

Your scraper sees zero reviews and keeps going as if everything is fine, silently corrupting your dataset.
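
One way to avoid that silent failure is to classify each response before trusting an empty result. The sketch below checks the body for the error-page title quoted above and for a CAPTCHA marker. Looking for "validateCaptcha" in the HTML, rather than only in the final URL as the scraper does, is an assumption, and the function name and return labels are illustrative.

```python
# Substrings that indicate a block page rather than a real review page.
BLOCK_MARKERS = (
    "Sorry! Something went wrong",  # title of the error page quoted above
    "validateCaptcha",              # assumed to also appear in CAPTCHA page HTML
)


def classify_response(html: str, review_count: int) -> str:
    """Return 'blocked', 'empty', or 'ok' for one fetched page.

    'empty' is ambiguous on purpose: it can mean you ran past the last page
    or that the markup changed, so it deserves a log line, not silence.
    """
    if any(marker in html for marker in BLOCK_MARKERS):
        return "blocked"
    if review_count == 0:
        return "empty"
    return "ok"
```

A scheduler can then retry or back off on "blocked" and only treat "ok" pages as data, so a run of block pages never looks like a product with no new reviews.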

Scaling Beyond a One-File Scraper

If you need reviews from more than a handful of products on a regular schedule, you will add:

Sort-aware pagination. Fetch both top and recent sorts to maximize coverage within the 100-review cap.

Dedupe by review ID. Each review is rendered in a data-hook="review" element whose id attribute is stable. Use that id to dedupe across runs and across sorts.
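
Combining the two sorts and deduping by ID can be sketched in a few lines. The function name merge_reviews is illustrative; it assumes each review is a dict with an "id" key, as produced by the scraper earlier in this guide.

```python
from typing import Iterable


def merge_reviews(*batches: Iterable[dict]) -> list[dict]:
    """Combine review dicts from several sorts or runs, keeping the first copy of each id."""
    seen: dict[str, dict] = {}
    for batch in batches:
        for review in batch:
            rid = review.get("id")
            if rid is None:
                # Without an id there is no safe dedupe key, so skip the record.
                continue
            seen.setdefault(rid, review)
    return list(seen.values())
```

Because dicts preserve insertion order, the merged list keeps the order of the first batch and appends only the unseen reviews from later ones.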

Sentiment + topic extraction. Pipe review bodies into a small NLP pipeline (spaCy, transformers, or a hosted classifier). Track sentiment per product per week, not raw review counts.

Storage schema. A reviews table with (asin, review_id, body, rating, verified_purchase, vine, posted_at) is enough. Index on (asin, posted_at desc) for the "latest reviews" query.
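
As a concrete version of that schema, here is a sketch using SQLite from the Python standard library. The column types, the composite primary key, and the helper names are assumptions chosen for illustration; any relational store with a unique key on (asin, review_id) works the same way.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS reviews (
    asin              TEXT NOT NULL,
    review_id         TEXT NOT NULL,
    body              TEXT,
    rating            REAL,
    verified_purchase INTEGER NOT NULL DEFAULT 0,
    vine              INTEGER NOT NULL DEFAULT 0,
    posted_at         TEXT,  -- ISO 8601 date, which sorts correctly as text
    PRIMARY KEY (asin, review_id)
);
CREATE INDEX IF NOT EXISTS idx_reviews_asin_posted
    ON reviews (asin, posted_at DESC);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the review store and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_reviews(conn: sqlite3.Connection, asin: str, reviews: list[dict]) -> int:
    """Insert reviews, ignoring ids already stored. Returns the number of new rows."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO reviews "
        "(asin, review_id, body, rating, verified_purchase, vine, posted_at) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (asin, r["id"], r.get("body"), r.get("rating"),
             int(bool(r.get("verified_purchase"))), int(bool(r.get("vine"))),
             r.get("posted_at"))
            for r in reviews
        ],
    )
    conn.commit()
    return conn.total_changes - before


def latest_reviews(conn: sqlite3.Connection, asin: str, limit: int = 10) -> list[tuple]:
    """The 'latest reviews' query the index is designed for."""
    return conn.execute(
        "SELECT review_id, rating, posted_at FROM reviews "
        "WHERE asin = ? ORDER BY posted_at DESC LIMIT ?",
        (asin, limit),
    ).fetchall()
```

The primary key doubles as the dedupe mechanism: re-saving a page you have already seen inserts nothing and returns 0.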

Cross-locale joins. If you sell on amazon.com and amazon.de, scrape each separately and tag the rows with the locale. Sentiment and helpfulness signals do not transfer between locales cleanly.

A managed API handles the proxy and CAPTCHA layer for you. The LogPose smart endpoint accepts any Amazon URL — product, reviews, search, or category — and returns structured data:

import os
import time
import requests

API_KEY = os.environ["LOGPOSE_API_KEY"]
BASE = "https://api.logposervices.com/api/v1"
HEADERS = {"X-API-Key": API_KEY}


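def scrape(path: str, max_wait: float = 300.0, **params) -> dict:
    # Submit the job; fail early on an HTTP error instead of on a missing key.
    resp = requests.get(f"{BASE}/{path}", params=params, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    job_id = resp.json()["job_id"]

    # Poll until the job finishes, but give up after max_wait seconds
    # (the 300-second default is an arbitrary choice, adjust as needed).
    deadline = time.monotonic() + max_wait
    while True:
        s = requests.get(f"{BASE}/jobs/{job_id}", headers=HEADERS, timeout=15).json()
        if s["status"] in ("completed", "failed"):
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"job {job_id} still {s['status']} after {max_wait}s")
        time.sleep(2)
    if s["status"] != "completed":
        raise RuntimeError(s.get("error"))
    return requests.get(f"{BASE}/jobs/{job_id}/result", headers=HEADERS, timeout=15).json()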


asin = "B09V3KXJPB"
data = scrape(
    "ecommerce/amazon/smart",
    url=f"https://www.amazon.com/product-reviews/{asin}/?sortBy=recent",
    pages=10,
)
print(f"{len(data.get('reviews', []))} reviews extracted")

The pages parameter accepts 1-10 (the publicly visible page cap). For watching new reviews on a long list of ASINs, the bulk endpoint accepts an array of review URLs and returns a parent bulk_id you can poll.

Common Mistakes

  • Scraping the product page instead of the reviews page. Product pages show ~8 curated reviews; the /product-reviews/ URL shows the full paginated list.
  • Forgetting sortBy=recent. Default sort is curated; you will miss new reviews if you do not switch to chronological.
  • Treating verified and unverified reviews the same. For brand sentiment, weight verified reviews higher.
  • Re-scraping all pages every run. Once you have seen a review ID, you have seen it. Diff against your store.
  • Ignoring Amazon Vine reviews. Vine reviewers receive free product, not money — their reviews skew positive. Filter or weight accordingly.
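
The weighting advice in the last few points can be sketched as a weighted average rating. The weights below are hypothetical placeholders, not recommended values, and the "vine" key is an assumed field that the DIY scraper in this guide does not extract.

```python
from typing import Iterable, Optional

# Hypothetical weights: tune them for your own analysis.
WEIGHTS = {"verified": 1.0, "unverified": 0.5, "vine": 0.5}


def review_weight(review: dict) -> float:
    """Pick a weight from the review's badges; Vine takes precedence."""
    if review.get("vine"):
        return WEIGHTS["vine"]
    if review.get("verified_purchase"):
        return WEIGHTS["verified"]
    return WEIGHTS["unverified"]


def weighted_rating(reviews: Iterable[dict]) -> Optional[float]:
    """Weighted mean of numeric ratings; None when there is nothing to average."""
    total = 0.0
    weight_sum = 0.0
    for review in reviews:
        rating = review.get("rating")
        if rating is None:  # skip reviews whose rating failed to parse
            continue
        w = review_weight(review)
        total += w * rating
        weight_sum += w
    return total / weight_sum if weight_sum else None
```

Setting a weight to 0 filters that class of review out entirely, which covers the "filter" half of the advice with the same code.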

The Landscape

For Amazon-specific review extraction, the realistic options are:

  • DIY — full control, full maintenance burden. Sustainable for one product line.
  • ScraperAPI / Bright Data — solve proxies and rendering; you still write the review parser.
  • Apify — has community Amazon-review actors; quality and freshness vary.
  • Keepa — historical price and rating data with some review metrics, but not full review text.
  • LogPose — structured review JSON from the smart endpoint, alongside the same key working for product/search/category pages.

If you only need review counts over time, the Product Advertising API is the cheapest route. For text, you are scraping.

Get Started

  1. Sign up at logposervices.com and generate an API key from Tool → API Keys.
  2. export LOGPOSE_API_KEY=lp_xxxxxxx
  3. Run the snippet above against any ASIN's /product-reviews/ URL.

Related: scrape Amazon product prices in Python, bulk ASIN extraction, web scraping API guide.

External: Amazon Product Advertising API, BeautifulSoup docs.

Frequently asked questions

Does Amazon have an official reviews API?
No. The Product Advertising API returns review summaries (count and average star rating) but not the review text itself. For full review content, scraping the public review pages is the only option.
Is it legal to scrape Amazon reviews?
In the US, the Ninth Circuit held in hiQ v. LinkedIn (2022) that scraping publicly accessible pages likely does not violate the Computer Fraud and Abuse Act. Automated access can still breach Amazon's Conditions of Use, which is a contract question separate from the CFAA, and the exposure is greater if you scrape while logged in. Sticking to unauthenticated public pages keeps you closest to that precedent.
How many reviews can I extract per product?
Amazon caps public review pagination at roughly 10 pages, ~100 reviews. The Top Reviews tab shows curated ones; the Most Recent tab shows chronological. Scrape both sorts to maximize coverage within the cap.
What is the difference between Vine and verified reviews?
Verified purchase reviews come from confirmed buyers. Amazon Vine reviews are by selected reviewers given free products by Amazon. Vine reviews carry a labeled badge in the HTML; verified ones have a separate badge. Filter or weight by these markers if reliability matters for your sentiment analysis.
Can I get historical review data for Amazon products?
Only within the visible cap. Review dates do not change once posted, so a single scrape gives you a dated snapshot of the reviews currently visible. To track new reviews over time, schedule a periodic scrape sorted by recency and dedupe by review ID.
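
A periodic recency-sorted scrape does not need to walk all 10 pages every time. The sketch below stops at the first page that contains nothing new. The function name and the fetch_page callable are illustrative; fetch_page is assumed to return a list of review dicts with an "id" key for a given page number, newest first.

```python
from typing import Callable


def fetch_new_reviews(
    fetch_page: Callable[[int], list[dict]],
    seen_ids: set[str],
    max_pages: int = 10,
) -> list[dict]:
    """Walk recency-sorted pages and stop at the first page with nothing new.

    seen_ids is updated in place with the ids of the reviews returned.
    """
    fresh: list[dict] = []
    for page in range(1, max_pages + 1):
        reviews = fetch_page(page)
        new = [r for r in reviews if r.get("id") and r["id"] not in seen_ids]
        if not new:
            # Either an empty page or every id is already stored: stop early.
            break
        fresh.extend(new)
        seen_ids.update(r["id"] for r in new)
    return fresh
```

With the DIY scraper from this guide, fetch_page could be lambda p: scrape_reviews(asin, page=p, sort="recent"). Stopping early also reduces the number of requests per run, which helps with the CAPTCHA threshold.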
