How to Scrape Amazon Search Results & Track Rankings (Python)
You want to know what shows up when an Amazon shopper searches "wireless earbuds." Maybe you are an SEO specialist tracking where your product ranks, or you are researching a new category. The Amazon search results page mixes organic listings, sponsored placements, and editorial slots. Amazon does not expose any of this through an official API, so you scrape it.
TL;DR: there are two ways to scrape Amazon search results. You can write a DIY requests + BeautifulSoup script (free, but you maintain it and Amazon blocks it often), or you can use a managed API (you send a search URL and get back parsed positions). This guide shows both. It also explains what is actually in the search HTML and why search pages are harder to scrape than product pages, then shows how to turn the output into daily Amazon rank tracking.
One thing up front: scraping public Amazon search pages (no login, public data) is generally treated as lawful in the US. It does violate Amazon's Terms of Service, though, and Amazon aggressively blocks bots. Keep it to research, throttle your requests, and expect captchas.
Why Search Pages Are Harder Than Product Pages
Four things ratchet up the difficulty:
Heavier bot detection. Search is where Amazon's most commercially sensitive data lives, because it exposes sponsored bid signals, ranking patterns, and ad-spend efficiency. The anti-bot stack on /s?k=... URLs is more aggressive than on /dp/<ASIN> URLs.
More dynamic JS. Product pages render most useful data server-side. Search pages lean on client-side rendering for some sponsored carousels and filter panels. A raw requests.get misses about 10-15% of the result blocks; you need to compensate at the parser level.
Sponsored vs organic is signal, not noise. Both occupy positions in the result grid. Treating them as the same row corrupts your ranking data. The DOM marks them separately (data-component-type="sp-sponsored-result" vs s-search-result), but only if you check.
Pagination caps. Amazon paginates search up to roughly page 20. Beyond that, the results stop being unique. For SEO research, pages 1-5 are what matter; deeper pages are rarely actionable.
The DIY Approach
```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import quote_plus

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/127.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}


def scrape_search(keyword: str, page: int = 1) -> list[dict]:
    url = f"https://www.amazon.com/s?k={quote_plus(keyword)}&page={page}"
    r = requests.get(url, headers=HEADERS, timeout=15)
    # Captcha pages can come back with HTTP 200, so check the body as well as the final URL.
    if r.status_code != 200 or "validateCaptcha" in r.url or "validateCaptcha" in r.text:
        return []

    soup = BeautifulSoup(r.text, "html.parser")
    out = []
    position = 1  # one shared counter: sponsored and organic cards both take a slot
    selector = (
        "div[data-component-type='s-search-result'], "
        "div[data-component-type='sp-sponsored-result']"
    )
    for card in soup.select(selector):
        asin = card.get("data-asin")
        if not asin:  # skip placeholder cards that carry no ASIN
            continue
        title_el = card.select_one("h2 a span")
        price_el = card.select_one("span.a-price > span.a-offscreen")
        sponsored = card.get("data-component-type") == "sp-sponsored-result"
        out.append({
            "position": position,
            "asin": asin,
            "title": title_el.get_text(strip=True) if title_el else None,
            "price": price_el.get_text(strip=True) if price_el else None,
            "sponsored": sponsored,
        })
        position += 1
    return out


if __name__ == "__main__":
    for entry in scrape_search("wireless earbuds", page=1):
        print(entry)
```
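The script above fetches a single page. To throttle requests and cover several pages per keyword, you can wrap it in a paced loop. The sketch below rests on a few assumptions: `scrape_pages` is a hypothetical helper that takes the page fetcher as an argument (pass `scrape_search`), and the 5-12 second random delay is an illustrative choice, not a documented safe rate. It stops at the first empty page, because `scrape_search` returns an empty list for a captcha and for the end of results alike.

```python
import random
import time
from typing import Callable


def scrape_pages(
    keyword: str,
    pages: int,
    fetch_page: Callable[[str, int], list[dict]],
    min_delay: float = 5.0,
    max_delay: float = 12.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[dict]:
    """Fetch pages 1..pages for one keyword, pausing randomly between requests.

    Hypothetical helper: fetch_page should behave like scrape_search
    (return [] on a captcha or block). Delay bounds are illustrative defaults.
    """
    rows: list[dict] = []
    for page in range(1, pages + 1):
        if page > 1:
            # Randomized pacing so requests do not arrive on a fixed beat.
            sleep(random.uniform(min_delay, max_delay))
        results = fetch_page(keyword, page)
        if not results:
            # Empty page: captcha, block, or end of results. Stop instead of retrying.
            break
        for r in results:
            rows.append({**r, "page": page})  # tag each row with its page number
    return rows
```

Usage: `rows = scrape_pages("wireless earbuds", pages=3, fetch_page=scrape_search)`. Injecting `fetch_page` and `sleep` keeps the pacing logic testable without touching Amazon.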
Real Limitations
- Mid-page carousels. "Editor's picks" and "Highly rated" carousels mid-page have their own DOM structures; the selector above skips them.
- Position counting. Should you count sponsored slots in the position field, or only organic? For SEO research, separate counters per type tell you more.
- Page-1 vs deep pages. Page 1 has more curated content (badges, "Amazon's Choice"). Pages 2+ are a more uniform alternation of organic and sponsored results.
- Local availability. The default US-wide view differs from a zip-code-specific view. Set the session-token cookie or accept the default.
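One way to get separate counters per type is to keep the raw on-page position and derive organic and sponsored ranks from it after parsing. The following is a minimal sketch. It assumes rows shaped like `scrape_search` output (a `position` plus a boolean `sponsored`), and `split_positions` is a hypothetical helper. If rows also carry a `page` key, they are ordered by page first, so organic rank keeps counting across pages.

```python
def split_positions(rows: list[dict]) -> list[dict]:
    """Add per-type ranks (position_in_organic, position_in_sponsored).

    Assumes each row has "position" (on-page, 1-based) and a boolean
    "sponsored". Rows are ordered by (page, position); rows without a
    "page" key are treated as page 1.
    """
    organic = sponsored = 0
    out = []
    for row in sorted(rows, key=lambda r: (r.get("page", 1), r["position"])):
        row = dict(row)  # copy so the caller's dicts are not mutated
        if row["sponsored"]:
            sponsored += 1
            row["position_in_sponsored"] = sponsored
            row["position_in_organic"] = None
        else:
            organic += 1
            row["position_in_organic"] = organic
            row["position_in_sponsored"] = None
        out.append(row)
    return out
```

Keeping the raw `position` alongside the per-type ranks lets you answer both "where did we rank among organic results?" and "how far down the page was that?"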
A failure mode that silently gives you bad data:
```python
# Your scraper saw 0 results because Amazon shipped a new search layout
# where the result wrapper changed from data-component-type="s-search-result"
# to data-csa-c-content-id="s-search-result". Same ASINs, different attribute.
# Your job logged "scraped 0 results for keyword X": no error, just empty.
```
Always alert when result count drops below a threshold for a known-stable keyword.
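Here is a minimal version of that check. It assumes you keep a per-keyword baseline of typical result counts, for example from the last few healthy runs. The `check_result_count` name, the baseline dict, and the 50% threshold are illustrative assumptions. The alert goes through standard `logging`, so you can route it to whatever alerting system you use.

```python
import logging

logger = logging.getLogger("rank_tracker")


def check_result_count(
    keyword: str,
    results: list[dict],
    baseline: dict[str, int],
    min_ratio: float = 0.5,
) -> bool:
    """Return False (and log an error) when a known-stable keyword comes back thin.

    baseline maps keyword -> typical results per page. Keywords without a
    baseline always pass, since there is nothing to compare against.
    """
    expected = baseline.get(keyword)
    if expected is None:
        return True
    if len(results) < expected * min_ratio:
        logger.error(
            "keyword %r returned %d results, expected about %d; "
            "possible layout change or block",
            keyword, len(results), expected,
        )
        return False
    return True
```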
Scaling Beyond a Single Keyword Script
For SEO tracking across hundreds of keywords daily:
Separate sponsored from organic. Two distinct position columns per row: position_in_organic, position_in_sponsored. Plot them separately over time.
Track multiple pages. First-page rank matters most, but movement on pages 2-3 predicts page-1 entry. Scrape 3-5 pages per keyword.
Watermark by date + locale. A (keyword, locale, scraped_at, page, position, asin, sponsored) schema covers most ranking research.
Capture "Amazon's Choice" as a boolean. It is not a position; it is a marker on a specific result. Store it as a field, not a row.
Compute share-of-voice. For your brand: % of first-page slots, % of sponsored slots, weighted by position. That is the real KPI, not raw rank.
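Share-of-voice has no single standard formula. The sketch below is one reasonable interpretation. It assumes rows from a single first-page scrape with `asin`, `position`, and `sponsored` keys, plus a set of your brand's ASINs. It returns plain slot shares for organic and sponsored results, and a position-weighted share in which slot p gets weight 1/p, so top slots count more. The 1/p weighting and the `share_of_voice` name are assumptions; swap in your own click-through curve if you have one.

```python
def share_of_voice(rows: list[dict], brand_asins: set[str]) -> dict[str, float]:
    """Brand share of first-page slots.

    Assumes rows come from one first-page scrape, each with "asin",
    "position" (1-based, on-page), and a boolean "sponsored".
    The 1/position weighting is an illustrative choice.
    """
    organic = [r for r in rows if not r["sponsored"]]
    sponsored = [r for r in rows if r["sponsored"]]

    def slot_share(subset: list[dict]) -> float:
        # Fraction of slots in this subset held by the brand.
        if not subset:
            return 0.0
        return sum(r["asin"] in brand_asins for r in subset) / len(subset)

    total_weight = sum(1 / r["position"] for r in rows)
    brand_weight = sum(1 / r["position"] for r in rows if r["asin"] in brand_asins)
    return {
        "organic_share": slot_share(organic),
        "sponsored_share": slot_share(sponsored),
        "weighted_share": brand_weight / total_weight if total_weight else 0.0,
    }
```

For example, if your brand holds sponsored slot 1 and organic slot 3 out of four first-page slots, the weighted share is (1 + 1/3) / (1 + 1/2 + 1/3 + 1/4) = 0.64, even though you hold only half the slots.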
The LogPose smart endpoint accepts a search URL and a pages parameter (1-10), returning organized search-result objects with the sponsored flag pre-parsed:
```python
import os
import time
import requests
from urllib.parse import quote_plus

API_KEY = os.environ["LOGPOSE_API_KEY"]
BASE = "https://api.logposervices.com/api/v1"
HEADERS = {"X-API-Key": API_KEY}


def scrape(url: str, pages: int = 1, max_wait: float = 300) -> dict:
    # Submit the job; fail fast on auth or quota errors instead of a KeyError later.
    resp = requests.get(
        f"{BASE}/ecommerce/amazon/smart",
        params={"url": url, "pages": pages},
        headers=HEADERS, timeout=30,
    )
    resp.raise_for_status()
    job_id = resp.json()["job_id"]

    # Poll until the job finishes, but give up after max_wait seconds.
    deadline = time.monotonic() + max_wait
    while True:
        s = requests.get(f"{BASE}/jobs/{job_id}", headers=HEADERS, timeout=15).json()
        if s["status"] == "failed":
            raise RuntimeError(f"job {job_id} failed")
        if s["status"] == "completed":
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"job {job_id} still {s['status']} after {max_wait}s")
        time.sleep(2)
    return requests.get(f"{BASE}/jobs/{job_id}/result", headers=HEADERS, timeout=15).json()


keywords = ["wireless earbuds", "bluetooth speaker", "smartwatch"]
for k in keywords:
    url = f"https://www.amazon.com/s?k={quote_plus(k)}"
    data = scrape(url, pages=5)
    results = data.get("results", [])
    print(f"{k}: {len(results)} positions across 5 pages")
```
For sustained tracking, persist (keyword, scraped_at, position, asin, sponsored) to a database and run the scrape on a daily cron.
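Here is a minimal persistence layer. It uses SQLite from the standard library and the (keyword, locale, scraped_at, page, position, asin, sponsored) schema suggested earlier. The table name `rankings`, the `init_db` and `save_rankings` helpers, and the unique constraint are illustrative choices. Storing `scraped_at` as a calendar date plus `INSERT OR REPLACE` means a re-run of the same daily job overwrites that day's rows instead of duplicating them.

```python
import sqlite3
from datetime import date
from typing import Optional


def init_db(path: str = "rankings.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS rankings (
            keyword    TEXT    NOT NULL,
            locale     TEXT    NOT NULL,
            scraped_at TEXT    NOT NULL,  -- ISO date, one snapshot per day
            page       INTEGER NOT NULL,
            position   INTEGER NOT NULL,
            asin       TEXT    NOT NULL,
            sponsored  INTEGER NOT NULL,  -- stored as 0/1
            UNIQUE (keyword, locale, scraped_at, page, position)
        )
        """
    )
    return conn


def save_rankings(
    conn: sqlite3.Connection,
    keyword: str,
    locale: str,
    rows: list[dict],
    day: Optional[date] = None,
) -> int:
    """Upsert one keyword's scraped rows for a given day; returns rows written."""
    day_str = (day or date.today()).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO rankings VALUES (?, ?, ?, ?, ?, ?, ?)",
            [
                (keyword, locale, day_str, r.get("page", 1), r["position"],
                 r["asin"], int(r["sponsored"]))
                for r in rows
            ],
        )
    return len(rows)
```

A daily crontab entry that runs your tracking script, with `save_rankings` called once per keyword, is enough to build the rank-over-time history.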
Common Mistakes
- Treating sponsored as position-equivalent. They are different signals; separate them in storage.
- Scraping the same keyword 100× a day. Daily is enough; hourly is noise.
- Ignoring locale. US, UK, DE rankings are independent. Tag rows with locale.
- No keyword normalization. "wireless earbuds" vs "Wireless Earbuds" vs "wireless%20earbuds" — pick one canonical form.
- Reading "Amazon's Choice" as a brand signal. It is algorithmic and changes hourly. Use as a binary, not a trust metric.
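For the keyword-normalization point, here is a minimal canonicalizer. It assumes your canonical form is URL-decoded, lowercase, single-spaced text, which is an illustrative choice; some teams also strip punctuation. `canonical_keyword` is a hypothetical helper name. `unquote_plus` comes from the standard library and decodes both `%20` and `+`.

```python
from urllib.parse import unquote_plus


def canonical_keyword(raw: str) -> str:
    """Normalize a keyword to one canonical form before storing or querying.

    Decodes URL escapes (%20 and +), lowercases, and collapses whitespace.
    """
    decoded = unquote_plus(raw)
    return " ".join(decoded.lower().split())
```

Run every keyword through this function before building the search URL and before writing rows. "wireless earbuds", "Wireless Earbuds", and "wireless%20earbuds" then all land in the same series.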
The Landscape
For Amazon search-results tracking:
- DataForSEO — has Amazon SERP endpoints; SERP-focused tool with broad search engine coverage.
- Helium 10's Cerebro — Amazon-only keyword research; depth on ASIN-to-keyword mapping.
- JungleScout — keyword + ranking research bundled with sales estimates.
- DIY + residential proxies — full control if you already run scrapers; you own the data and schema.
- LogPose — smart endpoint with a pages parameter on search URLs; useful when search is one of several Amazon surfaces (product, reviews, BSR) you scrape together.
If your goal is pure Amazon keyword research, a dedicated tool like Helium 10 usually delivers value faster. For SEO ops teams that need raw rank data piped into their own warehouse, a managed scraping API is more flexible.
Get Started
- Sign up at logposervices.com.
- Generate an API key.
- Run the snippet above for your top 10 keywords daily.
- After two weeks of data you will have actionable rank-over-time charts.