Is it legal to scrape eBay sold listings and Amazon prices for arbitrage research?

Both eBay's completed/sold listings and Amazon's product pages are public data: price, sold status, title, and sales rank are shown to anyone who opens the page without logging in. In the United States, the Ninth Circuit concluded in hiQ Labs v. LinkedIn (9th Cir. 2022) that accessing publicly available pages likely does not constitute access "without authorization" under the CFAA. That ruling addresses the CFAA only; a site's terms of use are a separate, contract-level question. What the marketplaces' terms restrict is account-level abuse: using a logged-in session against their internal APIs, scraping behind authentication, or republishing a wholesale copy of their catalog as a competing product. Reading a sold-comp price to decide whether an item is worth reselling is ordinary competitive research, the same thing a human does by clicking through completed listings. The compliance work that actually matters for a reseller is downstream and category-specific: brand gating, restricted categories, and the authenticity rules of the platform you list on.

Why use eBay sold prices instead of the Amazon list price to judge demand?

An asking price is a hope; a sold price is a fact. Amazon's current list price tells you what one seller wants today, but it says nothing about whether that price clears at volume or how often the item actually moves. eBay's sold and completed listings are the closest public signal to realized demand: each row is a transaction that happened at a real number, so the median of recent solds is a defensible estimate of what you can actually get, and the count of solds over a window is a direct read on sell-through velocity. The Amazon side of the equation is the cost and the competition — its current price is your buy cost when you source from Amazon, and its sales rank (BSR) is a relative-velocity proxy on the sell side. Pairing realized eBay demand against the Amazon cost-and-rank is what separates a real spread from a listing that merely looks cheap.


The Retail Arbitrage Data Routine: Spotting Underpriced Inventory Before Other Resellers

June 23, 2026 · 12 min read

If you run online or retail arbitrage — sourcing commodity inventory and flipping it for a margin, FBA or merchant-fulfilled — your edge is not access to products. Everyone can see the same listings. Your edge is knowing, before you buy, which items have proven demand at a price that clears a defensible spread over your cost. The trap that empties arbitrage accounts is buying against an asking price: an item looks cheap, you stock it, and then it sits because nobody actually pays the number you assumed. The fix is to anchor every buy decision on realized sale prices, not list prices.

This guide is the full screening routine. We will cover why a list price is the wrong input, how to pull real eBay sold comps to get a median realized price and a sell-through count, how to read Amazon's current price and sales rank as the sell side, how to compute a spread and ROI per item, rank the candidates, and write a clean CSV you can source from. The example is a small list of consumer-goods keywords, but the same code screens any commodity category by swapping the candidate list.

Why List Price Lies and Sold Comps Don't

Every marketplace shows you two very different numbers, and arbitrage beginners conflate them constantly. The first is the asking price — what a seller currently wants. The second is the realized price — what something actually sold for. They are not the same, and the gap is where money is lost.

A list of active listings is a list of hopes. Ten sellers can ask $60 for an item that, in practice, only ever clears at $42 — and a buyer who screens off the $60 figure will overpay for inventory that then sits. The asking price also tells you nothing about velocity: an item can be listed at an attractive number and move twice a year, which is a dead spread once you factor in capital tied up and storage.

Sold comps fix both problems at once. eBay publishes completed and sold listings — every row is a transaction that actually happened at a real number on a real date. Two statistics fall straight out of that:

The median realized price of recent solds — a robust estimate of what you can actually get, unskewed by one outlier auction.
The sell-through count — how many units cleared over the window you pulled, which is a direct read on demand velocity.

That is the demand side. The sell side is Amazon's current price (your cost when you source from Amazon) and its sales rank, or BSR — a relative-velocity proxy where a lower number means the item moves faster within its category. A real arbitrage candidate is one where the realized eBay price comfortably exceeds the Amazon cost and both velocity signals say the item moves. Screening on anything less than realized price is screening on fiction.

So the routine is: start from candidate keywords, pull eBay sold comps for realized price and velocity, pull Amazon's current price and BSR for cost and sell-side velocity, compute the spread and ROI per item, and rank.

Step 1: Pull eBay Sold Comps for Realized Price

The demand side comes from eBay's sold and completed listings. The trick is in the search URL: appending &LH_Sold=1&LH_Complete=1 filters the results to items that actually sold, instead of the default view of active asking prices. That single change is the difference between screening on hopes and screening on facts.
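
As a minimal sketch of that URL trick, the snippet below builds a sold-comp search URL with the standard library and checks that a URL really carries both filters before it is submitted. The helper names build_sold_url and is_sold_comp_url are hypothetical and not part of any API; only the _nkw, LH_Sold, and LH_Complete parameters come from the text above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EBAY_SEARCH = "https://www.ebay.com/sch/i.html"


def build_sold_url(keyword):
    """Build an eBay search URL restricted to sold and completed listings."""
    # urlencode escapes characters such as "&" that would otherwise
    # split the keyword into a second query parameter.
    query = urlencode({"_nkw": keyword, "LH_Sold": 1, "LH_Complete": 1})
    return f"{EBAY_SEARCH}?{query}"


def is_sold_comp_url(url):
    """True only if the URL carries both sold-comp filters."""
    params = parse_qs(urlsplit(url).query)
    return params.get("LH_Sold") == ["1"] and params.get("LH_Complete") == ["1"]


url = build_sold_url("black & decker drill")
print(url)
print(is_sold_comp_url(url))
```

Checking the filters before submitting is a cheap way to avoid spending a job on the default active-listings view by accident.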

The LogPose eBay search endpoint (/api/v1/ecommerce/ebay/search) takes one eBay search URL and a page count, and like every scrape endpoint it is asynchronous: you submit, get a job id back, then poll. Confirm one query works with curl before you loop:

# 1) Submit one sold-comp search — note LH_Sold=1&LH_Complete=1 in the URL
curl -G "https://api.logposervices.com/api/v1/ecommerce/ebay/search" \
  -H "X-API-Key: lp_xxxxxxx" \
  --data-urlencode "url=https://www.ebay.com/sch/i.html?_nkw=anker+powercore+10000&LH_Sold=1&LH_Complete=1" \
  --data-urlencode "pages=3"
# → {"job_id": "eb_4c91...", "status": "pending"}

# 2) Poll the job until status == "completed"
curl -H "X-API-Key: lp_xxxxxxx" \
  https://api.logposervices.com/api/v1/jobs/eb_4c91

# 3) Fetch the sold rows
curl -H "X-API-Key: lp_xxxxxxx" \
  https://api.logposervices.com/api/v1/jobs/eb_4c91/result

The async pattern is not optional here. api.logposervices.com sits behind Cloudflare, which kills any single connection at roughly 90 seconds. A multi-page sold-comp pull can run longer than that, so never wait on one inline request — submit the job, let it run server-side, and poll for the result.

A few pages of solds is usually enough for a stable median. pages=3 is on the order of 100–150 completed listings, which is plenty to compute a robust median realized price and a meaningful sell-through count without over-fetching. Going deeper mostly pulls in older solds whose prices are staler, so prefer a tight, recent window over a deep one.

Step 2: Compute Median Price and Sell-Through

The raw sold rows need two numbers extracted per keyword: the median of the realized prices, and the count of solds. The median is deliberately chosen over the mean — a single broken-item auction or a bundle listing can drag a mean badly, but the median shrugs it off.

import os, time, statistics, requests
from urllib.parse import quote_plus

API_KEY = os.environ["LOGPOSE_API_KEY"]
BASE = "https://api.logposervices.com/api/v1"
HEADERS = {"X-API-Key": API_KEY}


def submit_ebay_sold(query, pages=3):
    # quote_plus turns spaces into "+" and escapes characters such as "&"
    # that would otherwise break the eBay query string
    nkw = quote_plus(query)
    url = (f"https://www.ebay.com/sch/i.html?_nkw={nkw}"
           f"&LH_Sold=1&LH_Complete=1")
    r = requests.get(
        f"{BASE}/ecommerce/ebay/search",
        params={"url": url, "pages": pages},
        headers=HEADERS, timeout=30,
    )
    r.raise_for_status()
    return r.json()["job_id"]


def poll(job_id, key, poll_every=5, timeout_s=300):
    """Poll one job id; return the list at result[key], or [] on failure or timeout."""
    # A monotonic clock is unaffected by wall-clock adjustments mid-poll
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        r = requests.get(f"{BASE}/jobs/{job_id}", headers=HEADERS, timeout=15)
        r.raise_for_status()  # surface HTTP errors instead of polling until timeout
        s = r.json()
        if s.get("status") == "completed":
            res = requests.get(f"{BASE}/jobs/{job_id}/result",
                               headers=HEADERS, timeout=30)
            res.raise_for_status()
            return res.json().get(key, [])
        if s.get("status") == "failed":
            print(f"  job {job_id} failed: {s.get('error')}")
            return []
        time.sleep(poll_every)
    print(f"  job {job_id} timed out")
    return []


def ebay_demand(query):
    """Return median realized price and sell-through count for a keyword."""
    job_id = submit_ebay_sold(query)
    listings = poll(job_id, key="listings")
    prices = [float(x["price"]) for x in listings
              if x.get("price") and x.get("sold")]
    if not prices:
        return None
    return {
        "median_sold": round(statistics.median(prices), 2),
        "sold_count": len(prices),
        "low_sold": round(min(prices), 2),
        "high_sold": round(max(prices), 2),
    }


print(ebay_demand("anker powercore 10000"))
# → {'median_sold': 27.99, 'sold_count': 118, 'low_sold': 14.5, 'high_sold': 39.99}

The guard that earns its keep is filtering on both a present price and a truthy sold flag before computing anything. It keeps stray active or unsold rows out of the median, and it means a keyword that returned no real solds yields None rather than a misleading number built from one stale row. A sold_count in the low single digits is itself a signal — thin demand — so you will carry that count forward into ranking, not just the price.
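
To see the guard and the median-over-mean choice at work without touching the network, here is a small sketch on hand-written rows. The row shape (a string price and a boolean sold flag) is an assumption that mirrors the fields read in ebay_demand, and the values are made up for illustration.

```python
import statistics

# Hypothetical rows: four ordinary solds, one bundle outlier,
# one active listing that slipped in, and one sold row with no price.
rows = [
    {"price": "24.99", "sold": True},
    {"price": "27.99", "sold": True},
    {"price": "29.50", "sold": True},
    {"price": "26.00", "sold": True},
    {"price": "149.00", "sold": True},
    {"price": "31.00", "sold": False},
    {"price": None, "sold": True},
]


def realized_prices(listings):
    """Keep only rows with a price and a truthy sold flag."""
    return [float(x["price"]) for x in listings
            if x.get("price") and x.get("sold")]


prices = realized_prices(rows)
median_price = statistics.median(prices)
mean_price = round(statistics.mean(prices), 2)
print(len(prices), median_price, mean_price)
```

Five rows survive the guard. The single 149.00 bundle pulls the mean to about 51.50 while the median stays at 27.99, which is the behaviour the routine relies on.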

Step 3: Read Amazon's Current Price and BSR

The sell side is Amazon. The Amazon smart endpoint (/api/v1/ecommerce/amazon/smart) takes an Amazon search URL, product URL, or a bare ASIN, and returns the current price, sales rank/BSR, title, and availability. That gives you the cost (when you source from Amazon) and a velocity read in one call.

Confirm it with curl against a single ASIN:

curl -G "https://api.logposervices.com/api/v1/ecommerce/amazon/smart" \
  -H "X-API-Key: lp_xxxxxxx" \
  --data-urlencode "url=B0794RHPZD" \
  --data-urlencode "pages=1"
# → {"job_id": "az_77be...", "status": "pending"}

You can pass a bare ASIN (B0794RHPZD), a full product URL, or a search URL. For screening a known candidate you usually want the ASIN or product URL so you get one specific item back; a search URL is useful when you only have a keyword and want Amazon's top match to pair against the eBay comp. The poll-and-fetch flow is identical to the eBay job — submit, poll the job id, fetch the result.

def submit_amazon(identifier, pages=1):
    r = requests.get(
        f"{BASE}/ecommerce/amazon/smart",
        params={"url": identifier, "pages": pages},
        headers=HEADERS, timeout=30,
    )
    r.raise_for_status()
    return r.json()["job_id"]


def amazon_sell_side(identifier):
    """Return Amazon current price, BSR, title for an ASIN/URL/keyword."""
    job_id = submit_amazon(identifier)
    products = poll(job_id, key="products")
    if not products:
        return None
    p = products[0]
    price = p.get("price")
    return {
        "asin": p.get("asin"),
        "title": p.get("title"),
        "amazon_price": float(price) if price else None,
        "bsr": p.get("sales_rank") or p.get("bsr"),
        "availability": p.get("availability"),
    }


print(amazon_sell_side("B0794RHPZD"))
# → {'asin': 'B0794RHPZD', 'title': 'Anker PowerCore 10000 ...',
#    'amazon_price': 21.99, 'bsr': 412, 'availability': 'In Stock'}
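
Because the endpoint accepts three identifier shapes, it can help to normalize a candidate to its ASIN before screening, so the CSV keys on one stable value. The sketch below is an assumption-laden helper, not part of the API: extract_asin is a hypothetical name, and it relies only on the common convention that an ASIN is ten uppercase alphanumeric characters and that product URLs carry it after /dp/ or /gp/product/.

```python
import re

ASIN_RE = re.compile(r"[A-Z0-9]{10}")
# Matches .../dp/<ASIN> or .../gp/product/<ASIN>, followed by "/", "?", "#" or end.
PRODUCT_PATH_RE = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?=[/?#]|$)")


def extract_asin(identifier):
    """Return the ASIN from a bare ASIN or product URL, else None."""
    s = identifier.strip()
    if ASIN_RE.fullmatch(s):
        return s
    m = PRODUCT_PATH_RE.search(s)
    return m.group(1) if m else None


print(extract_asin("B0794RHPZD"))
print(extract_asin("https://www.amazon.com/dp/B0794RHPZD?th=1"))
print(extract_asin("https://www.amazon.com/s?k=anker+powercore"))
```

A None result means the input is a search URL or keyword, so the row you get back is Amazon's top match rather than a specific item. Note that a ten-character all-caps keyword would be misread as an ASIN, which is an accepted limitation of this sketch.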

Two fields besides price matter for screening. bsr (sales rank) is a relative-velocity proxy: a low number means the item turns over quickly inside its category, which de-risks a buy because it should resell fast. And availability is a hard gate: an out-of-stock item is not a sourceable buy no matter how good the spread looks, so the Step 4 screen drops any row whose availability string does not contain "in stock" before ranking.
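
If you also want limited-stock rows kept off the list, one possible stricter gate is sketched below. The function name is_sourceable is hypothetical, and the availability strings it matches ("Only 3 left in stock", "Currently unavailable") are assumptions modeled on typical Amazon page wording, not a documented list of values the endpoint returns. Unlike a plain substring check, this version also treats a missing availability value as not sourceable.

```python
def is_sourceable(availability):
    """Stricter availability gate: plain in-stock signals only."""
    if not availability:
        return False  # unknown availability is treated as not sourceable
    text = availability.strip().lower()
    if "out of stock" in text or "unavailable" in text:
        return False
    if "left in stock" in text:
        return False  # limited stock, e.g. "Only 3 left in stock"
    return "in stock" in text


for value in ["In Stock", "Only 3 left in stock - order soon.",
              "Currently unavailable.", None]:
    print(repr(value), is_sourceable(value))
```

Whether limited stock should be a hard drop or just a flag column depends on how many units you intend to buy, so treat the thresholds here as a starting point.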

Step 4: Compute the Spread and ROI per Item

Now the two sides meet. For each candidate you have the realized eBay demand (median sold, sold count) and the Amazon cost-and-velocity (current price, BSR). The spread is the realized resale price minus the buy cost; the ROI is that spread as a fraction of cost. A realistic screen also nets out an estimate of fees and shipping so the ROI you rank on is closer to take-home, not gross.

def screen(candidate):
    """candidate = {'keyword': ..., 'asin': ...}; 'asin' is optional, the keyword is the fallback."""
    demand = ebay_demand(candidate["keyword"])
    if not demand:
        return None
    sell = amazon_sell_side(candidate.get("asin") or candidate["keyword"])
    if not sell or sell["amazon_price"] is None:
        return None
    if sell.get("availability") and "in stock" not in sell["availability"].lower():
        return None  # not sourceable right now

    cost = sell["amazon_price"]
    resale = demand["median_sold"]
    # Rough net of marketplace fees + shipping; tune to your category
    fees = round(resale * 0.13 + 4.00, 2)
    net_proceeds = round(resale - fees, 2)
    spread = round(net_proceeds - cost, 2)
    roi = round(spread / cost, 3) if cost else 0.0

    return {
        "keyword": candidate["keyword"],
        "asin": sell["asin"],
        "title": sell["title"],
        "amazon_cost": cost,
        "ebay_median_sold": resale,
        "ebay_sold_count": demand["sold_count"],
        "bsr": sell["bsr"],
        "est_net_proceeds": net_proceeds,
        "spread": spread,
        "roi": roi,
    }

The fee model here is deliberately crude: a percentage of the resale price plus a flat shipping figure, because exact fees depend on your category, fulfillment method, and weight. Replace the 0.13 and 4.00 with your real numbers. The point is that you rank on net spread, not the gross gap between two prices, so you do not get fooled by a $20 difference that a $14 fee load cuts to $6. ROI is carried as a fraction (0.35 = 35%) so the ranking step is a plain numeric sort.
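
As a worked instance of that arithmetic, the sketch below isolates the fee model into a hypothetical helper, net_roi, and runs it on the illustrative numbers from the earlier example outputs (a 27.99 eBay median against a 21.99 Amazon price). The 0.13 rate and 4.00 flat fee remain placeholders, exactly as in the screen.

```python
def net_roi(resale, cost, fee_rate=0.13, flat_fee=4.00):
    """Net-of-fees spread and ROI, using the same rounding as the screen."""
    fees = round(resale * fee_rate + flat_fee, 2)
    net_proceeds = round(resale - fees, 2)
    spread = round(net_proceeds - cost, 2)
    roi = round(spread / cost, 3) if cost else 0.0
    return {
        "gross_gap": round(resale - cost, 2),
        "fees": fees,
        "net_proceeds": net_proceeds,
        "spread": spread,
        "roi": roi,
    }


result = net_roi(27.99, 21.99)
print(result)
```

The gross gap is 6.00, but fees of 7.64 leave net proceeds of 20.35, so the spread is -1.64 and the ROI is -0.075. Under this fee model the item that looked like a six-dollar win is a loss, which is why the ranking uses net figures.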

Step 5: Rank Candidates and Write the CSV

The last step screens the whole candidate list, applies sourcing-quality filters, ranks by ROI, and writes a CSV you can hand to whoever does the buying. The filters are what keep marginal and risky rows off the list.

import csv

CANDIDATES = [
    {"keyword": "anker powercore 10000", "asin": "B0794RHPZD"},
    {"keyword": "logitech m185 mouse",   "asin": "B003NR57BY"},
    {"keyword": "hydro flask 32 oz",     "asin": "B083GBJN8B"},
    # ... your sourcing shortlist
]


def run(candidates, out_path, min_roi=0.30, min_solds=10):
    rows = []
    for c in candidates:
        r = screen(c)
        if not r:
            continue
        # Demand must be proven: enough recent solds to trust the median
        if r["ebay_sold_count"] < min_solds:
            continue
        # Spread must clear a real margin after fees
        if r["roi"] < min_roi:
            continue
        rows.append(r)

    rows.sort(key=lambda x: x["roi"], reverse=True)

    fields = ["keyword", "asin", "title", "amazon_cost", "ebay_median_sold",
              "ebay_sold_count", "bsr", "est_net_proceeds", "spread", "roi"]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        w.writeheader()
        for r in rows:
            w.writerow(r)
    return len(rows)


n = run(CANDIDATES, "arbitrage_candidates.csv")
print(f"wrote {n} ranked candidates")

Two filters carry the quality of the whole list. The min_solds floor is the demand gate — a median computed from three solds is noise, so requiring ten or more recent transactions means every row that survives has proven velocity, not a one-off sale. The min_roi floor is the margin gate — after the fee net-out, anything under your threshold is not worth the capital and shelf time. What lands in the CSV is a short, ranked list where the top rows are the items with the strongest defensible spread and real demand behind them, which is exactly the input a sourcing decision needs.
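
One design option worth considering is to keep the two gates in a pure function, so thresholds can be tuned against rows you have already fetched instead of re-running every job. The sketch below assumes rows in the shape screen() returns; rank_rows is a hypothetical helper and the sample values are made up.

```python
def rank_rows(rows, min_roi=0.30, min_solds=10):
    """Apply the demand and margin gates, then sort by ROI, best first."""
    kept = [r for r in rows
            if r["ebay_sold_count"] >= min_solds and r["roi"] >= min_roi]
    return sorted(kept, key=lambda r: r["roi"], reverse=True)


# Hand-written rows for illustration only.
sample = [
    {"keyword": "a", "roi": 0.42, "ebay_sold_count": 35},
    {"keyword": "b", "roi": 0.55, "ebay_sold_count": 4},   # thin demand
    {"keyword": "c", "roi": 0.12, "ebay_sold_count": 80},  # thin margin
    {"keyword": "d", "roi": 0.31, "ebay_sold_count": 12},
]

ranked = rank_rows(sample)
print([r["keyword"] for r in ranked])
```

With the default floors only "a" and "d" survive: "b" has the best ROI but too few solds to trust, and "c" sells constantly at a margin below the floor.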

Scaling This Into a Standing Screen

The routine above is a one-shot screen of a candidate list. The reseller reality is that prices move daily: an Amazon price drops and a dead candidate suddenly clears a spread; a price climbs and a live one closes. Re-running the whole screen by hand every morning is the part that quietly stops happening after a week.

Two things make a standing screen practical. First, the pipeline is just data — your candidate list is a list of dicts, so widening the screen is adding rows, and the screen / run functions never change. Second, the part that actually wants automation is the Amazon sell side, because that is what moves and reopens a spread. Rather than hosting your own cron plus a state store to remember yesterday's prices, LogPose exposes a monitor primitive that polls a saved Amazon search or ASIN on a schedule and notifies you when a threshold is crossed:

curl -X POST "https://api.logposervices.com/api/v1/monitors" \
  -H "X-API-Key: lp_xxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.amazon.com/dp/B0794RHPZD",
    "name": "PowerCore 10000 buy window",
    "metric": "price",
    "condition": "drops_below",
    "threshold": 20.00,
    "check_interval_hours": 24,
    "notify_channels": ["email"]
  }'

Set the threshold to the Amazon cost at which your eBay-median resale clears your ROI floor, and the monitor pings you the moment a buy window opens — no cron, no stored price history to maintain. notify_channels accepts email, webhook, telegram, slack, and discord, so the alert can land directly in whatever queue your sourcing runs from. Once the candidates that survive a screen become monitors, a one-time spreadsheet turns into a standing watchlist that surfaces buy windows as they happen — and the LogPose export tooling can drop the screened CSV straight into the sheet your buyer already works from.
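
The threshold described above can be derived rather than guessed. Requiring (net_proceeds - cost) / cost >= min_roi rearranges to cost <= net_proceeds / (1 + min_roi). The sketch below applies that under the same placeholder fee model as Step 4; max_buy_cost is a hypothetical helper, and the 0.13 rate and 4.00 flat fee are assumptions to replace with your own.

```python
import math


def max_buy_cost(median_sold, min_roi=0.30, fee_rate=0.13, flat_fee=4.00):
    """Highest buy cost at which the net-of-fees ROI still meets min_roi."""
    net_proceeds = median_sold - (median_sold * fee_rate + flat_fee)
    if net_proceeds <= 0:
        return 0.0  # fees alone consume the resale price
    # Round down to the cent so the threshold never overstates the buy window.
    return math.floor(net_proceeds / (1 + min_roi) * 100) / 100


threshold = max_buy_cost(27.99)
print(threshold)
```

For a 27.99 median and a 30% floor this gives 15.65, which is the kind of value that would go in the monitor's threshold field for that candidate. Recompute it whenever the eBay median moves, since the threshold is only as current as the comps behind it.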

The Honest Fit

This routine fits well for high-velocity replens and commodity online arbitrage — items with deep eBay sold history and an active Amazon listing, where a realized-price median is meaningful and the spread is the whole decision. The three primitives that make it reliable are the LH_Sold=1&LH_Complete=1 filter for realized demand, the Amazon smart endpoint for current cost and BSR, and the net-of-fees ROI sort that ranks on take-home rather than gross gap.

Where it is not the right tool: it is a screening routine, not a price-history database. If you need months of daily price-and-BSR history per ASIN to spot seasonality and reprice intelligently, a dedicated price-history product like a Keepa-style tracker is built for exactly that and will serve you better than re-deriving history from periodic pulls. And the data cannot tell you whether you are allowed to sell an item — brand-gated, IP-restricted, and authenticity-controlled categories are a hard wall that no spread justifies crossing, so treat eligibility as a separate, manual gate that runs before any number in this CSV matters. Inside its lane — commodity items, proven demand, defensible margin — the realized-price screen is the routine that keeps you buying the right inventory ahead of the resellers screening on asking prices.

Get Started

Sign up at logposervices.com and generate an API key under Tool → API Keys.
Export the key in your shell so the Python helpers can read it: export LOGPOSE_API_KEY=lp_xxxxxxx
Pull your first sold-comp window, then pair it with the Amazon sell side:

curl -G "https://api.logposervices.com/api/v1/ecommerce/ebay/search" \
  -H "X-API-Key: lp_xxxxxxx" \
  --data-urlencode "url=https://www.ebay.com/sch/i.html?_nkw=anker+powercore+10000&LH_Sold=1&LH_Complete=1" \
  --data-urlencode "pages=3"

Then run the ebay_demand and amazon_sell_side helpers over your candidate list, compute the net-of-fees ROI, rank, and write the CSV.

Related reading: How to scrape eBay sold listings for real prices for the realized-demand fundamentals, Track Amazon competitor prices daily into a CSV or Sheet for the standing sell-side screen, and How to extract Amazon ASIN data in bulk for scaling the Amazon side of the pipeline.

External: eBay, Amazon, hiQ Labs v. LinkedIn.
