Is it legal to scrape eBay sold listings?

eBay sold listings are public data — anyone with a browser can view them without logging in, and Google indexes them. Scraping public web data is not a CFAA violation in the US (hiQ Labs v. LinkedIn, 9th Cir. 2022), and EU/UK precedent treats public marketplace pricing as lawful to collect for analytical use. What eBay's User Agreement forbids is automated access to the underlying Trading and Browse APIs without registered credentials, and republishing the data as a competing marketplace product. For internal pricing decisions — looking up what a Coach bag, a pair of Air Jordans, or a vintage camera actually sold for last week — the scrape itself sits on settled ground. The downstream use case is what to scrutinize, not the data collection.

Why are sold prices more useful than active listings?

Active listings tell you what sellers hope to get; sold listings tell you what buyers actually paid. The gap between the two is routinely 30–50% on used goods, and skews even wider on hyped categories like sneakers and trading cards where wishful asking prices dominate the active feed. For a reseller pricing thrift-store finds or estate-sale inventory, the active number is misleading and the sold number is the only one that maps to real cash. eBay's own internal recommendation engine for sellers — the 'See what sold for' tool inside the listing flow — keys exclusively on the sold archive for exactly this reason. If you want to know what an item is worth right now, the sold feed is the answer; the active feed is aspiration.

How recent are eBay's sold-listing archives?

eBay surfaces roughly the last 90 days of sold listings through the public sold-filter view (`LH_Sold=1&LH_Complete=1`). Older transactions are visible to logged-in sellers through Terapeak (now bundled with the Seller Hub) but are not exposed in the public scrape path. For most reseller use cases — pricing inventory that will turn in 2–8 weeks — 90 days is plenty, because market prices on used goods drift fast enough that older comps are misleading anyway. If a long historical archive matters (insurance valuations, dispute evidence, collector-grade rarity baselining), the right pattern is to scrape weekly into your own database and let history accumulate from your collection date forward.

How many sold listings can I get per search?

eBay caps a sold-listing search at roughly 240 items per filtered URL (4 pages of 60), after which the result density falls off sharply and the feed starts including loosely related items. For wider coverage on a high-volume keyword like 'iphone 13 pro' or 'jordan 1', narrow the search before paginating: add a model number, a colorway, a size, a condition filter, or a date range. A query like `nikon d750 used` fills those 240 slots with clean comps; a query like `dslr camera` fills the same slots mostly with junk. Narrow keyword → cleaner comps is the rule on every marketplace scrape, and eBay's sold archive rewards it more than most because of how its relevance ranking weights recent sales.

Can I track sold prices over time?

Yes, and this is where the workflow gets interesting for serious resellers. Run the same sold-listing scrape on a weekly cadence, append each pull to a Parquet file or a SQLite database keyed on `(query, item_id)`, and you have a longitudinal price-history dataset that eBay itself does not expose to non-Terapeak users. After 8–12 weeks you can compute median sale price by week, identify trend direction (vintage Coach is up; mid-2010s Coach is down), flag arbitrage spikes (a sneaker drops 40% on resale after a re-release announcement), and time your own listings around the demand curve. The whole pipeline lives in roughly 80 lines of Python once you have the scrape working.
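
A minimal sketch of that weekly history store, assuming each scraped row carries the `item_id`, `price`, and `sold_date` fields and that SQLite is the chosen backend. The table name `sold_comps` and both helper names are hypothetical, not part of any LogPose or eBay API:

```python
import sqlite3
import statistics
from collections import defaultdict
from datetime import date

# Hypothetical table; the (query, item_id) primary key makes weekly re-pulls idempotent.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sold_comps (
    query     TEXT NOT NULL,
    item_id   TEXT NOT NULL,
    price     REAL NOT NULL,
    sold_date TEXT NOT NULL,
    PRIMARY KEY (query, item_id)
)
"""


def append_pull(conn, query, items):
    """Store one scrape's rows; rows already saved for this query are skipped."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT OR IGNORE INTO sold_comps VALUES (?, ?, ?, ?)",
        [(query, str(i["item_id"]), float(i["price"]), i["sold_date"]) for i in items],
    )
    conn.commit()


def weekly_medians(conn, query):
    """Return {(iso_year, iso_week): median sale price} for one query."""
    by_week = defaultdict(list)
    rows = conn.execute(
        "SELECT price, sold_date FROM sold_comps WHERE query = ?", (query,)
    )
    for price, sold_date in rows:
        iso = date.fromisoformat(sold_date).isocalendar()
        by_week[(iso[0], iso[1])].append(price)
    return {week: statistics.median(p) for week, p in sorted(by_week.items())}
```

Because duplicates are ignored on insert, overlapping 90-day windows from consecutive weekly pulls do not double-count a sale.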


How to Scrape eBay Sold Listings for Real Sale Prices

May 28, 2026 · 11 min read

If you flip thrifted goods, estate-sale lots, sneakers, vintage clothing, used electronics, books, or any other secondhand inventory, your single hardest problem is pricing. You just paid four dollars for a vintage Carhartt jacket at a Goodwill, or fifty for a box of old camera lenses at an estate sale, or three hundred for a pallet of returns at a liquidator. What is any of it actually worth? Not the optimistic number some seller is asking on the active feed — the number a buyer actually paid yesterday. That dataset exists, it is public, and it is called eBay sold listings. This guide walks the full pipeline: building the sold-filter URL, running the scrape, cleaning the output into a comps table, and turning a weekly refresh into a real pricing system.

Why Sold Listings Beat Every Other Pricing Source

The honest field comparison looks like this. Sold listings on eBay are the closest thing the used-goods market has to a real-time price index. Active listings on eBay, Mercari, Poshmark, and Depop are all asking prices — what sellers want, not what buyers paid, and the gap between the two is routinely 30–50% on used goods. Worthpoint and similar paid databases aggregate eBay sold data with a delay and behind a paywall. Discord groups and Reddit threads share anecdotes that are biased toward home runs, not medians. Specialized platforms like StockX and GOAT are clean for the categories they cover (sneakers, watches, handbags) but cover nothing else.

The other reason sold listings win is volume. For any keyword with more than a handful of weekly transactions, eBay's sold archive is a statistically meaningful sample. A median across 50 sold comps from the last 90 days is a defensible price; a median across 3 anecdotes from a Facebook group is not.

What an eBay Sold Listing Returns

Per item, a sold-search result returns:

Field	Example
`title`	Air Jordan 1 Retro High OG Chicago Lost & Found 2022 Size 10
`price`	312.50
`currency`	USD
`sold_date`	2026-05-21
`condition`	Pre-owned
`seller`	sneakerkid_42
`seller_feedback`	1842
`item_url`	https://www.ebay.com/itm/2856xxxxxx
`item_id`	2856xxxxxx
`image_url`	https://i.ebayimg.com/images/g/...
`shipping`	Free shipping
`bids`	14 (auction) or null (BIN)
`format`	Auction / Buy It Now / Best Offer

What it does not return: the buyer's identity, the final shipping cost broken out, the exact best-offer accepted price when the listing was Best Offer with a hidden accepted figure (eBay obscures this — the displayed price is the accepted offer, but with no provenance), or the item description body. For pricing decisions, the fields above are what you need; price + condition + date is the core comp signal and everything else is metadata.
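
To make the field list concrete, here is a small sketch that normalizes one raw result row into a typed record. It assumes the result arrives as a dict using the field names in the table above; the `SoldComp` class and `to_comp` helper are illustrative names, not part of the API:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SoldComp:
    """The core comp signal (price, condition, date) plus a few identifying fields."""
    item_id: str
    title: str
    price: float
    currency: str
    sold_date: date
    condition: str
    listing_format: str
    bids: Optional[int]  # None for Buy It Now listings


def to_comp(raw: dict) -> SoldComp:
    """Convert one raw listing dict into a SoldComp, parsing price and date."""
    bids = raw.get("bids")
    return SoldComp(
        item_id=str(raw["item_id"]),
        title=raw["title"],
        price=float(raw["price"]),
        currency=raw["currency"],
        sold_date=date.fromisoformat(raw["sold_date"]),
        condition=raw["condition"],
        listing_format=raw["format"],
        bids=int(bids) if bids is not None else None,
    )
```

Parsing the price and date once at the edge keeps every later step (medians, date cutoffs) free of string handling.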

Building the Sold-Filter URL

This is the critical detail that makes eBay sold-listing scraping work: the sold filter lives entirely in the URL's query string, so you build the URL once and then template it across keywords. The two query parameters that flip a regular eBay search into a sold-listing search are:

LH_Sold=1 — only show items that sold
LH_Complete=1 — only show completed (ended) listings

Combine those with the standard keyword parameter _nkw and you have the full URL shape:

https://www.ebay.com/sch/i.html?_nkw=<keyword>&LH_Sold=1&LH_Complete=1

Three example URLs to test with:

https://www.ebay.com/sch/i.html?_nkw=jordan+1+chicago&LH_Sold=1&LH_Complete=1
https://www.ebay.com/sch/i.html?_nkw=nikon+d750+used&LH_Sold=1&LH_Complete=1
https://www.ebay.com/sch/i.html?_nkw=vintage+carhartt+jacket&LH_Sold=1&LH_Complete=1

Useful extra filters that compose cleanly onto the URL:

Filter	URL fragment	Effect
Condition: used	`LH_ItemCondition=3000`	Drops new/refurbished
Buy It Now only	`LH_BIN=1`	Excludes auctions for cleaner medians
Auction only	`LH_Auction=1`	Auctions only (more dispersion)
Price range	`_udlo=50&_udhi=300`	Lower / upper bound
US sellers only	`LH_PrefLoc=1`	Cuts cross-border noise
Category ID	`_sacat=15724`	Narrows to one eBay category

The Buy It Now filter (LH_BIN=1) is the highest-leverage one for resellers — it gives you a tighter median because auction prices are noisier (one excited bidder pumps the comp), and BIN prices are what your own listings will compete against if you also sell at fixed price.
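
The URL shape and filter table above can be templated with a small helper. This is a sketch: the function name `sold_search_url` and its keyword arguments are hypothetical, while the query parameters are the ones listed in the table:

```python
from urllib.parse import urlencode


def sold_search_url(keyword, *, used_only=False, bin_only=False, us_only=False,
                    price_min=None, price_max=None, category_id=None):
    """Build an eBay sold-listing search URL from a keyword and optional filters."""
    params = {"_nkw": keyword, "LH_Sold": 1, "LH_Complete": 1}
    if used_only:
        params["LH_ItemCondition"] = 3000
    if bin_only:
        params["LH_BIN"] = 1
    if us_only:
        params["LH_PrefLoc"] = 1
    if price_min is not None:
        params["_udlo"] = price_min
    if price_max is not None:
        params["_udhi"] = price_max
    if category_id is not None:
        params["_sacat"] = category_id
    # urlencode turns spaces into "+" and escapes reserved characters such as "&"
    return "https://www.ebay.com/sch/i.html?" + urlencode(params)


print(sold_search_url("jordan 1 chicago", bin_only=True, price_min=50, price_max=300))
```

Letting `urlencode` handle escaping avoids broken URLs when a keyword contains characters like `&` or `#`.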

The API Call

Every LogPose eBay endpoint is asynchronous — submit a job, poll for status, fetch the result. Submit with curl first to confirm your sold-filter URL works:

curl -G "https://api.logposervices.com/api/v1/ecommerce/ebay/search" \
  -H "X-API-Key: lp_xxxxxxx" \
  --data-urlencode "url=https://www.ebay.com/sch/i.html?_nkw=jordan+1+chicago&LH_Sold=1&LH_Complete=1" \
  --data-urlencode "pages=4"
# → {"job_id": "eb_5a2c..."}

curl -H "X-API-Key: lp_xxxxxxx" \
  "https://api.logposervices.com/api/v1/jobs/eb_5a2c?wait=true&timeout=60"

curl -H "X-API-Key: lp_xxxxxxx" \
  https://api.logposervices.com/api/v1/jobs/eb_5a2c/result

eBay returns about 60 listings per page on the sold-filter view, so pages=4 is roughly 240 sold comps from one keyword. Most 4-page jobs finish in 45–75 seconds.

The Python Pricing Pipeline

This is the script most resellers end up running before listing inventory. It takes one keyword, pulls the last 90 days of sold comps, and prints a price-distribution summary plus a recommended listing range.

import os, time, statistics, requests
from urllib.parse import quote_plus

API_KEY = os.environ["LOGPOSE_API_KEY"]
BASE = "https://api.logposervices.com/api/v1"
HEADERS = {"X-API-Key": API_KEY}


def submit_and_wait(path: str, params: dict, timeout_s: int = 120) -> dict:
    """Submit a job, poll until it completes, and return the result payload."""
    r = requests.get(f"{BASE}/{path}", params=params, headers=HEADERS, timeout=30)
    r.raise_for_status()
    job_id = r.json()["job_id"]
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        s = requests.get(f"{BASE}/jobs/{job_id}", headers=HEADERS, timeout=15)
        s.raise_for_status()
        status = s.json()
        if status["status"] == "completed":
            break
        if status["status"] == "failed":
            raise RuntimeError(status.get("error", "unknown failure"))
        time.sleep(2)
    else:
        # The loop ended without a break, so the deadline passed first
        raise TimeoutError(f"job {job_id} did not finish in {timeout_s}s")
    res = requests.get(f"{BASE}/jobs/{job_id}/result", headers=HEADERS, timeout=15)
    res.raise_for_status()
    return res.json()


def price_comps(keyword: str, pages: int = 4, bin_only: bool = True) -> dict:
    """Pull sold comps for one keyword and summarize the price distribution."""
    url = (
        "https://www.ebay.com/sch/i.html"
        f"?_nkw={quote_plus(keyword)}"  # encodes spaces as "+" and escapes "&", "#", etc.
        "&LH_Sold=1&LH_Complete=1"
        + ("&LH_BIN=1" if bin_only else "")
    )
    data = submit_and_wait("ecommerce/ebay/search", {"url": url, "pages": pages})
    items = data["listings"]
    prices = [float(i["price"]) for i in items if i.get("price")]
    if len(prices) < 2:
        # statistics.quantiles needs at least two data points
        return {"keyword": keyword, "n": len(prices), "items": items}
    p25, _, p75 = statistics.quantiles(prices, n=4)
    return {
        "keyword": keyword,
        "n": len(prices),
        "min": round(min(prices), 2),
        "p25": round(p25, 2),
        "median": round(statistics.median(prices), 2),
        "p75": round(p75, 2),
        "max": round(max(prices), 2),
        "items": items,
    }


if __name__ == "__main__":
    r = price_comps("jordan 1 chicago size 10", pages=4)
    if r["n"] < 2:
        print(f"{r['keyword']}: only {r['n']} priced comps, too few to summarize")
    else:
        print(f"{r['keyword']}: n={r['n']}  median=${r['median']:.2f}  range=${r['p25']:.2f}–${r['p75']:.2f}")
    # → jordan 1 chicago size 10: n=187  median=$318.00  range=$285.00–$362.50

Run that against a thrift find and the median tells you what the item is actually worth; the p25–p75 band tells you the listing-price range that will move the inventory in 2–4 weeks. List below p25 for fast turn, above the median for patience plays, never above p75 unless the item is meaningfully better than the average comp.
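
The listing rule in the paragraph above can be written as a small check on a proposed price. This sketch assumes you already have p25, median, and p75 from the summary; the function name and the label strings are illustrative:

```python
def classify_list_price(price, p25, median, p75):
    """Label a proposed listing price against the sold-comp quartile band."""
    if not p25 <= median <= p75:
        raise ValueError("expected p25 <= median <= p75")
    if price < p25:
        return "fast turn"
    if price <= median:
        return "standard"
    if price <= p75:
        return "patience play"
    # Above p75: only justified when the item beats the average comp
    return "above p75"


for candidate in (270, 300, 340, 400):
    print(candidate, classify_list_price(candidate, 285.00, 318.00, 362.50))
```

With the example band of $285.00 to $362.50 and a $318.00 median, a $270 listing is a fast-turn price and a $400 listing sits above p75.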

Cleaning the Comps for a Defensible Median

Raw output from one 4-page sold-listing job is usually 200–240 rows. Four cleaning steps make the median actually usable.

import pandas as pd

# r is the dict returned by price_comps() in the script above
df = pd.DataFrame(r["items"])
df["price"] = pd.to_numeric(df["price"], errors="coerce")
df["sold_date"] = pd.to_datetime(df["sold_date"], errors="coerce")

# Current time as a naive UTC timestamp, so it compares cleanly with sold_date
now = pd.Timestamp.now(tz="UTC").tz_localize(None)

# 1. Drop rows with no price (parsing failures, "or Best Offer" with no sale)
df = df.dropna(subset=["price"])

# 2. Drop the long tail of accessory/bundle/lot listings that pollute keyword
#    searches — these are usually 3x median or 0.2x median, never the real item
lo, hi = df["price"].quantile(0.05), df["price"].quantile(0.95)
df = df[(df["price"] >= lo) & (df["price"] <= hi)]

# 3. Restrict to last 60 days for a more current median than the default 90
df = df[df["sold_date"] >= now - pd.Timedelta(days=60)]

# 4. Weight by recency — last 14 days count double in the median calc
recent = df[df["sold_date"] >= now - pd.Timedelta(days=14)]
weighted = pd.concat([df, recent])  # recent rows appear twice

print(f"n={len(df)}  median=${weighted['price'].median():.2f}")

The 5%/95% trim is the single highest-leverage step. eBay's keyword matching is loose — a search for "jordan 1 chicago" pulls in toddler sizes ($40), authenticated GS pairs ($280), authenticated mens ($340), the full 2015 set with extras ($800), and pairs sold without the box ($210). Trimming the outer 10% removes the noise and leaves you with a tight, defensible median.
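
Step 4 of the cleaning script (recent sales counting double) is easier to see on a handful of rows. The following standard-library sketch uses made-up prices and dates, and the function name is hypothetical:

```python
import statistics
from datetime import date, timedelta


def recency_weighted_median(comps, today, recent_days=14):
    """comps is an iterable of (price, sold_date); recent sales count twice."""
    cutoff = today - timedelta(days=recent_days)
    weighted = []
    for price, sold in comps:
        weighted.append(price)
        if sold >= cutoff:
            weighted.append(price)  # duplicate the recent sale
    return statistics.median(weighted)


# Illustrative data only: three older sales and two from the last two weeks
comps = [
    (300, date(2026, 4, 10)),
    (310, date(2026, 4, 20)),
    (320, date(2026, 5, 1)),
    (350, date(2026, 5, 20)),
    (360, date(2026, 5, 25)),
]
print(statistics.median(p for p, _ in comps))               # 320
print(recency_weighted_median(comps, date(2026, 5, 28)))    # 350
```

The plain median of these five sales is 320, while doubling the two recent sales moves it to 350, which is the behavior you want when prices are trending up.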

Scaling Beyond One Keyword

One keyword gives you one comps table. To price a full inbound lot — say, a $400 estate-sale box of fifty items — you need fifty keywords run in parallel. Two patterns work in production.

Sequential script. Loop over your inventory list, call price_comps for each row, write the result to a Google Sheet or a CSV your phone can read while you sort through boxes. A 50-item run takes about 30 minutes sequentially and gives you a full sortable comps table.
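
A sketch of the sequential pattern, writing one summary row per keyword to a CSV. It assumes a pricing function shaped like `price_comps` above (a dict with `keyword`, `n`, `p25`, `median`, `p75`); the `price_inventory` name and the column choice are illustrative:

```python
import csv


def price_inventory(keywords, price_fn, out_path):
    """Run price_fn for each keyword and write a comps summary CSV.

    Returns the number of rows written. A keyword whose lookup raises is
    reported and skipped so one failure does not abort the whole run.
    """
    fields = ["keyword", "n", "p25", "median", "p75"]
    written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" drops bulky keys such as "items";
        # restval="" leaves blanks when a keyword had too few comps
        writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore", restval="")
        writer.writeheader()
        for kw in keywords:
            try:
                row = price_fn(kw)
            except Exception as exc:
                print(f"{kw}: skipped ({exc})")
                continue
            writer.writerow(row)
            written += 1
    return written
```

Called as `price_inventory(inventory_keywords, price_comps, "comps.csv")`, this produces a file you can open in a spreadsheet app while sorting inventory.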

Bulk submission. Submit the whole list in one bulk request and let the LogPose platform schedule them across the proxy pool in parallel:

import os, requests
from urllib.parse import quote_plus

inventory_keywords = [
    "jordan 1 chicago size 10",
    "nikon d750 used",
    "vintage carhartt jacket xl",
    "coach willis 9927",
    "ipad mini 5 64gb wifi",
    # ... 45 more rows from the inbound lot
]

# One sold-filter search target per keyword, Buy It Now comps only
targets = [
    {
        "url": (
            "https://www.ebay.com/sch/i.html"
            f"?_nkw={quote_plus(kw)}"
            "&LH_Sold=1&LH_Complete=1&LH_BIN=1"
        ),
        "pages": 4,
    }
    for kw in inventory_keywords
]

requests.post(
    "https://api.logposervices.com/api/v1/ecommerce/ebay/search/bulk",
    headers={"X-API-Key": os.environ["LOGPOSE_API_KEY"]},
    json={"targets": targets},
    timeout=30,  # fail fast instead of hanging on a stalled connection
).raise_for_status()

Bulk runs in parallel up to your concurrency cap, which cuts a 50-keyword inventory price-check from 30 minutes sequential to 4–6 minutes wall-clock. For a reseller who buys lots and needs to triage what to keep versus what to resell in bulk, this is the workflow.

For weekly trend tracking on a saved set of keywords, the LogPose tracker system can re-run the same sold-search on a schedule and alert you when a median moves more than a configurable percentage — useful for catching demand spikes early on hyped categories like new sneaker releases or trend-driven vintage cycles.
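
If you keep your own weekly medians, the same kind of alert is easy to approximate locally. This is a sketch of the comparison only, not the LogPose tracker's implementation; the function names and the 10% default threshold are assumptions:

```python
def median_shift_pct(previous, current):
    """Percentage change from last period's median to this period's."""
    if previous <= 0:
        raise ValueError("previous median must be positive")
    return (current - previous) / previous * 100


def should_alert(previous, current, threshold_pct=10.0):
    """True when the median moved by at least threshold_pct in either direction."""
    return abs(median_shift_pct(previous, current)) >= threshold_pct


print(should_alert(300.0, 180.0))  # a 40% drop crosses the threshold
print(should_alert(300.0, 310.0))  # a move of about 3% does not
```

Checking the absolute change catches both demand spikes and post-re-release price drops with one threshold.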

Legality and Ethics

eBay sold-listing data is public and indexed by Google. Scraping it for internal pricing decisions sits on the same settled legal ground as scraping any other public marketplace data in the US (CFAA does not apply to public data per hiQ v. LinkedIn) and is broadly compatible with GDPR's legitimate-interest basis in the EU. Sold listings do not surface buyer identities. Seller usernames are pseudonymous, but GDPR still treats pseudonymous identifiers as personal data, so drop or hash the `seller` field if all you need is prices. eBay's User Agreement restricts redistributing the data as a competing product; it does not restrict you, the reseller, from looking up comps to price your own inventory. The scrape is the safe step; rebuilding eBay's sold-listing UI on your own site for the public to query would be the unsafe one.

Common Mistakes

Scraping active listings and calling it pricing data. This is the single most common mistake. LH_Sold=1&LH_Complete=1 is non-negotiable; without it you are looking at asking prices, which are 30–50% inflated on used goods.
Including auction comps in a Buy It Now pricing decision. Auctions are noisy (one excited bidder pumps the comp), and BIN buyers will not pay the auction peak. Always add LH_BIN=1 when pricing for BIN listings.
Trusting the median on a tiny sample. A keyword that returns 6 sold comps in 90 days has a noisy median; the interquartile range can be as wide as the median itself. For low-volume items, widen the date window, broaden the keyword, or accept that the pricing decision is more art than data.
Ignoring the long tail of bundles and lots. A "jordan 1 chicago" search returns 1-pair listings, 2-pair lots, full-set listings with apparel, and accessory-only listings (laces, boxes, dust bags). Always trim the 5%/95% tails before computing a median.
Ignoring the Cloudflare 100-second edge timeout. api.logposervices.com sits behind Cloudflare, so a job that takes 100+ seconds returns a 524 to your client even though the job continues server-side. Always poll for status; never expect a synchronous response on a big page count.
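
The tiny-sample mistake in the list above can be caught with a quick check before you trust a median. This sketch flags a comp set as unreliable when it is small or when its interquartile range is wide relative to the median; the thresholds `min_n=20` and `max_spread=0.5` are assumptions to tune, not values from eBay or LogPose:

```python
import statistics


def comp_quality(prices, min_n=20, max_spread=0.5):
    """Summarize whether a set of sold prices supports a defensible median."""
    n = len(prices)
    if n < 2:
        # statistics.quantiles needs at least two data points
        return {"n": n, "spread": None, "reliable": False}
    p25, median, p75 = statistics.quantiles(prices, n=4)
    # Spread is the interquartile range as a fraction of the median
    spread = (p75 - p25) / median if median > 0 else None
    reliable = n >= min_n and spread is not None and spread <= max_spread
    return {"n": n, "spread": spread, "reliable": reliable}
```

A result with `reliable` set to `False` is the cue to widen the date window or broaden the keyword before pricing off the median.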

Get Started

Sign up at logposervices.com and generate an API key under Tool → API Keys.
export LOGPOSE_API_KEY=lp_xxxxxxx
Build a sold-filter URL for one item you bought recently and run the price_comps function above against it.

Related reading: How to scrape Amazon search results for the new-goods companion workflow (Amazon new prices + eBay sold prices is the canonical arbitrage pair), How to set up competitor price monitoring for the recurring-refresh pattern, and the Apify alternative for ecommerce scraping comparison for the broader managed-API trade-offs.

External: eBay advanced search, Terapeak research, hiQ Labs v. LinkedIn.
