How to Scrape eBay Sold Listings for Real Sale Prices
If you flip thrifted goods, estate-sale lots, sneakers, vintage clothing, used electronics, books, or any other secondhand inventory, your single hardest problem is pricing. You just paid four dollars for a vintage Carhartt jacket at a Goodwill, or fifty for a box of old camera lenses at an estate sale, or three hundred for a pallet of returns at a liquidator. What is any of it actually worth? Not the optimistic number some seller is asking on the active feed — the number a buyer actually paid yesterday. That dataset exists, it is public, and it is called eBay sold listings. This guide walks the full pipeline: building the sold-filter URL, running the scrape, cleaning the output into a comps table, and turning a weekly refresh into a real pricing system.
Why Sold Listings Beat Every Other Pricing Source
The honest field comparison looks like this. Sold listings on eBay are the closest thing the used-goods market has to a real-time price index. Active listings on eBay, Mercari, Poshmark, and Depop are all asking prices — what sellers want, not what buyers paid, and the gap between the two is routinely 30–50% on used goods. Worthpoint and similar paid databases aggregate eBay sold data with a delay and behind a paywall. Discord groups and Reddit threads share anecdotes that are biased toward home runs, not medians. Specialized platforms like StockX and GOAT are clean for the categories they cover (sneakers, watches, handbags) but cover nothing else.
The other reason sold listings win is volume. For any keyword with more than a handful of weekly transactions, eBay's sold archive is a statistically meaningful sample. A median across 50 sold comps from the last 90 days is a defensible price; a median across 3 anecdotes from a Facebook group is not.
What an eBay Sold Listing Returns
Per item, a sold-search result returns:
| Field | Example |
|---|---|
| title | Air Jordan 1 Retro High OG Chicago Lost & Found 2022 Size 10 |
| price | 312.50 |
| currency | USD |
| sold_date | 2026-05-21 |
| condition | Pre-owned |
| seller | sneakerkid_42 |
| seller_feedback | 1842 |
| item_url | https://www.ebay.com/itm/2856xxxxxx |
| item_id | 2856xxxxxx |
| image_url | https://i.ebayimg.com/images/g/... |
| shipping | Free shipping |
| bids | 14 (auction) or null (BIN) |
| format | Auction / Buy It Now / Best Offer |
What it does not return: the buyer's identity, the final shipping cost broken out, the item description body, or a verified accepted price on Best Offer sales (eBay obscures the accepted figure, so the displayed price on those rows carries no provenance). For pricing decisions, the fields above are what you need; price + condition + date is the core comp signal and everything else is metadata.
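To make the core comp signal concrete, here is a minimal sketch of a typed record for one sold row. It assumes the JSON keys match the field names in the table above and that sold_date arrives as an ISO date string; `SoldComp` and `from_row` are hypothetical names for illustration, not part of any API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SoldComp:
    """One sold listing reduced to the fields that matter for pricing."""
    title: str
    price: float
    currency: str
    sold_date: date
    condition: str
    bids: Optional[int]  # None for fixed-price (BIN) sales

    @classmethod
    def from_row(cls, row: dict) -> "SoldComp":
        # Assumption: keys follow the field table; sold_date is YYYY-MM-DD.
        bids = row.get("bids")
        return cls(
            title=row["title"],
            price=float(row["price"]),
            currency=row["currency"],
            sold_date=date.fromisoformat(row["sold_date"]),
            condition=row["condition"],
            bids=int(bids) if bids is not None else None,
        )


row = {
    "title": "Air Jordan 1 Retro High OG Chicago Lost & Found 2022 Size 10",
    "price": "312.50",
    "currency": "USD",
    "sold_date": "2026-05-21",
    "condition": "Pre-owned",
    "bids": None,
}
comp = SoldComp.from_row(row)
print(comp.price, comp.sold_date.isoformat(), comp.bids)
```

Parsing price and date once at the boundary keeps every later step (filtering, medians, recency windows) free of string handling.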
Building the Sold-Filter URL
This is the critical detail that makes eBay sold-listing scraping work: the sold filter lives entirely in the URL query string, so you build it once and then template it across keywords. The two query parameters that flip a regular eBay search into a sold-listing search are:
- LH_Sold=1: only show items that sold
- LH_Complete=1: only show completed (ended) listings
Combine those with the standard keyword parameter _nkw and you have the full URL shape:
https://www.ebay.com/sch/i.html?_nkw=<keyword>&LH_Sold=1&LH_Complete=1
Three example URLs to test with:
https://www.ebay.com/sch/i.html?_nkw=jordan+1+chicago&LH_Sold=1&LH_Complete=1
https://www.ebay.com/sch/i.html?_nkw=nikon+d750+used&LH_Sold=1&LH_Complete=1
https://www.ebay.com/sch/i.html?_nkw=vintage+carhartt+jacket&LH_Sold=1&LH_Complete=1
Useful extra filters that compose cleanly onto the URL:
| Filter | URL fragment | Effect |
|---|---|---|
| Condition: used | LH_ItemCondition=3000 | Drops new/refurbished |
| Buy It Now only | LH_BIN=1 | Excludes auctions for cleaner medians |
| Auction only | LH_Auction=1 | Auctions only (more dispersion) |
| Price range | _udlo=50&_udhi=300 | Lower / upper bound |
| US sellers only | LH_PrefLoc=1 | Cuts cross-border noise |
| Category ID | _sacat=15724 | Narrows to one eBay category |
The Buy It Now filter (LH_BIN=1) is the highest-leverage one for resellers — it gives you a tighter median because auction prices are noisier (one excited bidder pumps the comp), and BIN prices are what your own listings will compete against if you also sell at fixed price.
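The keyword parameter and the filter fragments above can be composed programmatically instead of by hand. The following is a minimal sketch that uses only the parameters listed in the table; `build_sold_url` and its argument names are hypothetical helpers invented for this example. It relies on the standard library's `urlencode`, which turns spaces into `+` and escapes characters such as `&` inside a keyword.

```python
from urllib.parse import urlencode

EBAY_SEARCH = "https://www.ebay.com/sch/i.html"


def build_sold_url(keyword, *, bin_only=False, used_only=False,
                   price_min=None, price_max=None, us_only=False,
                   category_id=None):
    """Compose a sold-listing search URL from the filters in the table."""
    # The two sold-filter parameters are always present.
    params = {"_nkw": keyword, "LH_Sold": 1, "LH_Complete": 1}
    if bin_only:
        params["LH_BIN"] = 1
    if used_only:
        params["LH_ItemCondition"] = 3000
    if price_min is not None:
        params["_udlo"] = price_min
    if price_max is not None:
        params["_udhi"] = price_max
    if us_only:
        params["LH_PrefLoc"] = 1
    if category_id is not None:
        params["_sacat"] = category_id
    return f"{EBAY_SEARCH}?{urlencode(params)}"


print(build_sold_url("vintage carhartt jacket", bin_only=True,
                     price_min=50, price_max=300))
```

Keeping the filters as keyword arguments means one inventory list can be priced under different filter sets without editing URL strings.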
The API Call
Every LogPose eBay endpoint is asynchronous — submit a job, poll for status, fetch the result. Submit with curl first to confirm your sold-filter URL works:
# 1. Submit the job
curl -G "https://api.logposervices.com/api/v1/ecommerce/ebay/search" \
  -H "X-API-Key: lp_xxxxxxx" \
  --data-urlencode "url=https://www.ebay.com/sch/i.html?_nkw=jordan+1+chicago&LH_Sold=1&LH_Complete=1" \
  --data-urlencode "pages=4"
# → {"job_id": "eb_5a2c..."}

# 2. Wait for the job to finish
curl -H "X-API-Key: lp_xxxxxxx" \
  "https://api.logposervices.com/api/v1/jobs/eb_5a2c?wait=true&timeout=60"

# 3. Fetch the result
curl -H "X-API-Key: lp_xxxxxxx" \
  "https://api.logposervices.com/api/v1/jobs/eb_5a2c/result"
eBay returns about 60 listings per page on the sold-filter view, so pages=4 is roughly 240 sold comps from one keyword. Most 4-page jobs finish in 45–75 seconds.
The Python Pricing Pipeline
This is the script most resellers end up running before listing inventory. It takes one keyword, pulls the last 90 days of sold comps, and prints a price-distribution summary plus a recommended listing range.
import os
import statistics
import time
from urllib.parse import quote_plus

import requests

API_KEY = os.environ["LOGPOSE_API_KEY"]
BASE = "https://api.logposervices.com/api/v1"
HEADERS = {"X-API-Key": API_KEY}


def submit_and_wait(path: str, params: dict, timeout_s: int = 120) -> dict:
    """Submit a job, poll until it completes, and return the result payload."""
    r = requests.get(f"{BASE}/{path}", params=params, headers=HEADERS, timeout=30)
    r.raise_for_status()
    job_id = r.json()["job_id"]

    deadline = time.time() + timeout_s
    while time.time() < deadline:
        s = requests.get(f"{BASE}/jobs/{job_id}", headers=HEADERS, timeout=15).json()
        if s["status"] == "completed":
            break
        if s["status"] == "failed":
            raise RuntimeError(s.get("error", "unknown failure"))
        time.sleep(2)
    else:
        # The loop ended without a break: the deadline passed first.
        raise TimeoutError(f"job {job_id} did not finish in {timeout_s}s")

    result = requests.get(f"{BASE}/jobs/{job_id}/result", headers=HEADERS, timeout=15)
    result.raise_for_status()
    return result.json()


def price_comps(keyword: str, pages: int = 4, bin_only: bool = True) -> dict:
    """Pull sold comps for one keyword and summarize the price distribution."""
    url = (
        "https://www.ebay.com/sch/i.html"
        f"?_nkw={quote_plus(keyword)}"  # escapes spaces and characters like "&"
        "&LH_Sold=1&LH_Complete=1"
        + ("&LH_BIN=1" if bin_only else "")
    )
    data = submit_and_wait("ecommerce/ebay/search", {"url": url, "pages": pages})
    items = data["listings"]
    prices = [float(i["price"]) for i in items if i.get("price")]
    if len(prices) < 2:
        # statistics.quantiles needs at least two data points.
        return {"keyword": keyword, "n": len(prices), "items": items}
    quartiles = statistics.quantiles(prices, n=4)
    return {
        "keyword": keyword,
        "n": len(prices),
        "min": round(min(prices), 2),
        "p25": round(quartiles[0], 2),
        "median": round(statistics.median(prices), 2),
        "p75": round(quartiles[2], 2),
        "max": round(max(prices), 2),
        "items": items,
    }


if __name__ == "__main__":
    r = price_comps("jordan 1 chicago size 10", pages=4)
    if r["n"] < 2:
        print(f"{r['keyword']}: n={r['n']} (too few comps to summarize)")
    else:
        print(
            f"{r['keyword']}: n={r['n']} median=${r['median']:.2f} "
            f"range=${r['p25']:.2f}–${r['p75']:.2f}"
        )
    # → jordan 1 chicago size 10: n=187 median=$318.00 range=$285.00–$362.50
Run that against a thrift find and the median tells you what the item is actually worth; the p25–p75 band tells you the listing-price range that will move the inventory in 2–4 weeks. List below p25 for fast turn, above the median for patience plays, never above p75 unless the item is meaningfully better than the average comp.
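The listing rule above can be written down as a small function. This is a sketch under stated assumptions: `suggest_list_price`, the three strategy names, and the 3% undercut for a fast turn are illustrative choices, not values from the pipeline. The input is a summary dict with the same p25, median, and p75 keys that price_comps returns.

```python
def suggest_list_price(summary: dict, strategy: str = "balanced") -> float:
    """Map a comps summary (p25 / median / p75) to a listing price."""
    p25, median, p75 = summary["p25"], summary["median"], summary["p75"]
    if strategy == "fast":
        price = p25 * 0.97           # just under p25; 3% is an arbitrary undercut
    elif strategy == "balanced":
        price = median
    elif strategy == "patient":
        price = (median + p75) / 2   # above the median, still below p75
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # Never list above p75, whatever the strategy computes.
    return round(min(price, p75), 2)


summary = {"p25": 285.00, "median": 318.00, "p75": 362.50}
for name in ("fast", "balanced", "patient"):
    print(name, suggest_list_price(summary, name))
```

The final clamp to p75 encodes the "never above p75" rule; an item that is meaningfully better than the average comp is the case where you would override it by hand.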
Cleaning the Comps for a Defensible Median
Raw output from one 4-page sold-listing job is usually 200–240 rows. Four cleaning steps make the median actually usable.
import pandas as pd

# `items` is the "items" list from the price_comps result above
df = pd.DataFrame(items)
df["price"] = pd.to_numeric(df["price"], errors="coerce")
df["sold_date"] = pd.to_datetime(df["sold_date"], errors="coerce")

# 1. Drop rows with no price (parsing failures, "or Best Offer" with no sale)
df = df.dropna(subset=["price"])

# 2. Drop the long tail of accessory/bundle/lot listings that pollute keyword
#    searches; these are usually 3x median or 0.2x median, never the real item
lo, hi = df["price"].quantile(0.05), df["price"].quantile(0.95)
df = df[(df["price"] >= lo) & (df["price"] <= hi)]

# 3. Restrict to last 60 days for a more current median than the default 90
#    (naive UTC timestamp, so it compares cleanly against naive sold_date values)
now = pd.Timestamp.now(tz="UTC").tz_localize(None)
df = df[df["sold_date"] >= now - pd.Timedelta(days=60)]

# 4. Weight by recency: last 14 days count double in the median calc
recent = df[df["sold_date"] >= now - pd.Timedelta(days=14)]
weighted = pd.concat([df, recent])  # recent rows appear twice
print(f"n={len(df)} median=${weighted['price'].median():.2f}")
The 5%/95% trim is the single highest-leverage step. eBay's keyword matching is loose — a search for "jordan 1 chicago" pulls in toddler sizes ($40), authenticated GS pairs ($280), authenticated mens ($340), the full 2015 set with extras ($800), and pairs sold without the box ($210). Trimming the outer 10% removes the noise and leaves you with a tight, defensible median.
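To see what the trim does on a small sample, here is a standard-library sketch. The price list is made up for illustration, loosely following the examples in the paragraph above, and `trim_tails` is a hypothetical helper. It uses `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between observed values and should agree with the pandas `quantile` default used earlier.

```python
import statistics


def trim_tails(prices, lower_pct=5, upper_pct=95):
    """Keep only prices between the given percentiles (bounds included)."""
    # n=100 yields 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(prices, n=100, method="inclusive")
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [p for p in prices if lo <= p <= hi]


# Illustrative comps: one toddler pair, one no-box pair, one full set with extras.
raw = [40, 210, 280, 300, 310, 320, 330, 340, 350, 800]
kept = trim_tails(raw)

print("raw range:    ", min(raw), "to", max(raw))
print("trimmed range:", min(kept), "to", max(kept))
print("trimmed median:", statistics.median(kept))
```

On a sample this small the median itself barely moves, because medians are already robust to a couple of outliers. What the trim changes is the min, max, and quartile band, which are the numbers the listing range is built from.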
Scaling Beyond One Keyword
One keyword gives you one comps table. To price a full inbound lot (say, a $400 estate-sale box of fifty items) you need to run fifty keywords. Two patterns work in production.
Sequential script. Loop over your inventory list, call price_comps for each row, write the result to a Google Sheet or a CSV your phone can read while you sort through boxes. A 50-item run takes about 30 minutes sequentially and gives you a full sortable comps table.
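For the CSV half of the sequential pattern, a minimal sketch of the writer follows. It assumes each summary is a dict shaped like the price_comps return value; `write_comps_csv` and the column selection are hypothetical, and the sample rows are illustrative rather than real results.

```python
import csv
import io

COLUMNS = ["keyword", "n", "p25", "median", "p75"]


def write_comps_csv(summaries, fileobj):
    """Write one row per keyword; missing price fields are left blank."""
    writer = csv.DictWriter(
        fileobj,
        fieldnames=COLUMNS,
        extrasaction="ignore",  # skip bulky keys such as "items"
        restval="",             # blank cell when a summary lacks a column
    )
    writer.writeheader()
    for summary in summaries:
        writer.writerow(summary)


summaries = [
    {"keyword": "nikon d750 used", "n": 0},  # no comps found
    {"keyword": "jordan 1 chicago size 10", "n": 187,
     "p25": 285.0, "median": 318.0, "p75": 362.5, "items": []},
]

buf = io.StringIO()  # swap for open("comps.csv", "w", newline="") on disk
write_comps_csv(summaries, buf)
print(buf.getvalue())
```

Leaving zero-comp rows in the file, with blank price cells, makes it obvious which items still need a broader keyword.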
Bulk submission. Submit the whole list in one bulk request and let the LogPose platform schedule them across the proxy pool in parallel:
import os

import requests

inventory_keywords = [
    "jordan 1 chicago size 10",
    "nikon d750 used",
    "vintage carhartt jacket xl",
    "coach willis 9927",
    "ipad mini 5 64gb wifi",
    # ... 45 more rows from the inbound lot
]

targets = [
    {
        "url": (
            "https://www.ebay.com/sch/i.html"
            f"?_nkw={kw.replace(' ', '+')}"
            "&LH_Sold=1&LH_Complete=1&LH_BIN=1"
        ),
        "pages": 4,
    }
    for kw in inventory_keywords
]

requests.post(
    "https://api.logposervices.com/api/v1/ecommerce/ebay/search/bulk",
    headers={"X-API-Key": os.environ["LOGPOSE_API_KEY"]},
    json={"targets": targets},
).raise_for_status()
Bulk runs in parallel up to your concurrency cap, which cuts a 50-keyword inventory price-check from 30 minutes sequential to 4–6 minutes wall-clock. For a reseller who buys lots and needs to triage what to keep versus what to resell in bulk, this is the workflow.
For weekly trend tracking on a saved set of keywords, the LogPose tracker system can re-run the same sold-search on a schedule and alert you when a median moves more than a configurable percentage — useful for catching demand spikes early on hyped categories like new sneaker releases or trend-driven vintage cycles.
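If you run the weekly refresh yourself, the alert condition is a one-line percentage check. The sketch below is not the LogPose tracker's implementation; `median_shift_pct`, `should_alert`, and the 10% default threshold are assumptions made for illustration.

```python
def median_shift_pct(previous: float, current: float) -> float:
    """Signed percentage change from last week's median to this week's."""
    if previous <= 0:
        raise ValueError("previous median must be positive")
    return (current - previous) / previous * 100


def should_alert(previous: float, current: float, threshold_pct: float = 10.0) -> bool:
    """True when the median moved at least threshold_pct in either direction."""
    return abs(median_shift_pct(previous, current)) >= threshold_pct


print(round(median_shift_pct(318.00, 362.50), 1), should_alert(318.00, 362.50))
print(round(median_shift_pct(318.00, 325.00), 1), should_alert(318.00, 325.00))
```

Using the absolute value catches price drops as well as spikes, which matters when you are holding inventory in a cooling category.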
Legality and Ethics
eBay sold-listing data is public and indexed by Google. In the US, the Ninth Circuit held in hiQ v. LinkedIn that scraping publicly accessible pages is unlikely to violate the CFAA, although that litigation later turned on breach-of-contract claims, so the ground is favorable rather than fully settled. In the EU, sold listings do not surface buyer identities and seller usernames are pseudonymous, which keeps personal-data exposure low, though pseudonymous identifiers can still count as personal data under GDPR. eBay's User Agreement restricts automated access without permission, so terms-of-service risk is separate from statutory risk. The practical line: looking up comps to price your own inventory is the lower-risk use; rebuilding eBay's sold-listing UI on your own site for the public to query would be the unsafe one.
Common Mistakes
- Scraping active listings and calling it pricing data. This is the single most common mistake. LH_Sold=1&LH_Complete=1 is non-negotiable; without it you are looking at asking prices, which are 30–50% inflated on used goods.
- Including auction comps in a Buy It Now pricing decision. Auctions are noisy (one excited bidder pumps the comp), and BIN buyers will not pay the auction peak. Always add LH_BIN=1 when pricing for BIN listings.
- Trusting the median on a tiny sample. A keyword that returns 6 sold comps in 90 days has a noisy median; the inter-quartile range can be wider than the median itself. For low-volume items, widen the date window, broaden the keyword, or accept that the pricing decision is more art than data.
- Ignoring the long tail of bundles and lots. A "jordan 1 chicago" search returns 1-pair listings, 2-pair lots, full-set listings with apparel, and accessory-only listings (laces, boxes, dust bags). Always trim the 5%/95% tails before computing a median.
- Ignoring the Cloudflare 100-second edge timeout. api.logposervices.com sits behind Cloudflare, so a job that takes 100+ seconds returns a 524 to your client even though the job continues server-side. Always poll for status; never expect a synchronous response on a big page count.
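The tiny-sample mistake is easy to guard against in code. A minimal sketch, assuming a hypothetical `comp_quality` helper and an arbitrary minimum of 20 comps; pick a floor that fits your own categories.

```python
import statistics


def comp_quality(prices, min_n=20):
    """Flag comp sets that are too small or too dispersed to trust."""
    n = len(prices)
    if n < 2:
        return {"n": n, "usable": False, "reason": "fewer than two comps"}
    q1, _, q3 = statistics.quantiles(prices, n=4)
    median = statistics.median(prices)
    iqr = q3 - q1
    if n < min_n:
        reason = "small sample"
    elif iqr > median:
        reason = "IQR wider than median"
    else:
        reason = None
    return {"n": n, "median": median, "iqr": round(iqr, 2),
            "usable": reason is None, "reason": reason}


thin = comp_quality([20, 45, 60, 150, 300, 420])       # 6 comps in 90 days
solid = comp_quality([300 + i for i in range(24)])     # 24 tightly grouped comps
print(thin["usable"], thin["reason"])
print(solid["usable"], solid["n"])
```

Returning a reason rather than raising lets a batch run keep going and mark the rows that need a wider date window or a broader keyword.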
Get Started
- Sign up at logposervices.com and generate an API key under Tool → API Keys.
- export LOGPOSE_API_KEY=lp_xxxxxxx
- Build a sold-filter URL for one item you bought recently and run the price_comps function above against it.
Related reading: How to scrape Amazon search results for the new-goods companion workflow (Amazon new prices + eBay sold prices is the canonical arbitrage pair), How to set up competitor price monitoring for the recurring-refresh pattern, and the Apify alternative for ecommerce scraping comparison for the broader managed-API trade-offs.
External: eBay advanced search, Terapeak research, hiQ Labs v. LinkedIn.