How to Scrape Yellow Pages Emails for Cold Outreach

If you have ever searched for a "Yellow Pages email scraper," you have probably hit the same wall everyone does: Yellow Pages does not actually list email addresses. A YP listing is a name, a phone, an address, opening hours, and — if you are lucky — a link to the business's own website. There is no email field to scrape. So when a tool promises "Yellow Pages emails," what it is really doing is the two-step move this post is about: scrape the directory to get the businesses, then visit each business's website and read the email off the contact page, the footer, or a mailto: link.

This is a code-forward tutorial for SDRs and agency operators who want an outreach-ready list out the other end. We will pick a category and a city, build the YP search URL, submit it to an endpoint that does the directory scrape and the website enrichment in one pass, poll for the result, then flatten everything into a clean CSV with deduped emails. The honest caveat up front: not every business has a findable email, so plan for a partial hit rate rather than one email per row.

Why Yellow Pages Has No Emails (And Where They Come From)

It helps to be precise about the data model, because it determines the whole pipeline.

A Yellow Pages search result is directory metadata: business name, listing URL, primary phone, a parsed address (street, city, state, postal code, country), and opening hours. That is reliable and it is public. What it is missing for outbound is the thing you actually want to send to — an email address. YP simply does not carry that field for the vast majority of listings.

The email lives somewhere else: on the business's own website. A plumber's site has info@acmeplumbing.com in the footer; a marketing agency has a contact form backed by a hello@ address; a roofer lists a Gmail in a mailto: link. To get those, something has to (1) find the business's website, and (2) fetch that site and extract the contact details. That second step is website-derived enrichment, and it is the difference between a phone list and an outreach list.

So the real pipeline is always two layers:

  1. Directory layer — scrape Yellow Pages for the businesses in a category + city.
  2. Enrichment layer — for each business, find its website and pull emails, extra phones, and social profiles off it.

You can do these as two separate jobs (scrape YP, then run a website-enrichment pass yourself), or use a single endpoint that chains them. This tutorial uses the chained endpoint so the code stays short, but the mental model is the same either way.
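To make the enrichment layer concrete, here is a minimal sketch of what "reading the email off a website" can look like if you run that pass yourself. It is not how the chained endpoint works internally; the names `MailtoCollector` and `extract_emails` are hypothetical, the regex is deliberately loose, and the sketch only looks at mailto: links and plain text in HTML you have already fetched.

```python
import re
from html.parser import HTMLParser
from urllib.parse import unquote

# Deliberately loose pattern for illustration; real pages need more filtering.
EMAIL_IN_TEXT = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class MailtoCollector(HTMLParser):
    """Collects addresses from <a href="mailto:..."> links."""

    def __init__(self) -> None:
        super().__init__()
        self.found: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("mailto:"):
            # Drop the scheme and any ?subject=... query part.
            address = unquote(href[len("mailto:"):].split("?", 1)[0])
            if address:
                self.found.append(address)


def extract_emails(html: str) -> list[str]:
    """Return unique, lowercased emails from mailto: links and page text."""
    collector = MailtoCollector()
    collector.feed(html)
    candidates = collector.found + EMAIL_IN_TEXT.findall(html)
    seen, out = set(), []
    for email in candidates:
        email = email.strip().lower()
        if email not in seen:
            seen.add(email)
            out.append(email)
    return out
```

Feeding it a footer snippet with one mailto: link and one plain-text address returns both, lowercased and without duplicates. A production enricher would also need fetching, contact-page discovery, and obfuscation handling, none of which is shown here.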

The Two Yellow Pages Endpoints (Know the Difference)

There are two relevant routes, and picking the wrong one is the most common mistake.

GET /api/v1/ecommerce/yellowpages/search returns the directory rows without enrichment — name, phone, address, hours. Fast, but every row has an empty email. This is the right call when you only need phones and addresses (dialer lists, direct mail) and do not want to spend time on website fetches.

GET /api/v1/ecommerce/yellowpages/leads returns the same directory rows, but each one is enriched with emails, phones, and socials pulled from the business's website. This is the route for cold email. It costs more time per page because it is doing the website fetches on your behalf, but it is the difference between a list you can dial and a list you can email.

For the rest of this post we use /leads. If you find yourself with a list full of empty emails arrays, the first thing to check is that you called /leads and not /search.
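A quick sanity check can catch the wrong-endpoint mistake before you build a CSV. The sketch below assumes each row carries an `emails` list as in the enriched result shape, and that unenriched rows have it empty or missing; the helper names `email_yield` and `looks_unenriched` are hypothetical.

```python
def email_yield(rows: list[dict]) -> float:
    """Fraction of rows with at least one non-empty email (0.0 for no rows)."""
    if not rows:
        return 0.0
    with_email = sum(1 for row in rows if any(row.get("emails") or []))
    return with_email / len(rows)


def looks_unenriched(rows: list[dict]) -> bool:
    """True when there are rows but none carries an email (likely /search)."""
    return bool(rows) and email_yield(rows) == 0.0
```

If `looks_unenriched(rows)` is true on a non-trivial pull, the most likely explanation is that the job went to /search rather than /leads.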

Step 1 — Pick a Category and Build the Search URL

The whole job is keyed off one URL. Go to yellowpages.com, search your target category in your target city, and copy the URL from the address bar. It looks like this:

https://www.yellowpages.com/search?search_terms=Roofing&geo_location_terms=Austin%2C+TX

Two parameters matter: search_terms (the category) and geo_location_terms (the city, with the comma URL-encoded as %2C). Tune the search in the browser — category specificity, city — until the result count looks right, then copy the URL exactly, encoded comma and all. If you hand-decode that %2C, the query breaks.
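If you would rather build the URL in code than copy it from the address bar, the standard library produces the same encoding. This is a small sketch; `build_search_url` and `read_search_params` are hypothetical helper names, and it assumes the two parameters above are the only ones your search needs.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

YP_SEARCH = "https://www.yellowpages.com/search"


def build_search_url(category: str, location: str) -> str:
    """Encode category and location the way the browser does (comma -> %2C)."""
    query = urlencode({"search_terms": category, "geo_location_terms": location})
    return f"{YP_SEARCH}?{query}"


def read_search_params(url: str) -> dict[str, str]:
    """Decode a search URL back into its parameters, for a quick sanity check."""
    params = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in params.items()}
```

`build_search_url("Roofing", "Austin, TX")` yields the same string as the example URL above, and `read_search_params` lets you confirm that a URL copied from the browser decodes to the category and city you intended.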

A practical tip on category choice for cold outreach: narrower categories enrich better. "Roofing" in a metro will pull a mix of national franchises (contact forms, no scrapable email) and independents (a real mailto: in the footer). The independents are where the findable emails are, and where cold email actually lands, so a tighter search term often gives you a higher usable-email rate even though it returns fewer rows.

Step 2 — Submit the /leads Job with a Page Count

Yellow Pages shows roughly 30 businesses per page, so pages is your volume dial: pages=5 is about 150 businesses, pages=10 about 300. The job is asynchronous — you submit, get a job_id back, and poll. Here is the bare curl submit:

curl -G "https://api.logposervices.com/api/v1/ecommerce/yellowpages/leads" \
  -H "X-API-Key: lp_xxxxxxx" \
  --data-urlencode "url=https://www.yellowpages.com/search?search_terms=Roofing&geo_location_terms=Austin%2C+TX" \
  --data-urlencode "pages=5" \
  --data-urlencode "start_page=1"
# → {"job_id": "yp_8f3a...", "status": "pending"}

The start_page parameter defaults to 1; pass it explicitly when you want to resume a wider pull from a later page without re-scraping the early ones.
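For a wide pull split into several jobs, the (start_page, pages) pairs can be computed up front. The sketch below assumes that `pages` counts pages beginning at `start_page`, which is how the parameters read but is not confirmed here; `page_batches` is a hypothetical helper.

```python
from typing import Iterator


def page_batches(
    total_pages: int, batch_size: int, first_page: int = 1
) -> Iterator[tuple[int, int]]:
    """Yield (start_page, pages) pairs covering total_pages without overlap."""
    if total_pages < 0 or batch_size < 1:
        raise ValueError("total_pages must be >= 0 and batch_size >= 1")
    page = first_page
    last = first_page + total_pages - 1
    while page <= last:
        pages = min(batch_size, last - page + 1)
        yield page, pages
        page += pages
```

Twelve pages in batches of five become three jobs: pages 1 to 5, 6 to 10, and 11 to 12. Passing `first_page=6` resumes a pull whose first five pages you already have.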

Why async matters here: api.logposervices.com sits behind Cloudflare, which enforces a ~90-second edge timeout on any single HTTP request. An enrichment job that visits dozens of websites will run well past that, so the endpoint hands you a job_id immediately and does the work in the background. Do not try to hold the connection open waiting for results — poll instead.

Step 3 — Poll for Completion and Fetch the Result

Once you have a job_id, poll GET /api/v1/jobs/{job_id} until status is completed, then fetch GET /api/v1/jobs/{job_id}/result:

# Poll status
curl -H "X-API-Key: lp_xxxxxxx" \
  "https://api.logposervices.com/api/v1/jobs/yp_8f3a"
# → {"status": "running"}  ... then eventually {"status": "completed"}

# Fetch the enriched result
curl -H "X-API-Key: lp_xxxxxxx" \
  "https://api.logposervices.com/api/v1/jobs/yp_8f3a/result"

A five-page enrichment job takes meaningfully longer than a plain search because of the per-website fetches — budget a few minutes, not a few seconds, and poll on an interval rather than hammering the status route.
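One way to poll on an interval without hammering the status route is to lengthen the wait between checks. The sketch below is an illustration, not part of the API: `poll_until_done` is a hypothetical helper, the status check is passed in as a function so the loop can be exercised without network calls, and the 5-second start, 1.5x growth, and 30-second cap are placeholder values.

```python
import time
from typing import Callable


def poll_until_done(
    get_status: Callable[[], dict],
    interval_s: float = 5.0,
    max_interval_s: float = 30.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call get_status() with growing pauses until the job completes or fails."""
    deadline = clock() + timeout_s
    delay = interval_s
    while True:
        status = get_status()
        state = status.get("status")
        if state == "completed":
            return status
        if state == "failed":
            raise RuntimeError(status.get("error", "job failed"))
        if clock() + delay > deadline:
            raise TimeoutError(f"job still {state!r} after {timeout_s}s")
        sleep(delay)
        # Back off gradually, but never wait longer than max_interval_s.
        delay = min(delay * 1.5, max_interval_s)
```

In real use, `get_status` would wrap the GET on the job status route and return its parsed JSON; once the function returns, fetch the result route as shown above.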

Step 4 — Understand the Enriched Fields and the Hit Rate

Each row in the result is a directory listing with an enrichment block attached. The shape looks roughly like this:

{
  "name": "Lone Star Roofing Co",
  "phone": "(512) 555-0142",
  "address": "1234 Burnet Rd, Austin, TX 78757",
  "website": "https://lonestarroofingco.com",
  "opening_hours": ["Mon-Fri 8:00-17:00"],
  "emails": ["info@lonestarroofingco.com"],
  "phones": ["(512) 555-0142", "(512) 555-9911"],
  "socials": {
    "facebook": "https://facebook.com/lonestarroofingatx",
    "instagram": "https://instagram.com/lonestarroofing"
  }
}

The enrichment is two-wave, and it is worth understanding so you read the hit rate correctly:

  • Wave 1 extracts contacts from the business's own website — the one the listing links to or that the enricher resolves for it. This is the high-quality wave: an email on the company's own domain (info@lonestarroofingco.com) is the kind you actually want to email.
  • Wave 2 is a fallback for rows that have no usable website — no link on the listing, a dead domain, or a page with nothing extractable. It does a lookup to try to attach something, but these are lower-confidence and more likely to be a generic or third-party address.
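The result shape shown above does not say which wave produced an email, so a rough proxy is to check whether the address sits on the business's own domain. This is a heuristic sketch with hypothetical helper names (`site_domain`, `is_own_domain`); an off-domain address such as a Gmail is not necessarily a wave-2 result, since plenty of small businesses publish one on their own site.

```python
from urllib.parse import urlsplit


def site_domain(website: str) -> str:
    """Return the bare host of a website URL, without a leading 'www.'."""
    # urlsplit only finds the host when a scheme or '//' prefix is present.
    target = website if "//" in website else f"//{website}"
    host = urlsplit(target).hostname or ""
    return host.lower().removeprefix("www.")


def is_own_domain(email: str, website: str) -> bool:
    """True when the email's domain matches the website's domain or a subdomain."""
    email_domain = email.rsplit("@", 1)[-1].lower()
    domain = site_domain(website)
    if not domain:
        return False
    return email_domain == domain or email_domain.endswith("." + domain)
```

Tagging each lead with this flag lets you treat on-domain addresses as the higher-confidence segment and review the rest before sending.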

Be honest with yourself about the hit rate. Not every business has a findable email. A real-world /leads pull on a local-services category typically lands a usable email on something like a third to two-thirds of rows, depending on the category — independents with their own site enrich well; franchises and contact-form-only sites do not. That is normal and expected. The phones and addresses are near-complete; the emails are the partial field. Plan your list size around the email yield, not the row count: if you need 200 emailable leads at a 40% hit rate, scrape ~500 businesses (≈17 pages), not 200.
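The sizing arithmetic is easy to script so you can rerun it for different yields. The sketch below uses integer percentages to avoid floating-point rounding surprises; `pages_needed` is a hypothetical helper and the 30-per-page default is the approximate figure used earlier in this post.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up, for non-negative a and positive b."""
    return -(-a // b)


def pages_needed(
    target_emails: int, hit_rate_pct: int, per_page: int = 30
) -> tuple[int, int]:
    """Return (businesses_to_scrape, pages) for a target count of emailable leads."""
    if not 0 < hit_rate_pct <= 100:
        raise ValueError("hit_rate_pct must be between 1 and 100")
    businesses = ceil_div(target_emails * 100, hit_rate_pct)
    pages = ceil_div(businesses, per_page)
    return businesses, pages
```

`pages_needed(200, 40)` reproduces the worked example: 500 businesses across 17 pages. Dropping the assumed hit rate to 33% pushes that to 607 businesses and 21 pages.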

Step 5 — Flatten, Dedupe, and Clean in Python

Now the code. This pipeline submits the job, polls to completion, fetches the result, and flattens the nested enrichment into an outreach-ready CSV: one row per email, deduped, format-validated, and tagged as role or personal.

import os, re, csv, time, requests

API_KEY = os.environ["LOGPOSE_API_KEY"]
BASE = "https://api.logposervices.com/api/v1"
HEADERS = {"X-API-Key": API_KEY}

SEARCH_URL = (
    "https://www.yellowpages.com/search"
    "?search_terms=Roofing&geo_location_terms=Austin%2C+TX"
)

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$", re.I)

ROLE_PREFIXES = {
    "info", "contact", "sales", "hello", "support", "admin",
    "office", "team", "service", "help", "enquiries", "inquiries",
}


def submit(url: str, pages: int = 5, start_page: int = 1) -> str:
    """Submit a /leads job and return its job_id."""
    r = requests.get(
        f"{BASE}/ecommerce/yellowpages/leads",
        params={"url": url, "pages": pages, "start_page": start_page},
        headers=HEADERS,
        timeout=30,
    )
    r.raise_for_status()
    return r.json()["job_id"]


def wait_for(job_id: str, timeout_s: int = 600) -> dict:
    """Poll every 5 seconds until the job completes, fails, or times out."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        s = requests.get(f"{BASE}/jobs/{job_id}", headers=HEADERS, timeout=15).json()
        if s["status"] == "completed":
            return requests.get(
                f"{BASE}/jobs/{job_id}/result", headers=HEADERS, timeout=30
            ).json()
        if s["status"] == "failed":
            raise RuntimeError(s.get("error", "job failed"))
        time.sleep(5)
    raise TimeoutError(f"{job_id} did not finish in {timeout_s}s")


def is_role(email: str) -> bool:
    """True when the local part is a generic mailbox such as info@ or sales@."""
    return email.split("@", 1)[0].lower() in ROLE_PREFIXES


def flatten(rows: list[dict]) -> list[dict]:
    """Turn enriched rows into one flat record per unique, valid email."""
    out, seen = [], set()
    for row in rows:
        socials = row.get("socials") or {}
        for email in row.get("emails") or []:
            email = email.strip().lower()
            if not EMAIL_RE.match(email):  # drop malformed addresses
                continue
            if email in seen:  # dedupe across the whole list
                continue
            seen.add(email)
            out.append({
                "name": row.get("name", ""),
                "email": email,
                "email_type": "role" if is_role(email) else "personal",
                "phone": (row.get("phones") or [row.get("phone")])[0] or "",
                "website": row.get("website", ""),
                "city": row.get("city", ""),
                "state": row.get("state", ""),
                "facebook": socials.get("facebook", ""),
                "instagram": socials.get("instagram", ""),
                "linkedin": socials.get("linkedin", ""),
            })
    return out


if __name__ == "__main__":
    job = submit(SEARCH_URL, pages=5)
    result = wait_for(job)
    rows = result.get("listings") or result.get("results") or []
    leads = flatten(rows)
    print(f"{len(rows)} businesses scraped → {len(leads)} unique emails")

    # Guard against an empty list: leads[0] would raise IndexError.
    if not leads:
        raise SystemExit("No valid emails found; nothing to write.")

    with open("austin_roofing_outreach.csv", "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=list(leads[0].keys()))
        w.writeheader()
        w.writerows(leads)

A few notes on the cleaning choices, because they are the difference between a list that sends and one that gets you blocklisted:

  • Format validation. The regex drops anything that is not a plausible address — enrichment occasionally picks up image filenames or truncated strings that look email-ish. Cheap insurance against bounces.
  • Global dedupe. A business cross-listed under multiple sub-categories appears more than once, and a shared address (info@) can show up across sibling locations. Deduping on the email itself collapses both.
  • Role vs. personal. info@, sales@, and hello@ are role addresses; john@ is a person. They are not interchangeable for outreach — role addresses are fine for a "who handles X?" opener but convert worse than a named inbox. Tagging them lets you segment instead of blasting one template at both.

If you want a stricter list, filter to email_type == "personal" for your high-touch sequence and route the role addresses to a lighter-weight one.
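Splitting the flattened leads into the two sequences takes a few lines. This sketch assumes the `email_type` values produced by the script above ("role" or "personal"); `segment_by_type` is a hypothetical helper, and anything without a "role" tag is treated as personal.

```python
def segment_by_type(leads: list[dict]) -> dict[str, list[dict]]:
    """Group leads into 'personal' and 'role' lists based on email_type."""
    segments: dict[str, list[dict]] = {"personal": [], "role": []}
    for lead in leads:
        key = "role" if lead.get("email_type") == "role" else "personal"
        segments[key].append(lead)
    return segments
```

Write `segments["personal"]` and `segments["role"]` to separate CSVs if your sending tool expects one file per sequence.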

Scaling Across Categories and Cities with LogPose

The single-search flow caps out around 3,000 listings per category-city pair before Yellow Pages runs out of pagination. For wider coverage you fan the same /leads call across a grid of (category, city) pairs — each one is an independent async job, so submit them all, collect the job_ids, then poll them as a batch:

TARGETS = [
    ("Roofing", "Austin, TX"),
    ("Roofing", "San Antonio, TX"),
    ("Roofing", "Dallas, TX"),
    ("Gutter Cleaning", "Austin, TX"),
]

from urllib.parse import quote_plus

def yp_url(term, geo):
    # quote_plus turns "Austin, TX" into "Austin%2C+TX"
    return (f"https://www.yellowpages.com/search"
            f"?search_terms={quote_plus(term)}&geo_location_terms={quote_plus(geo)}")

jobs = [submit(yp_url(term, geo), pages=10) for term, geo in TARGETS]
all_rows = []
for jid in jobs:
    result = wait_for(jid)
    # Same key fallback as the single-search script
    all_rows.extend(result.get("listings") or result.get("results") or [])

leads = flatten(all_rows)   # one dedupe pass across every city at once

Because the dedupe runs once over the merged set, an email that appears in both the Austin and San Antonio pulls (a regional chain) lands in your file exactly once. LogPose handles the per-page proxying, the website fetches, and the wave-1/wave-2 enrichment server-side, so your code stays the submit-poll-flatten shape no matter how many (category, city) pairs you throw at it. For a recurring program — refresh the grid weekly, email only the net-new businesses — pair this with a saved monitor so you are not re-emailing last week's list; see the companion post linked below.
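If you are not using a saved monitor, a local stand-in for the net-new filter is to diff each fresh pull against the CSVs you have already sent. This is a sketch under that assumption; `load_emailed` and `net_new` are hypothetical helpers, and they expect the `email` column written by the flatten script.

```python
import csv
from typing import Iterable, TextIO


def load_emailed(csv_file: TextIO) -> set[str]:
    """Read the 'email' column of a previously sent CSV into a lowercase set."""
    return {
        row["email"].strip().lower()
        for row in csv.DictReader(csv_file)
        if row.get("email")
    }


def net_new(leads: Iterable[dict], already_emailed: set[str]) -> list[dict]:
    """Keep only leads whose email has not been sent to before."""
    return [lead for lead in leads if lead["email"].lower() not in already_emailed]
```

Open last week's file, pass the handle to `load_emailed`, and send only what `net_new` returns. The same set can double as a suppression list if you add opt-outs and bounced addresses to it.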

A Responsible-Sending Note

The scrape is the easy part; sending responsibly is the work. Keep it simple and you stay clear of trouble:

  • Warm the domain and throttle. A brand-new sending domain blasting 500 cold emails on day one is a deliverability and reputation disaster. Use a dedicated outbound domain, ramp volume over days, and keep daily sends modest.
  • CAN-SPAM basics (US). Every message needs accurate header information (From, Reply-To), a subject line that is not deceptive, a clear and working opt-out mechanism, and a valid physical postal address. These are not optional.
  • GDPR for EU/UK recipients. If you are emailing people in the EU or UK, you need a lawful basis. For B2B that is usually legitimate interest, which means relevance to the recipient's role and an easy opt-out — not a blanket blast.
  • Honor opt-outs immediately and suppress bounced addresses so you are not re-hitting dead inboxes.

The hiQ v. LinkedIn ruling (9th Cir. 2022) reaffirmed that scraping publicly accessible web data does not by itself violate the Computer Fraud and Abuse Act, which makes collecting a public directory like YP broadly defensible in the US; it says nothing about what you send afterward. None of the above is legal advice. For a high-volume program, a one-hour consult with a marketing lawyer is the cheapest insurance you will buy.

The Honest LogPose Fit

The /leads endpoint is a good fit when your shape is "give me businesses in a category + city with whatever email I can realistically attach, in one call, as clean JSON." It chains the directory scrape and the two-wave website enrichment so you are not maintaining a separate crawler, and the async job pattern is identical across the other LogPose endpoints, so your submit-poll-flatten code does not change when you add another platform to the pipeline.

The honest constraints are the ones inherent to the problem, not the tool. Emails are a partial field — wave 1 only finds an address when the business has a real, fetchable website, and wave 2 fallbacks are lower-confidence by nature, so a fraction of your rows will have no email no matter what. And enrichment is slower than a plain search because it is visiting real websites. If you only need phones and addresses for a dialer, use /search and skip all of this. If you genuinely need email-per-row at near-100% coverage for a specific niche, no public-directory approach will get you there — that is a paid data-broker problem.

Get Started

  1. Sign up at logposervices.com and generate an API key under Tool → API Keys.
  2. Build a Yellow Pages search URL for your category + city (search_terms + geo_location_terms, comma encoded as %2C).
  3. Call /api/v1/ecommerce/yellowpages/leads?url=...&pages=5, poll the returned job_id, fetch the result, and run the flatten-and-dedupe script above to get an outreach-ready CSV.

Related reading: How to build a B2B lead list from Yellow Pages (no code) for the dashboard-first version, How to enrich business leads with emails, phones, and socials for the enrichment model in depth, and How to monitor Yellow Pages for new businesses for turning a one-off scrape into a weekly net-new feed.

External: Yellow Pages search, CAN-SPAM Act compliance guide, ICO guidance on B2B direct marketing.

Frequently asked questions

Does Yellow Pages list business email addresses?
No — and this trips up most people who go looking for a 'Yellow Pages email scraper'. A Yellow Pages listing gives you a name, phone, address, hours, and sometimes a website link, but the directory itself almost never publishes an email address. Every email you can realistically attach to a YP business is derived after the fact by visiting that business's own website and reading the contact page, mailto links, or footer. That two-step shape — scrape the directory, then enrich each row from its website — is the whole job, and it is why a raw YP export has zero emails in it until you do the enrichment pass.

Is scraping Yellow Pages for cold outreach legal?
Scraping the public Yellow Pages directory is generally lawful in the US — the data is published openly, and the Ninth Circuit's hiQ v. LinkedIn decision (2022) reaffirmed that scraping publicly accessible web data does not by itself violate the Computer Fraud and Abuse Act. The regulated part is not the collection, it is the outreach. Cold email carries real obligations: CAN-SPAM in the US (accurate headers, a working unsubscribe, a physical mailing address) and, when you contact people in the EU or UK, GDPR's lawful-basis and legitimate-interest requirements. None of this is legal advice — treat the scrape as the easy part and the sending as the compliance work, and get a short consult with a marketing lawyer before a high-volume program.
