
How to Monitor Yellow Pages for New Businesses in Your Category


For agency owners and local-services sales teams, the most valuable lead in any category is the business that opened last week. They are buying their first vendors right now, have no incumbent relationships, and are in active research mode. The problem: Yellow Pages does not publish a "new businesses" feed, so there is no obvious way to find them other than re-running the same search every day and comparing the results. This guide walks the full diff-loop pattern — Python script, alerting, scheduling, and the operational details that separate a working monitor from one that floods Slack with false positives.

Why This Problem Exists

The static-lead-list workflow (How to build a B2B lead list from Yellow Pages) gets you a one-shot snapshot — every business currently in Plumbers, Los Angeles, CA. That's useful for the initial outbound push, but it goes stale fast. A directory like Yellow Pages turns over 5–10% of its category listings per year as businesses open, close, rebrand, or change locations. By month three after your initial pull, the list is 1–2% wrong; by month twelve, it's 5–10% wrong, and you are dialing dead numbers and emailing closed offices.

What you actually want is the incremental delta — the businesses that appeared in today's search and weren't in yesterday's. That data feed doesn't exist as a product, but the pattern to build it is short enough to live in one Python file.

The Pattern in Plain English

The diff loop has four steps that run on a daily cron:

  1. Run the Yellow Pages search for your target category + geo.
  2. Load the previous N days of stored results from your database.
  3. Find the rows in today's run whose phone number is not in the previous window.
  4. Send those rows to Slack, email, or your CRM.

The "previous N days" — not just yesterday — is the operationally important part. Yellow Pages's category ranking algorithm rotates listings; the same query can return slightly different result sets on consecutive runs even when nothing has actually changed. A naive 1-day diff produces 5–15% false positives daily. A 7-day rolling window cuts that to under 1% while still surfacing genuinely new listings within 24 hours.
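The difference between a 1-day and a 7-day baseline can be illustrated with plain sets. This is a minimal in-memory sketch, not the persistence layer used later in the guide; the function name new_since_window, the dates, and the phone numbers are all hypothetical.

```python
from datetime import date, timedelta


def new_since_window(
    today: set[str],
    history: dict[date, set[str]],
    run_date: date,
    window_days: int = 7,
) -> set[str]:
    """Return phones in today's run that no run in the previous window contained."""
    seen_recently: set[str] = set()
    for offset in range(1, window_days + 1):
        seen_recently |= history.get(run_date - timedelta(days=offset), set())
    return today - seen_recently


# Hypothetical data: the phone ending 0102 drops out of yesterday's results
# because of ranking rotation and comes back today. The one ending 0104 is new.
run_date = date(2024, 5, 8)
history = {
    date(2024, 5, 6): {"+13105550101", "+13105550102", "+13105550103"},
    date(2024, 5, 7): {"+13105550101", "+13105550103"},
}
today = {"+13105550101", "+13105550102", "+13105550104"}

one_day = new_since_window(today, history, run_date, window_days=1)
seven_day = new_since_window(today, history, run_date, window_days=7)
print(sorted(one_day))    # ['+13105550102', '+13105550104']
print(sorted(seven_day))  # ['+13105550104']
```

With the 1-day window the rotated listing is reported as new; with the 7-day window only the listing that was never seen before survives the diff.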

Storing the Baseline

You need a tiny amount of persistence: one table that records, for each normalized phone number, when the listing was first and last seen. SQLite is plenty for any single-territory monitor and avoids the operational overhead of running Postgres for a hobby-scale workload.

import sqlite3
from contextlib import closing

DB = "yp_monitor.db"

def init_db():
    # closing() guarantees the connection is closed even if the DDL fails.
    with closing(sqlite3.connect(DB)) as c:
        c.execute(
            """
            CREATE TABLE IF NOT EXISTS seen (
                phone_e164 TEXT NOT NULL,
                name TEXT NOT NULL,
                category_geo TEXT NOT NULL,
                first_seen_at TEXT NOT NULL,
                last_seen_at TEXT NOT NULL,
                listing_url TEXT,
                address TEXT,
                -- one row per phone per monitor
                PRIMARY KEY (phone_e164, category_geo)
            )
            """
        )
        c.commit()

category_geo is a string like plumbers|los-angeles-ca. Because the primary key is the (phone_e164, category_geo) pair, one database can host monitors for multiple categories and cities without colliding on phone numbers (the same phone can legitimately appear in two different city searches).
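The collision concern can be checked with a short in-memory sketch. It assumes the table is keyed on the (phone_e164, category_geo) pair rather than on the phone alone, and it uses a trimmed-down column set; the phone number and timestamps are hypothetical. The upsert syntax needs SQLite 3.24 or newer.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE seen (
        phone_e164 TEXT NOT NULL,
        category_geo TEXT NOT NULL,
        last_seen_at TEXT NOT NULL,
        PRIMARY KEY (phone_e164, category_geo)
    )
    """
)

UPSERT = """
    INSERT INTO seen (phone_e164, category_geo, last_seen_at)
    VALUES (?, ?, ?)
    ON CONFLICT(phone_e164, category_geo)
    DO UPDATE SET last_seen_at = excluded.last_seen_at
"""

phone = "+13105550101"  # hypothetical number
conn.execute(UPSERT, (phone, "plumbers|los-angeles-ca", "2024-05-07T06:00:00"))
conn.execute(UPSERT, (phone, "plumbers|san-diego-ca", "2024-05-07T06:00:00"))
# Seeing the same phone again in the same monitor updates the row in place.
conn.execute(UPSERT, (phone, "plumbers|los-angeles-ca", "2024-05-08T06:00:00"))

rows = conn.execute(
    "SELECT category_geo, last_seen_at FROM seen ORDER BY category_geo"
).fetchall()
print(rows)
```

The result holds two rows for the same phone, one per monitor, and only the Los Angeles row carries the later timestamp. With a key on the phone alone, the second insert would have been treated as a conflict with the first.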

The Full Daily Script

This is the script that goes on the cron. It scrapes, diffs against the rolling 7-day window, persists today's full snapshot, and prints new leads. The Slack wiring is the post_to_slack helper plus the last three lines; email or CRM delivery would slot in at the same point.

import os, time, json, sqlite3, datetime as dt, requests
from contextlib import closing

API_KEY = os.environ["LOGPOSE_API_KEY"]
BASE = "https://api.logposervices.com/api/v1"
HEADERS = {"X-API-Key": API_KEY}
DB = "yp_monitor.db"
ROLLING_DAYS = 7


def submit_and_wait(path: str, params: dict, timeout_s: int = 120) -> dict:
    """Submit a job, poll until it completes, and return its result payload."""
    r = requests.get(f"{BASE}/{path}", params=params, headers=HEADERS, timeout=30)
    r.raise_for_status()
    job_id = r.json()["job_id"]
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        poll = requests.get(f"{BASE}/jobs/{job_id}", headers=HEADERS, timeout=15)
        poll.raise_for_status()  # surface HTTP errors instead of a confusing KeyError
        s = poll.json()
        if s["status"] == "completed":
            break
        if s["status"] == "failed":
            raise RuntimeError(s.get("error", "unknown failure"))
        time.sleep(2)
    else:
        # while/else: this branch runs only when the loop ends without `break`
        raise TimeoutError(f"job {job_id} did not finish in {timeout_s}s")
    result = requests.get(f"{BASE}/jobs/{job_id}/result", headers=HEADERS, timeout=15)
    result.raise_for_status()
    return result.json()


def normalize_phone(raw: str | None) -> str | None:
    if not raw:
        return None
    digits = "".join(ch for ch in raw if ch.isdigit())
    if len(digits) == 10:
        return "+1" + digits
    if len(digits) == 11 and digits.startswith("1"):
        return "+" + digits
    return None


def diff_run(search_url: str, pages: int, category_geo: str) -> list[dict]:
    data = submit_and_wait(
        "ecommerce/yellowpages/search",
        {"url": search_url, "pages": pages},
    )
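    # datetime.utcnow() is deprecated since Python 3.12; take one aware
    # timestamp and derive both values from it so they share a format.
    now_dt = dt.datetime.now(dt.timezone.utc)
    now = now_dt.isoformat()
    cutoff = (now_dt - dt.timedelta(days=ROLLING_DAYS)).isoformat()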

    today_rows: dict[str, dict] = {}
    for r in data["listings"]:
        phone = normalize_phone(r.get("phone"))
        if not phone:
            continue
        today_rows[phone] = r

    new_leads: list[dict] = []
    with closing(sqlite3.connect(DB)) as c:
        cur = c.cursor()
        recent = {
            row[0]
            for row in cur.execute(
                "SELECT phone_e164 FROM seen WHERE last_seen_at >= ? AND category_geo = ?",
                (cutoff, category_geo),
            )
        }
        for phone, r in today_rows.items():
            if phone not in recent:
                new_leads.append({**r, "phone_e164": phone})

            # Upsert on the (phone, category_geo) pair so the same phone can be
            # tracked independently by more than one monitor.
            cur.execute(
                """
                INSERT INTO seen (phone_e164, name, category_geo, first_seen_at, last_seen_at, listing_url, address)
                VALUES (?, ?, ?, ?, ?, ?, ?)
                ON CONFLICT(phone_e164, category_geo) DO UPDATE SET last_seen_at = excluded.last_seen_at
                """,
                (
                    phone,
                    r.get("name", ""),
                    category_geo,
                    now,
                    now,
                    r.get("url"),
                    r.get("address"),
                ),
            )
        c.commit()

    return new_leads


def post_to_slack(webhook_url: str, leads: list[dict], category_geo: str) -> None:
    if not leads:
        return
    lines = [f"*{len(leads)} new listings in {category_geo}*"]
    # Cap the message at 20 leads so a large batch stays readable.
    for lead in leads[:20]:
        lines.append(f"• {lead.get('name', '')} — {lead['phone_e164']} — {lead.get('address') or ''}")
    if len(leads) > 20:
        lines.append(f"...and {len(leads) - 20} more")
    resp = requests.post(webhook_url, json={"text": "\n".join(lines)}, timeout=10)
    resp.raise_for_status()  # a silently failed alert is worse than a crashed run


if __name__ == "__main__":
    # The key is the (phone, category_geo) pair. A database created with a
    # phone-only key must be deleted and re-seeded before running this version.
    with closing(sqlite3.connect(DB)) as c:
        c.execute(
            "CREATE TABLE IF NOT EXISTS seen ("
            "phone_e164 TEXT NOT NULL, name TEXT, category_geo TEXT NOT NULL, "
            "first_seen_at TEXT, last_seen_at TEXT, listing_url TEXT, address TEXT, "
            "PRIMARY KEY (phone_e164, category_geo))"
        )
        c.commit()

    new = diff_run(
        search_url=(
            "https://www.yellowpages.com/search?"
            "search_terms=Plumbers&geo_location_terms=Los+Angeles%2C+CA"
        ),
        pages=5,
        category_geo="plumbers|los-angeles-ca",
    )
    print(f"{len(new)} new leads")
    for n in new:
        print(f"  {n.get('name', '')} — {n['phone_e164']}")

    slack = os.environ.get("SLACK_WEBHOOK_URL")
    if slack:
        post_to_slack(slack, new, "plumbers|los-angeles-ca")

On day one, against an empty database, all 150 listings (5 pages of 30) count as "new". After that, a typical 5-page daily run for an established US category surfaces 0–4 genuinely new listings per day, and the baseline settles into a steady state by day 8, once the 7-day window is full.
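Because the whole diff keys on the normalized phone, it is worth checking what the normalize_phone helper does with the formats a listing might use. The sketch below repeats the helper's logic so it runs on its own (with Optional in place of the `str | None` annotation, so it also runs on Python versions before 3.10); the sample inputs are hypothetical.

```python
from typing import Optional


def normalize_phone(raw: Optional[str]) -> Optional[str]:
    # Same logic as the helper in the script above: keep digits only,
    # accept 10-digit numbers and 11-digit numbers with a leading 1.
    if not raw:
        return None
    digits = "".join(ch for ch in raw if ch.isdigit())
    if len(digits) == 10:
        return "+1" + digits
    if len(digits) == 11 and digits.startswith("1"):
        return "+" + digits
    return None


# Hypothetical inputs covering formats a listing might plausibly use.
samples = [
    "(310) 555-0101",
    "310.555.0101",
    "1-310-555-0101",
    "310-555-0101 ext. 5",
    "555-0101",
    None,
]
for raw in samples:
    print(repr(raw), "->", normalize_phone(raw))
```

The first three inputs all collapse to +13105550101. The extension, the 7-digit number, and the missing value return None, and the script skips any row whose phone normalizes to None, so such listings never produce an alert.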

Scheduling It

The pattern works the same whether you schedule it with cron, GitHub Actions, a hosted scheduler, or your container platform of choice. The simplest reliable setup is a tiny VM with a cron entry:

0 6 * * * cd /srv/yp-monitor && /usr/bin/python3 monitor.py >> monitor.log 2>&1

6am local time is the right slot: the diff finishes before the sales team starts work, the Slack post is at the top of their morning channel, and you get a full business day to act on each lead. Cron runs in the server's time zone, which is UTC on many cloud VMs, so adjust the hour in the entry to hit 6am in your sales territory. Avoid scheduling during the noon-to-2pm window when Yellow Pages's own index updates tend to land; you will see more transient ranking churn.

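For zero-ops scheduling, GitHub Actions handles this fine: commit the script to a private repo, add LOGPOSE_API_KEY and SLACK_WEBHOOK_URL as repository secrets, and schedule a workflow on cron. Two details to plan for: scheduled workflows are evaluated in UTC, and runners are ephemeral, so yp_monitor.db has to be persisted between runs (for example by committing it back to the repo or storing it as an artifact or cache). Otherwise every run starts from an empty baseline and reports every listing as new. The free tier covers a single daily run on a single repo indefinitely.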

Wiring the Alert Channel

Three patterns cover the realistic deployments:

Slack. Create an incoming-webhook in any channel, drop the URL into SLACK_WEBHOOK_URL, and the script above sends a daily summary. Best for SDR teams that already live in Slack.

Email digest. Send the new-lead list via SendGrid, AWS SES, or smtplib to a shared inbox. Best for solo agency owners who don't run a chat-ops setup.

CRM webhook. POST each new lead to your CRM's intake endpoint (HubSpot, Pipedrive, Close, Attio all accept inbound webhooks). Best for teams that want each new business to land as a deal in the pipeline rather than as a Slack message that someone has to triage.

Whichever you pick, make sure the alert includes the phone, name, and listing URL — a one-line Slack message with all three is more useful than a rich card that requires clicking through to see the dial number.
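For the email-digest option, a minimal sketch using only the standard library might look like the following. The function names, addresses, and the STARTTLS-plus-login flow are assumptions; the lead dict is assumed to carry the name, phone_e164, and url keys that the script's diff_run output uses.

```python
import smtplib
from email.message import EmailMessage


def build_digest(leads: list[dict], category_geo: str, sender: str, recipient: str) -> EmailMessage:
    """Build a plain-text digest with name, phone, and listing URL per lead."""
    msg = EmailMessage()
    msg["Subject"] = f"{len(leads)} new listings in {category_geo}"
    msg["From"] = sender
    msg["To"] = recipient
    lines = [
        f"{lead.get('name', '')} | {lead.get('phone_e164', '')} | {lead.get('url', '')}"
        for lead in leads
    ]
    msg.set_content("\n".join(lines))
    return msg


def send_digest(msg: EmailMessage, host: str, port: int, user: str, password: str) -> None:
    # STARTTLS followed by login is an assumption; check your provider's settings.
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)


# Hypothetical lead; building the message needs no network access.
example = build_digest(
    [{"name": "Example Plumbing", "phone_e164": "+13105550101", "url": "https://example.com/listing"}],
    "plumbers|los-angeles-ca",
    sender="monitor@example.com",
    recipient="sales@example.com",
)
print(example["Subject"])
print(example.get_content())
```

Keeping message construction separate from sending makes the digest easy to inspect locally before any SMTP credentials are wired in.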

Scaling to Multiple Categories or Cities

The same script handles N category-geo pairs by looping the call and varying category_geo. Two patterns scale well.

One job per pair, scheduled in parallel. Cleanest for under 50 pairs. Each pair gets its own cron entry and its own Slack alert. Easy to debug, easy to disable individually.

Bulk submission with a shared diff pass. For 50+ pairs, submit them all in one bulk request and process the combined result downstream:

import os
import requests

resp = requests.post(
    "https://api.logposervices.com/api/v1/ecommerce/yellowpages/search/bulk",
    headers={"X-API-Key": os.environ["LOGPOSE_API_KEY"]},
    json={
        "targets": [
            {"url": "https://www.yellowpages.com/search?search_terms=Plumbers&geo_location_terms=Los+Angeles%2C+CA", "pages": 5},
            {"url": "https://www.yellowpages.com/search?search_terms=Plumbers&geo_location_terms=San+Diego%2C+CA", "pages": 5},
            {"url": "https://www.yellowpages.com/search?search_terms=Plumbers&geo_location_terms=Phoenix%2C+AZ", "pages": 5},
        ],
    },
    timeout=30,
)
resp.raise_for_status()  # fail loudly if the bulk submission is rejected

Bulk runs in parallel up to your account's concurrency cap, which keeps a 30-pair daily monitor under 5 minutes wall-clock instead of an hour sequential.
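For the smaller case, the per-pair loop mentioned at the start of this section can be sketched as below. MONITORS, run_all, and fake_diff are hypothetical names; in a real run you would pass the script's diff_run as diff_fn, which takes the same (search_url, pages, category_geo) arguments.

```python
from typing import Callable

# Hypothetical pair list; extend with your own category + geo searches.
MONITORS = [
    {
        "category_geo": "plumbers|los-angeles-ca",
        "search_url": "https://www.yellowpages.com/search?search_terms=Plumbers&geo_location_terms=Los+Angeles%2C+CA",
    },
    {
        "category_geo": "plumbers|san-diego-ca",
        "search_url": "https://www.yellowpages.com/search?search_terms=Plumbers&geo_location_terms=San+Diego%2C+CA",
    },
]


def run_all(
    monitors: list[dict],
    diff_fn: Callable[[str, int, str], list[dict]],
    pages: int = 5,
) -> dict[str, list[dict]]:
    """Run diff_fn for each pair; one failing pair does not stop the rest."""
    results: dict[str, list[dict]] = {}
    for m in monitors:
        try:
            results[m["category_geo"]] = diff_fn(m["search_url"], pages, m["category_geo"])
        except Exception as exc:  # report the failure and continue with the remaining pairs
            print(f"{m['category_geo']} failed: {exc}")
    return results


# Stand-in for diff_run so the sketch runs without an API key.
def fake_diff(search_url: str, pages: int, category_geo: str) -> list[dict]:
    return [{"name": "Example Co", "phone_e164": "+13105550101"}]


for geo, leads in run_all(MONITORS, fake_diff).items():
    print(geo, len(leads))
```

Catching exceptions per pair is a design choice: a timeout on one city should not suppress the alerts for the others.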

The No-Code Path (Roadmap Note)

If you'd rather not maintain a Python script, the LogPose monitor system handles diff-and-alert natively for product pages (Amazon, Zillow, Etsy, etc.) — set a monitor on a URL, pick a metric, and the platform stores history and fires webhooks on change. Yellow Pages search-result diffing is on the roadmap but not yet a first-class monitor type, so for now the script above is the right path. If you want to be notified when the no-code Yellow Pages monitor ships, drop your email in the Tool → Monitors panel and the system will email you the day it goes live.

For an example of the no-code monitor flow on a platform where it already exists, see How to monitor Zillow listings for real estate deals.

Common Mistakes

  • Diffing only against yesterday. Yellow Pages's ranking algorithm rotates results; a 1-day window produces a steady stream of false-positive "new" alerts that wear out your sales team. Use a 7-day rolling baseline.
  • Keying on name or URL instead of phone. Names drift in punctuation and legal-entity suffixes; URLs occasionally re-slug. Phone (normalized to E.164) is the only stable identifier.
  • Forgetting to seed the database. On day one, every listing is "new" relative to an empty DB and you flood Slack with 150 alerts. Run the script once with the alert wiring disabled to populate the baseline, then enable alerts on day two.
  • Setting the pages parameter too low. With pages=1 you only see the top 30 listings — businesses outside that window appear "new" to your diff every time Yellow Pages re-ranks. Use at least pages=5 so your monitored slice is wider than the daily churn window.
  • Ignoring the Cloudflare 100-second edge timeout. api.logposervices.com sits behind Cloudflare; a request that stays open longer than about 100 seconds returns a 524 to the client even though the job continues server-side. The helper function above polls with short requests, so it is unaffected, but if you swap in a synchronous call you will see spurious failures on big page counts.
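The seeding step from the list above is easy to forget if it depends on commenting out code. One option is a command-line switch that populates the baseline but suppresses alerts. The sketch below assumes a hypothetical --seed flag and should_alert helper, neither of which is part of the original script.

```python
import argparse
from typing import Optional


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Yellow Pages new-listing monitor")
    # --seed is a hypothetical flag, not part of the original script.
    parser.add_argument(
        "--seed",
        action="store_true",
        help="populate the baseline without sending alerts",
    )
    return parser.parse_args(argv)


def should_alert(args: argparse.Namespace, leads: list[dict]) -> bool:
    """Alert only on normal runs that actually found something."""
    return bool(leads) and not args.seed


leads = [{"name": "Example Co", "phone_e164": "+13105550101"}]
print(should_alert(parse_args(["--seed"]), leads))  # False
print(should_alert(parse_args([]), leads))          # True
```

In the script, the Slack call would then be guarded by should_alert, so the first run is `python3 monitor.py --seed` and the cron entry stays unchanged.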

Get Started

  1. Sign up at logposervices.com and generate an API key under Tool → API Keys.
  2. export LOGPOSE_API_KEY=lp_xxxxxxx and export SLACK_WEBHOOK_URL=https://hooks.slack.com/services/....
  3. Drop the script into a repo, run it once to seed the baseline, then schedule the daily cron.

Related reading: How to build a B2B lead list from Yellow Pages (no code) for the one-shot snapshot version, How to scrape Google Maps for local business leads for the richer-data alternative, and Octoparse alternatives for lead generation for the broader tool comparison.

External: Slack incoming webhooks, GitHub Actions scheduled workflows, SQLite documentation.

Frequently asked questions

Why does first-mover lead capture matter for local-services sales?
A new local business opening typically buys its starter stack — payments, scheduling, marketing, insurance, accounting software — within the first 90 days, often within the first 30. The vendor who reaches them in week one wins about 60% of the deals; the vendor who reaches them in week eight is fighting an entrenched incumbent. For agency owners selling local-business services, new-listing capture is the single highest-leverage outbound channel because the prospect is in active buying mode and has not yet been pitched by your three direct competitors.
Does Yellow Pages have a 'new businesses' feed or RSS?
No. Yellow Pages does not publish a new-listings feed, an RSS endpoint, or any official notification of additions to a category. The standard workflow is to run the same category-and-geo search on a daily cron, persist the results, and diff against the previous day's run — anything in today's set that wasn't in yesterday's is either genuinely new or was previously suppressed by Yellow Pages's ranking algorithm. With a 7-day rolling baseline you can distinguish the two.
What identifier should I use to detect a 'new' Yellow Pages listing?
Use the normalized phone number (E.164, a plus sign followed by the country code and digits, for example +13105550101) as the primary identifier. The listing URL changes too often (Yellow Pages occasionally re-slugs URLs when business names change), business names have spelling drift across exports (`Joe's Plumbing` vs `Joes Plumbing, Inc.`), and the address can shift if the business moves within the same city. Phone is the most stable identifier: a business that genuinely changes its phone line is rare, and when it happens it's usually a real signal worth flagging anyway.
How often should I run the diff?
Daily is the right cadence for most lead programs. Hourly is overkill — Yellow Pages updates its index in batches, not in real-time, so an hourly diff just costs more API credits without surfacing leads any faster. Weekly is too slow because by the time you call, the prospect has already bought from a competitor who runs daily. Schedule the job for 6am local time in your sales territory so the lead is in the rep's inbox before their first coffee.
Will I get duplicate alerts when Yellow Pages re-ranks an old listing into the top results?
Yes, and this is why a 7-day rolling baseline matters more than a 1-day diff. Yellow Pages's category ranking is non-deterministic — the same search query can return slightly different result sets across runs as their algorithm rotates listings. If you key your 'new' detection only on yesterday's result set, you will surface 5–15% false positives every day. Keeping a 7-day window of seen phone numbers and only alerting on phones never seen in that window cuts the false-positive rate to under 1% without missing genuinely new businesses.
