How to Monitor Yellow Pages for New Businesses in Your Category
For agency owners and local-services sales teams, the most valuable lead in any category is the business that opened last week. They are buying their first vendors right now, have no incumbent relationships, and are in active research mode. The problem: Yellow Pages does not publish a "new businesses" feed, so there is no obvious way to find them other than re-running the same search every day and comparing the results. This guide walks the full diff-loop pattern — Python script, alerting, scheduling, and the operational details that separate a working monitor from one that floods Slack with false positives.
Why This Problem Exists
The static-lead-list workflow (How to build a B2B lead list from Yellow Pages) gets you a one-shot snapshot — every business currently in Plumbers, Los Angeles, CA. That's useful for the initial outbound push, but it goes stale fast. A directory like Yellow Pages turns over 5–10% of its category listings per year as businesses open, close, rebrand, or change locations. By month three after your initial pull, the list is 1–2% wrong; by month twelve, it's 5–10% wrong, and you are dialing dead numbers and emailing closed offices.
What you actually want is the incremental delta — the businesses that appeared in today's search and weren't in yesterday's. That data feed doesn't exist as a product, but the pattern to build it is short enough to live in one Python file.
The Pattern in Plain English
The diff loop has four steps that run on a daily cron:
- Run the Yellow Pages search for your target category + geo.
- Load the previous N days of stored results from your database.
- Find the rows in today's run whose phone number is not in the previous window.
- Send those rows to Slack, email, or your CRM.
The "previous N days" — not just yesterday — is the operationally important part. Yellow Pages's category ranking algorithm rotates listings; the same query can return slightly different result sets on consecutive runs even when nothing has actually changed. A naive 1-day diff produces 5–15% false positives daily. A 7-day rolling window cuts that to under 1% while still surfacing genuinely new listings within 24 hours.
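The effect of the window size can be seen in a minimal sketch. The phone numbers, dates, and the `new_phones` helper below are hypothetical and only illustrate the set arithmetic, not the production script:

```python
from datetime import date, timedelta


def new_phones(history: dict[date, set[str]], today: date, window_days: int) -> set[str]:
    """Return phones seen today that were absent from the previous `window_days` days."""
    baseline: set[str] = set()
    for offset in range(1, window_days + 1):
        baseline |= history.get(today - timedelta(days=offset), set())
    return history.get(today, set()) - baseline


# Hypothetical data: listing B rotates out of the results on day 2 and back in on day 3.
d1, d2, d3 = date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 3)
history = {
    d1: {"+13105550001", "+13105550002"},                  # A and B
    d2: {"+13105550001"},                                  # A only (B rotated out)
    d3: {"+13105550001", "+13105550002", "+13105550003"},  # A, B, and a new C
}

one_day = new_phones(history, d3, window_days=1)    # B shows up as a false positive
seven_day = new_phones(history, d3, window_days=7)  # only C is reported
print(sorted(one_day))
print(sorted(seven_day))
```

With a 1-day window the rotated listing B is reported alongside the new listing C; with a 7-day window only C is.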
Storing the Baseline
You need a tiny amount of persistence — one table mapping (phone_e164, first_seen_at). SQLite is plenty for any single-territory monitor and avoids the operational overhead of running Postgres for a hobby-scale workload.
import sqlite3
from contextlib import closing

DB = "yp_monitor.db"


def init_db():
    # One row per (phone, monitor) pair. The composite key lets the same
    # phone number be tracked independently by several category_geo monitors.
    with closing(sqlite3.connect(DB)) as c:
        c.execute(
            """
            CREATE TABLE IF NOT EXISTS seen (
                phone_e164    TEXT NOT NULL,
                name          TEXT NOT NULL,
                category_geo  TEXT NOT NULL,
                first_seen_at TEXT NOT NULL,
                last_seen_at  TEXT NOT NULL,
                listing_url   TEXT,
                address       TEXT,
                PRIMARY KEY (phone_e164, category_geo)
            )
            """
        )
        c.commit()

category_geo is a string like plumbers|los-angeles-ca. It is part of the primary key, so one database can host monitors for multiple categories without colliding on phone numbers (the same phone can legitimately appear in two different city searches).
The Full Daily Script
This is the script that goes on the cron. It scrapes, diffs against the rolling 7-day window, persists today's full snapshot, and prints new leads. The Slack wiring is the post_to_slack helper plus the last few lines.

import os, time, sqlite3, datetime as dt, requests
from contextlib import closing

API_KEY = os.environ["LOGPOSE_API_KEY"]
BASE = "https://api.logposervices.com/api/v1"
HEADERS = {"X-API-Key": API_KEY}
DB = "yp_monitor.db"
ROLLING_DAYS = 7


def submit_and_wait(path: str, params: dict, timeout_s: int = 120) -> dict:
    # Submit the job, then poll its status until it completes, fails, or times out.
    r = requests.get(f"{BASE}/{path}", params=params, headers=HEADERS, timeout=30)
    r.raise_for_status()
    job_id = r.json()["job_id"]
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        s = requests.get(f"{BASE}/jobs/{job_id}", headers=HEADERS, timeout=15).json()
        if s["status"] == "completed":
            break
        if s["status"] == "failed":
            raise RuntimeError(s.get("error", "unknown failure"))
        time.sleep(2)
    else:
        # while/else: reached only when the loop ended without hitting `break`.
        raise TimeoutError(f"job {job_id} did not finish in {timeout_s}s")
    return requests.get(f"{BASE}/jobs/{job_id}/result", headers=HEADERS, timeout=15).json()


def normalize_phone(raw: str | None) -> str | None:
    # Normalize US numbers to E.164 (+1XXXXXXXXXX); anything else is skipped.
    if not raw:
        return None
    digits = "".join(ch for ch in raw if ch.isdigit())
    if len(digits) == 10:
        return "+1" + digits
    if len(digits) == 11 and digits.startswith("1"):
        return "+" + digits
    return None


def diff_run(search_url: str, pages: int, category_geo: str) -> list[dict]:
    data = submit_and_wait(
        "ecommerce/yellowpages/search",
        {"url": search_url, "pages": pages},
    )
    # Timezone-aware UTC timestamps (datetime.utcnow() is deprecated since Python 3.12).
    # Fixed-width ISO 8601 strings compare chronologically as plain text.
    now_dt = dt.datetime.now(dt.timezone.utc)
    now = now_dt.isoformat(timespec="seconds")
    cutoff = (now_dt - dt.timedelta(days=ROLLING_DAYS)).isoformat(timespec="seconds")

    # Deduplicate today's listings by normalized phone.
    today_rows: dict[str, dict] = {}
    for r in data["listings"]:
        phone = normalize_phone(r.get("phone"))
        if not phone:
            continue
        today_rows[phone] = r

    new_leads: list[dict] = []
    with closing(sqlite3.connect(DB)) as c:
        cur = c.cursor()
        # Phones seen for this monitor at any point inside the rolling window.
        recent = {
            row[0]
            for row in cur.execute(
                "SELECT phone_e164 FROM seen WHERE last_seen_at >= ? AND category_geo = ?",
                (cutoff, category_geo),
            )
        }
        for phone, r in today_rows.items():
            if phone not in recent:
                new_leads.append({**r, "phone_e164": phone})
            # Upsert: insert on first sight, otherwise only bump last_seen_at.
            cur.execute(
                """
                INSERT INTO seen (phone_e164, name, category_geo, first_seen_at, last_seen_at, listing_url, address)
                VALUES (?, ?, ?, ?, ?, ?, ?)
                ON CONFLICT(phone_e164, category_geo) DO UPDATE SET last_seen_at = excluded.last_seen_at
                """,
                (
                    phone,
                    r.get("name") or "",
                    category_geo,
                    now,
                    now,
                    r.get("url"),
                    r.get("address"),
                ),
            )
        c.commit()
    return new_leads


def post_to_slack(webhook_url: str, leads: list[dict], category_geo: str) -> None:
    if not leads:
        return
    lines = [f"*{len(leads)} new listings in {category_geo}*"]
    for lead in leads[:20]:
        lines.append(f"• {lead.get('name', '')} — {lead['phone_e164']} — {lead.get('address', '')}")
    if len(leads) > 20:
        lines.append(f"...and {len(leads) - 20} more")
    resp = requests.post(webhook_url, json={"text": "\n".join(lines)}, timeout=10)
    resp.raise_for_status()  # surface a rejected webhook instead of failing silently


if __name__ == "__main__":
    # Create the table on first run so the script works as a single file.
    with closing(sqlite3.connect(DB)) as c:
        c.execute(
            "CREATE TABLE IF NOT EXISTS seen ("
            "phone_e164 TEXT NOT NULL, name TEXT NOT NULL, category_geo TEXT NOT NULL, "
            "first_seen_at TEXT NOT NULL, last_seen_at TEXT NOT NULL, "
            "listing_url TEXT, address TEXT, "
            "PRIMARY KEY (phone_e164, category_geo))"
        )
        c.commit()

    new = diff_run(
        search_url=(
            "https://www.yellowpages.com/search?"
            "search_terms=Plumbers&geo_location_terms=Los+Angeles%2C+CA"
        ),
        pages=5,
        category_geo="plumbers|los-angeles-ca",
    )
    print(f"{len(new)} new leads")
    for n in new:
        print(f"  {n.get('name', '')} — {n['phone_e164']}")

    slack = os.environ.get("SLACK_WEBHOOK_URL")
    if slack:
        post_to_slack(slack, new, "plumbers|los-angeles-ca")
A typical 5-page daily run for an established US category surfaces 0–4 genuinely new listings per day, with first-week initialization seeing the full 150 leads on day one (everything is "new" relative to an empty database) and trending to steady-state by day 8.
Scheduling It
The pattern works the same whether you schedule it with cron, GitHub Actions, a hosted scheduler, or your container platform of choice. The simplest reliable setup is a tiny VM with a cron entry:
0 6 * * * cd /srv/yp-monitor && /usr/bin/python3 monitor.py >> monitor.log 2>&1
6am local time is the right slot — the diff finishes before the sales team starts work, the Slack post is at the top of their morning channel, and you get a full business day to act on each lead. Avoid scheduling during the noon-to-2pm window when Yellow Pages's own index updates tend to land; you will see more transient ranking churn.
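For zero-ops scheduling, GitHub Actions handles this fine: commit the script to a private repo, add LOGPOSE_API_KEY and SLACK_WEBHOOK_URL as repository secrets, and schedule a workflow on cron. Two details matter there. Scheduled workflows are evaluated in UTC, so convert your 6am slot accordingly. Hosted runners also start from a clean checkout on every run, so yp_monitor.db has to be persisted between runs (for example by committing it back to the repo, or by restoring it from a cache or artifact); otherwise every run starts from an empty baseline and reports every listing as new. The free tier covers a single daily run on a single repo indefinitely.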
Wiring the Alert Channel
Three patterns cover the realistic deployments:
Slack. Create an incoming-webhook in any channel, drop the URL into SLACK_WEBHOOK_URL, and the script above sends a daily summary. Best for SDR teams that already live in Slack.
Email digest. Send the new-lead list via SendGrid, AWS SES, or smtplib to a shared inbox. Best for solo agency owners who don't run a chat-ops setup.
CRM webhook. POST each new lead to your CRM's intake endpoint (HubSpot, Pipedrive, Close, Attio all accept inbound webhooks). Best for teams that want each new business to land as a deal in the pipeline rather than as a Slack message that someone has to triage.
Whichever you pick, make sure the alert includes the phone, name, and listing URL — a one-line Slack message with all three is more useful than a rich card that requires clicking through to see the dial number.
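For the email-digest pattern, a minimal sketch of the one-line format might look like this. It assumes each lead dict carries the phone_e164, name, and url keys used by the daily script; `format_lead`, `build_digest`, the addresses, and the sample lead are hypothetical names introduced for illustration:

```python
from email.message import EmailMessage


def format_lead(lead: dict) -> str:
    """One line per lead: name, dial number, and listing URL."""
    return " | ".join(
        [lead.get("name") or "(no name)", lead["phone_e164"], lead.get("url") or "(no URL)"]
    )


def build_digest(leads: list[dict], category_geo: str, sender: str, recipient: str) -> EmailMessage:
    """Build a plain-text digest; sending is left to the caller."""
    msg = EmailMessage()
    msg["Subject"] = f"{len(leads)} new listings in {category_geo}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(format_lead(lead) for lead in leads))
    return msg


# Hypothetical lead, shaped like the dicts returned by diff_run.
demo = build_digest(
    [{"name": "Acme Plumbing", "phone_e164": "+13105550003", "url": "https://example.com/listing/acme"}],
    "plumbers|los-angeles-ca",
    sender="monitor@example.com",
    recipient="sales@example.com",
)
print(demo["Subject"])
print(demo.get_content())
```

The resulting message can be handed to smtplib (for example `smtplib.SMTP(host).send_message(demo)`) or to whichever mail provider you use; the host and credentials depend on your setup.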
Scaling to Multiple Categories or Cities
The same script handles N category-geo pairs by looping the call and varying category_geo. Two patterns scale well.
One job per pair, scheduled in parallel. Cleanest for under 50 pairs. Each pair gets its own cron entry and its own Slack alert. Easy to debug, easy to disable individually.
Bulk submission with a shared diff pass. For 50+ pairs, submit them all in one bulk request and process the combined result downstream:
import os
import requests

resp = requests.post(
    "https://api.logposervices.com/api/v1/ecommerce/yellowpages/search/bulk",
    headers={"X-API-Key": os.environ["LOGPOSE_API_KEY"]},
    json={
        "targets": [
            {"url": "https://www.yellowpages.com/search?search_terms=Plumbers&geo_location_terms=Los+Angeles%2C+CA", "pages": 5},
            {"url": "https://www.yellowpages.com/search?search_terms=Plumbers&geo_location_terms=San+Diego%2C+CA", "pages": 5},
            {"url": "https://www.yellowpages.com/search?search_terms=Plumbers&geo_location_terms=Phoenix%2C+AZ", "pages": 5},
        ],
    },
    timeout=30,
)
resp.raise_for_status()
Bulk runs in parallel up to your account's concurrency cap, which keeps a 30-pair daily monitor under 5 minutes wall-clock instead of an hour sequential.
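Whichever pattern you use, the list of pairs is easier to maintain as data than as hand-written URLs. The sketch below derives both the search URL and the category_geo key from a (category, city) tuple; `slug`, `build_target`, and `PAIRS` are hypothetical helpers, and the URL shape is simply the one used in the examples above:

```python
import re
from urllib.parse import urlencode

YP_SEARCH = "https://www.yellowpages.com/search"


def slug(text: str) -> str:
    """Lowercase and collapse every non-alphanumeric run into a single hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_target(category: str, city: str) -> dict:
    """Return the search URL and the category_geo key for one monitored pair."""
    query = urlencode({"search_terms": category, "geo_location_terms": city})
    return {
        "search_url": f"{YP_SEARCH}?{query}",
        "category_geo": f"{slug(category)}|{slug(city)}",
    }


PAIRS = [
    ("Plumbers", "Los Angeles, CA"),
    ("Plumbers", "San Diego, CA"),
    ("Plumbers", "Phoenix, AZ"),
]
targets = [build_target(category, city) for category, city in PAIRS]
for t in targets:
    print(t["category_geo"], t["search_url"])
```

Each entry can then be passed to diff_run as search_url and category_geo, or its search_url can be placed in the targets array of the bulk request.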
The No-Code Path (Roadmap Note)
If you'd rather not maintain a Python script, the LogPose monitor system handles diff-and-alert natively for product pages (Amazon, Zillow, Etsy, etc.) — set a monitor on a URL, pick a metric, and the platform stores history and fires webhooks on change. Yellow Pages search-result diffing is on the roadmap but not yet a first-class monitor type, so for now the script above is the right path. If you want to be notified when the no-code Yellow Pages monitor ships, drop your email in the Tool → Monitors panel and the system will email you the day it goes live.
For an example of the no-code monitor flow on a platform where it already exists, see How to monitor Zillow listings for real estate deals.
Common Mistakes
- Diffing only against yesterday. Yellow Pages's ranking algorithm rotates results; a 1-day window produces a steady stream of false-positive "new" alerts that wear out your sales team. Use a 7-day rolling baseline.
- Keying on name or URL instead of phone. Names drift in punctuation and legal-entity suffixes; URLs occasionally re-slug. Phone (normalized to E.164) is the only stable identifier.
- Forgetting to seed the database. On day one, every listing is "new" relative to an empty DB and you flood Slack with 150 alerts. Run the script once with the alert wiring disabled to populate the baseline, then enable alerts on day two.
- Setting the pages parameter too low. With pages=1 you only see the top 30 listings, so businesses outside that window appear "new" to your diff every time Yellow Pages re-ranks. Use at least pages=5 so your monitored slice is wider than the daily churn window.
- Ignoring the Cloudflare 100-second edge timeout. api.logposervices.com sits behind Cloudflare; jobs running longer than ~90 seconds return a 524 to the client even though the job continues server-side. The helper function above polls correctly, but if you swap in a synchronous call you will see spurious failures on big page counts.
Get Started
- Sign up at logposervices.com and generate an API key under Tool → API Keys.
- export LOGPOSE_API_KEY=lp_xxxxxxx and export SLACK_WEBHOOK_URL=https://hooks.slack.com/services/....
- Drop the script into a repo, run it once to seed the baseline, then schedule the daily cron.
Related reading: How to build a B2B lead list from Yellow Pages (no code) for the one-shot snapshot version, How to scrape Google Maps for local business leads for the richer-data alternative, and Octoparse alternatives for lead generation for the broader tool comparison.
External: Slack incoming webhooks, GitHub Actions scheduled workflows, SQLite documentation.