How to Scrape Facebook Page Posts for Competitor Watch
For any brand operating in a competitive consumer category — DTC, beauty, food and beverage, fashion, software with a marketing-led GTM — what your direct competitors post on Facebook is one of the highest-signal datasets you have access to. It tells you their campaign cadence, the creative directions they are betting on, which posts land and which die in the feed, and how the audience is responding in near-real-time. The catch is that Meta has progressively locked down the Graph API to the point where post-level monitoring of pages you do not own is no longer practical through official channels. The working path in 2026 is to scrape the public web interface of Facebook using session cookies from a real account. This guide walks the full pipeline: the cookie setup, the API call, the per-post fields you get back, and the daily diff loop that turns it into a competitor dashboard.
Why Brand Strategists Watch Competitor Facebook Pages
The honest competitive-intelligence stack for a consumer brand looks like this. Paid-ad transparency through the Meta Ad Library shows you what creative your competitors are actively spending on, but not the organic context around it. SimilarWeb tells you their traffic shape but nothing about the content. Influencer-tracking tools cover creator partnerships but miss owned-channel cadence. Facebook organic posts sit in the gap: they show you the brand's voice and creative cadence on the channel where most consumer brands still maintain their largest owned audience. Even with the platform's organic reach decline, Facebook page activity remains the single most reliable indicator of what a competitor is currently prioritizing.
The other reason Facebook pages matter is engagement transparency. Unlike Instagram (where view counts are gated behind ownership of the post) and unlike TikTok (where view counts are inflated by autoplay), Facebook surfaces reaction count, comment count, and share count on every public post. Those three numbers, scraped consistently across thirty days, paint a clear picture of which creative directions are working for any brand you can name.
What Actually Comes Back Per Post
A /facebook/scrape call against a page URL returns post-level rows. Each row gives you:
| Field | Example |
|---|---|
| post_id | 9876543210_123456789 |
| post_url | https://www.facebook.com/SomeBrand/posts/123456789 |
| author_name | Some Brand |
| author_id | 9876543210 |
| text | "Our spring collection drops Friday. Tap to set a reminder →" |
| timestamp | 2026-05-22T14:30:00Z |
| post_type | photo |
| media_urls | ["https://scontent.xx.fbcdn.net/..."] |
| link_preview | {"url": "...", "title": "...", "domain": "..."} |
| reactions_total | 1284 |
| reactions_breakdown | {"like": 1102, "love": 134, "wow": 21, "haha": 18, "sad": 5, "angry": 4} |
| comments_count | 87 |
| shares_count | 42 |
| page_followers | 384200 |
| page_verified | true |
| page_category | Clothing (Brand) |
What it does not include: the actual text of individual comments, the list of users who reacted, demographic breakdowns of the audience, or any signal of whether the post is being boosted as a paid ad. For ad-spend visibility, the Meta Ad Library remains the right tool and is best used alongside this scrape — not as a substitute.
One quirk worth flagging upfront. Facebook's internal element naming (Meta calls these friendly_names inside the React tree) varies by account rollout. The same DOM rendered for two different logged-in users may have different attribute names on the same buttons, because Meta runs continuous A/B experiments on its own UI. In practice this means: a session that worked yesterday may return slightly fewer reaction-breakdown fields tomorrow if Meta rolls the account into a new experiment cohort. Production setups handle this by using stable, long-aged accounts (not fresh ones) and by treating the reaction breakdown as best-effort while keeping the total reaction count as the source of truth.
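The "total is the source of truth, breakdown is best-effort" rule can be encoded once at ingestion time. The sketch below is a minimal illustration: the helper name `normalize_reactions` and the `breakdown_complete` flag are hypothetical, and the six reaction keys are assumed from the field table above rather than taken from any documented schema.

```python
# Reaction keys as they appear in the example row above (an assumption,
# not a documented schema).
REACTION_KEYS = ("like", "love", "wow", "haha", "sad", "angry")


def normalize_reactions(post: dict) -> dict:
    """Return the total plus a best-effort breakdown for one scraped post.

    reactions_total is treated as the source of truth. Missing breakdown
    keys become None rather than 0, so "not returned" is never mistaken
    for "no reactions of this type".
    """
    raw = post.get("reactions_breakdown") or {}
    breakdown = {key: raw.get(key) for key in REACTION_KEYS}
    known = sum(v for v in breakdown.values() if v is not None)
    complete = None not in breakdown.values() and known == post["reactions_total"]
    return {
        "total": post["reactions_total"],
        "breakdown": breakdown,
        # True only when every key came back and the parts add up to the total.
        "breakdown_complete": complete,
    }
```

Downstream analysis can then use `total` everywhere and only touch `breakdown` when `breakdown_complete` is true.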
The Cookie Session Setup
Facebook detects unauthenticated traffic almost immediately and serves a login wall, even on pages that are technically public. To get past that, the scrape needs to present itself as a real logged-in browser, which means real session cookies. Auto-login from username and password is intentionally not built — credential injection breaks every time Meta updates its login flow and crosses a clearer line against the terms of service. The supported flow is a one-time cookie paste, then the session is reused across jobs.
Extracting the cookies takes about ninety seconds:
- Open a fresh browser profile (Chrome or Firefox, doesn't matter).
- Log into facebook.com normally. Use a stable, long-aged account if possible — one that has been active for at least six months, ideally one used as a real account. Meta deprioritizes freshly-created accounts in some experiment cohorts.
- Open DevTools (`F12` or `Cmd-Opt-I`), go to the Application tab → Cookies → `https://www.facebook.com`.
- Copy the values of these four cookies: `c_user`, `xs`, `fr`, `datr`. The `c_user` value is the numeric user ID; `xs` is the session token; `fr` is Meta's advertising cookie, which also carries a browser identifier; `datr` identifies the browser itself and is used for security checks. The four together constitute a complete logged-in session.
- In the LogPose dashboard, go to Accounts → Facebook → Add account and paste those four values. The platform stores them encrypted and references them by `account_id` on every subsequent scrape call.
That account_id becomes a query parameter on the scrape request. The session persists until Facebook expires it (typically 60–90 days for an active account, sooner if the account is also being used to log in from other devices in parallel). When the session expires, the scraper returns a clear "session expired" error rather than silently failing, and the cookies need to be re-pasted.
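A nightly job is easier to operate if an expired session is separated from every other failure, because it is the one error that needs a human to re-paste cookies. The sketch below assumes a failed job's status payload carries a `status` field and an `error` string containing the phrase "session expired"; the exact wording and payload shape are assumptions, and `SessionExpiredError` and `raise_for_job_error` are hypothetical names.

```python
class SessionExpiredError(RuntimeError):
    """Raised when the stored Facebook cookies need to be re-pasted."""


def raise_for_job_error(status: dict) -> None:
    """Raise a specific exception for a failed job status payload.

    Assumes the payload looks like {"status": "failed", "error": "..."};
    any other status is treated as not-an-error and returns None.
    """
    if status.get("status") != "failed":
        return
    message = str(status.get("error", "unknown failure"))
    # Substring match on the error text is an assumption about its wording.
    if "session expired" in message.lower():
        raise SessionExpiredError(message)
    raise RuntimeError(message)
```

Catching `SessionExpiredError` separately lets the cron job send a "re-paste cookies" alert instead of a generic failure notice.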
The API Call
The endpoint is asynchronous — submit a job, poll for completion, fetch the result. Three curl calls walk the full flow:
```bash
# 1. Submit
curl -G "https://api.logposervices.com/api/v1/social/facebook/scrape" \
  -H "X-API-Key: lp_xxxxxxx" \
  --data-urlencode "url=https://www.facebook.com/SomeBrand" \
  --data-urlencode "limit=30" \
  --data-urlencode "account_id=fb_acct_8a3f..."
# → {"job_id": "fb_2c91..."}

# 2. Poll (or wait inline)
curl -H "X-API-Key: lp_xxxxxxx" \
  "https://api.logposervices.com/api/v1/jobs/fb_2c91?wait=true&timeout=60"

# 3. Fetch result
curl -H "X-API-Key: lp_xxxxxxx" \
  https://api.logposervices.com/api/v1/jobs/fb_2c91/result
```
Three parameters matter:
- `url`: the Facebook page URL. Either the vanity form (`facebook.com/SomeBrand`) or the numeric form (`facebook.com/profile.php?id=...`) works. The scraper resolves both. Post-specific URLs (`/posts/...`) and watch URLs (`fb.watch/...`) are also accepted if a single-post scrape is what's needed.
- `limit`: how many posts to pull, starting from the most recent. The endpoint paginates internally; a `limit=30` request fetches the latest 30 posts in one job. The default is 30, which covers about a month of activity for a daily-posting brand, or about two weeks for a brand posting twice a day.
- `account_id`: the encrypted cookie reference from the setup step above. Without it, the job will fail with a 401-equivalent error before it even starts the scrape.
A limit=30 job typically finishes in 30–60 seconds. Larger pulls (limit=100+) scale roughly linearly and should always be polled rather than waited on inline, because the Cloudflare edge in front of the API closes connections at 100 seconds.
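The timing figures above reduce to simple arithmetic: 30 posts in 30 to 60 seconds is roughly 1 to 2 seconds per post, and linear scaling puts a 100-post pull at 100 to 200 seconds, past the 100-second edge cutoff. A rough sketch of that estimate follows; the per-post rates are derived from the numbers in this section, while the 60-second inline budget and the function names are assumptions for illustration.

```python
# Derived from "a limit=30 job typically finishes in 30-60 seconds".
SECONDS_PER_POST = (1.0, 2.0)
# Conservative inline budget, kept well under the 100-second edge cutoff
# (the exact budget is an assumption).
INLINE_BUDGET_S = 60


def estimate_duration_s(limit: int) -> tuple[float, float]:
    """Return a (best case, worst case) duration estimate in seconds."""
    low, high = SECONDS_PER_POST
    return limit * low, limit * high


def can_wait_inline(limit: int) -> bool:
    """True if even the worst-case estimate fits inside the inline budget."""
    _, worst_case = estimate_duration_s(limit)
    return worst_case <= INLINE_BUDGET_S
```

For example, `can_wait_inline(30)` is true under these numbers, while `can_wait_inline(100)` is false, so the larger pull should be submitted and polled.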
A Python Script That Pulls A Page Daily
This is the script most brand teams end up running on a nightly cron — submit the page, wait for completion, write the result to a date-stamped JSON file. Tomorrow's run produces another file, and the diff between them is the dashboard.
```python
import json
import os
import time
from datetime import date

import requests

API_KEY = os.environ["LOGPOSE_API_KEY"]
FB_ACCOUNT_ID = os.environ["LOGPOSE_FB_ACCOUNT_ID"]
BASE = "https://api.logposervices.com/api/v1"
HEADERS = {"X-API-Key": API_KEY}


def submit_and_wait(path: str, params: dict, timeout_s: int = 180) -> dict:
    """Submit a scrape job, poll until it finishes, and return the result."""
    r = requests.get(f"{BASE}/{path}", params=params, headers=HEADERS, timeout=30)
    r.raise_for_status()
    job_id = r.json()["job_id"]

    deadline = time.time() + timeout_s
    while time.time() < deadline:
        poll = requests.get(f"{BASE}/jobs/{job_id}", headers=HEADERS, timeout=15)
        poll.raise_for_status()
        s = poll.json()
        if s["status"] == "completed":
            break
        if s["status"] == "failed":
            raise RuntimeError(s.get("error", "unknown failure"))
        time.sleep(3)
    else:
        # The loop ran out of time without ever hitting `break`.
        raise TimeoutError(f"job {job_id} did not finish in {timeout_s}s")

    result = requests.get(f"{BASE}/jobs/{job_id}/result", headers=HEADERS, timeout=15)
    result.raise_for_status()
    return result.json()


def snapshot_page(page_url: str, limit: int, out_dir: str) -> int:
    """Scrape one page and write today's posts to a date-stamped JSON file."""
    data = submit_and_wait(
        "social/facebook/scrape",
        {"url": page_url, "limit": limit, "account_id": FB_ACCOUNT_ID},
    )
    posts = data["posts"]
    os.makedirs(out_dir, exist_ok=True)
    out_path = f"{out_dir}/{date.today().isoformat()}.json"
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(posts, f, ensure_ascii=False, indent=2)
    return len(posts)


if __name__ == "__main__":
    n = snapshot_page(
        "https://www.facebook.com/SomeBrand",
        limit=30,
        out_dir="snapshots/somebrand",
    )
    print(f"snapshotted {n} posts")
```
Run that nightly across the three to five competitor pages that matter most. The output is a directory of date-keyed JSON files, ready for the diff step.
The Daily Diff Loop
Once two snapshots exist, the interesting work begins. The diff between yesterday and today tells you four things: which posts are new, which posts have been deleted (rare but worth flagging, since it usually signals a campaign mistake), how engagement is moving on existing posts, and how the per-reaction breakdown is shifting. The script below covers the first three; the breakdown shift is a per-key comparison of reactions_breakdown on the same post across the two days.
```python
import json
from datetime import date, timedelta
from pathlib import Path
from typing import Optional


def load(d: date, page: str) -> dict:
    """Load one day's snapshot as {post_id: post}; empty if the file is missing."""
    path = Path(f"snapshots/{page}/{d.isoformat()}.json")
    if not path.exists():
        return {}
    return {post["post_id"]: post for post in json.loads(path.read_text(encoding="utf-8"))}


def diff_one_page(page: str, today: Optional[date] = None) -> dict:
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    y, t = load(yesterday, page), load(today, page)

    # If yesterday's file is missing, every post in today's file counts as new.
    new_posts = [t[k] for k in t if k not in y]

    # A post that merely aged out of the `limit` window is not a deletion, so
    # only flag posts newer than the oldest post still present today. ISO-8601
    # UTC timestamps in one format compare correctly as plain strings.
    oldest_today = min((p["timestamp"] for p in t.values()), default=None)
    removed = [
        y[k] for k in y
        if k not in t and oldest_today is not None and y[k]["timestamp"] > oldest_today
    ]

    velocity = []
    for k in t.keys() & y.keys():
        delta_reactions = t[k]["reactions_total"] - y[k]["reactions_total"]
        delta_comments = t[k]["comments_count"] - y[k]["comments_count"]
        if delta_reactions > 0 or delta_comments > 0:
            velocity.append({
                "post_id": k,
                "text": (t[k].get("text") or "")[:120],  # photo-only posts may lack text
                "delta_reactions": delta_reactions,
                "delta_comments": delta_comments,
                "total_reactions": t[k]["reactions_total"],
            })

    return {
        "page": page,
        "new_posts": new_posts,
        "removed_posts": removed,
        "velocity": sorted(velocity, key=lambda x: x["delta_reactions"], reverse=True),
    }


if __name__ == "__main__":
    report = diff_one_page("somebrand")
    print(f"new: {len(report['new_posts'])} removed: {len(report['removed_posts'])}")
    for v in report["velocity"][:5]:
        print(f"  +{v['delta_reactions']} reactions / +{v['delta_comments']} comments: {v['text']}")
```
That output, piped into a Slack channel or a weekly email digest, is the actual brand-monitoring dashboard. Strategists care about three things from it. First: cadence — is the competitor posting daily, every other day, weekly? A shift in cadence almost always precedes a campaign push. Second: creative direction — does the new-posts list cluster around a theme (sustainability, behind-the-scenes, founder content, UGC)? That tells you what the team is currently betting on. Third: velocity — which posts are gaining the most reactions per day, and what do they have in common? That is your unfiltered read on what is actually resonating with their audience, separate from what they are paying to amplify.
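The diff script in this section tracks totals only. The per-reaction shift mentioned at the start of the section could be sketched as a per-key subtraction between two snapshots of the same post. The function name `breakdown_shift` is hypothetical, and skipping keys that are missing on either day is a design choice that follows from treating the breakdown as best-effort.

```python
def breakdown_shift(yesterday_post: dict, today_post: dict) -> dict:
    """Per-reaction change between two snapshots of the same post.

    Keys missing (or None) on either day are skipped, because a missing
    breakdown key means "not returned", not "zero reactions".
    """
    before = yesterday_post.get("reactions_breakdown") or {}
    after = today_post.get("reactions_breakdown") or {}
    shared = sorted(after.keys() & before.keys())
    return {
        key: after[key] - before[key]
        for key in shared
        if after[key] is not None and before[key] is not None
    }
```

A shift that is mostly `like` with flat `love` reads very differently from one where `love` or `angry` is doing the growing, which is the distinction the next section builds on.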
Reading the Reaction Breakdown
The per-reaction breakdown (like, love, wow, haha, sad, angry) is the most underused field in the response. The naive read is to look at total reactions, but the breakdown ratios carry meaningfully different signal.
A high love-to-like ratio (above 15%) signals deep emotional resonance, typically on founder content, mission-driven posts, or customer stories. A high haha-to-like ratio signals successful humor, which is rare for most brands and valuable when achieved. An angry count that climbs above the page's usual baseline is the canary: a handful of angry reactions on a post with over a thousand total reactions is normal noise, but a sudden jump usually means the post has been picked up by a hostile community (review bombing, a political backlash, an ad targeting mistake). Track angry over time and you have an early-warning system for competitor crisis moments, which is genuinely useful intelligence when planning your own messaging that week.
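The ratios described above are simple divisions over the `reactions_breakdown` field. A minimal sketch, assuming the breakdown keys from the example row earlier in this guide; the function name and the `high_love` flag are hypothetical, and the 15% threshold is the one quoted in the text.

```python
LOVE_TO_LIKE_THRESHOLD = 0.15  # "above 15%" from the text


def reaction_ratios(breakdown: dict) -> dict:
    """Compute love-to-like, haha-to-like, and angry share for one post."""
    like = breakdown.get("like") or 0
    total = sum(v for v in breakdown.values() if v)

    def ratio(numerator: int, denominator: int):
        # None (not 0.0) when the denominator is empty, so "no data" stays visible.
        return round(numerator / denominator, 3) if denominator else None

    love_to_like = ratio(breakdown.get("love") or 0, like)
    return {
        "love_to_like": love_to_like,
        "haha_to_like": ratio(breakdown.get("haha") or 0, like),
        "angry_share": ratio(breakdown.get("angry") or 0, total),
        "high_love": love_to_like is not None and love_to_like > LOVE_TO_LIKE_THRESHOLD,
    }
```

Run against the example row (1102 like, 134 love), the love-to-like ratio comes out around 12%, just under the threshold.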
Cross-Referencing With The Meta Ad Library
Organic post engagement on its own can be misleading. A post with three thousand reactions on a page with two hundred thousand followers looks like a hit, until you discover the brand has been actively running it as a sponsored ad for the last fourteen days — at which point the engagement is mostly paid distribution, not organic resonance. The Meta Ad Library publishes every active ad creative for any page, and joining that data against the organic scrape changes the read significantly.
The simplest pattern: pull the page's active ads from the Ad Library (it has its own search URL per page) on the same nightly cadence, and tag each scraped post with an is_boosted flag if the post text or media URL also appears in the active-ads list. Posts that engage well without being boosted are the genuine organic wins worth studying; posts that engage well because they are being boosted tell a different story about budget allocation and creative confidence. Both are useful, but only after they are separated.
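The tagging step can be sketched as a set-membership check. Everything about the ad data here is an assumption: the sketch takes `active_ads` as a list of dicts with `text` and `media_urls` keys, which is a hypothetical shape you would produce yourself from whatever Ad Library export you use. Media URLs are compared with the query string stripped, on the assumption that CDN URLs for the same asset can differ in their signed parameters.

```python
from urllib.parse import urlsplit


def _norm_text(text: str) -> str:
    """Collapse whitespace and case so trivially different copies still match."""
    return " ".join(text.split()).casefold()


def _norm_url(url: str) -> str:
    """Keep host and path only; query strings are assumed to vary per request."""
    parts = urlsplit(url)
    return f"{parts.netloc}{parts.path}"


def tag_boosted(posts: list[dict], active_ads: list[dict]) -> list[dict]:
    """Return copies of the posts with an is_boosted flag added."""
    ad_texts = {_norm_text(ad["text"]) for ad in active_ads if ad.get("text")}
    ad_media = {_norm_url(u) for ad in active_ads for u in ad.get("media_urls") or []}

    tagged = []
    for post in posts:
        text_hit = bool(post.get("text")) and _norm_text(post["text"]) in ad_texts
        media_hit = any(_norm_url(u) in ad_media for u in post.get("media_urls") or [])
        tagged.append({**post, "is_boosted": text_hit or media_hit})
    return tagged
```

Exact-match tagging will miss ads whose copy was edited for the paid version, so a false `is_boosted` is weaker evidence than a true one.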
This is also where the reactions_breakdown field earns its keep. Paid distribution tends to flatten the breakdown toward like (because cold audiences default to the easiest reaction), while organic distribution to engaged followers produces a much higher love share. A post with eighty percent like and ten percent love is probably boosted; a post with sixty percent like and twenty-five percent love is probably resonating organically. That heuristic is not perfect, but it is consistent enough across categories to be useful as a tiebreaker when the boost flag is ambiguous.
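As a tiebreaker, the heuristic above can be written as a small classifier over reaction shares. The text gives two example points (80% like with 10% love, and 60% like with 25% love) rather than hard cutoffs, so using those points as thresholds is an assumption of this sketch, as are the function name and the three labels.

```python
def distribution_hint(breakdown: dict) -> str:
    """Guess whether a post's reactions look paid or organic.

    Thresholds are the two example points from the text, used here as
    cutoffs for illustration; anything in between is left as "unclear".
    """
    total = sum(v for v in breakdown.values() if v)
    if not total:
        return "unclear"
    like_share = (breakdown.get("like") or 0) / total
    love_share = (breakdown.get("love") or 0) / total

    if like_share >= 0.80 and love_share <= 0.10:
        return "probably boosted"
    if like_share <= 0.60 and love_share >= 0.25:
        return "probably organic"
    return "unclear"
```

The example row from the field table (about 86% like, just over 10% love) lands in "unclear" under these cutoffs, which is the intended behavior for borderline posts: fall back to the Ad Library match rather than the heuristic.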
Posting Time Patterns
Beyond what a competitor posts, when they post is a signal worth tracking. The timestamp field on every scraped post is the raw material. Three patterns are worth pulling from thirty days of data per page.
First, the time-of-day histogram — most consumer brands cluster posts in two or three windows (morning, midday, evening), and a shift in that distribution typically means a new social manager, a tool change, or a shift in target audience.
Second, the day-of-week histogram. Brands that post on weekends are typically running an editorial calendar with a dedicated content lead; weekday-only brands are typically running through an agency or a marketing-ops tool. Knowing which model a competitor uses tells you how nimble their content team is.
Third, the gap distribution — the time between consecutive posts. A consistent two-day gap means a planned calendar; high variance means reactive posting tied to news cycles. Reactive posters are easier to out-cadence; calendared posters require matching their rhythm to compete in the feed.
All three patterns are derivable from the same scraped JSON snapshots with a few lines of pandas.
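The three patterns can be sketched as follows. The text mentions pandas; this version uses only the standard library so it runs without extra dependencies, and the function names are hypothetical. It assumes each post carries an ISO-8601 UTC `timestamp` like the example row, and it reports hours in UTC, so convert to the brand's local time zone before reading the time-of-day histogram.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def parse_ts(ts: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat().
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def posting_patterns(posts: list[dict]) -> dict:
    """Hour-of-day histogram, day-of-week histogram, and gap statistics."""
    times = sorted(parse_ts(p["timestamp"]) for p in posts)
    gaps_h = [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
    return {
        "by_hour_utc": dict(Counter(t.hour for t in times)),
        "by_weekday": dict(Counter(WEEKDAYS[t.weekday()] for t in times)),
        # Mean gap shows the cadence; the spread shows planned vs reactive posting.
        "mean_gap_h": mean(gaps_h) if gaps_h else None,
        "gap_stdev_h": pstdev(gaps_h) if gaps_h else None,
    }
```

A low `gap_stdev_h` relative to `mean_gap_h` suggests a planned calendar; a high one suggests reactive posting.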
Scaling To Multiple Competitor Pages
A single competitor page is one watcher; a brand strategist usually wants to watch five to fifteen pages — direct competitors, adjacent-category brands worth learning from, the category leader, and a few up-and-coming challengers. The submit-and-wait pattern above scales fine for that count if you sequence the calls, but for a portfolio of fifteen or more pages, bulk submission cuts the wall-clock time substantially. The bulk endpoint accepts a list of page URLs and schedules them across the available concurrency, finishing the whole portfolio in roughly the time of a single page.
The other scaling consideration is account hygiene. One cookie session per scraper is the simplest setup, but Meta's rate-limit signals are account-keyed, so heavy use of a single account on a portfolio of fifteen pages can eventually trigger soft throttling on that account. Production setups rotate across two or three connected accounts on the dashboard — the platform handles the rotation transparently once multiple accounts are connected.
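For the sequenced (non-bulk) case, a small runner that isolates failures per page keeps one bad page from aborting the whole night's run. This is a sketch under assumptions: `snapshot_portfolio` is a hypothetical helper, the snapshot function is passed in and is expected to take `(page_url, limit, out_dir)` like the `snapshot_page` function from the daily script, and the pause between pages is an arbitrary courtesy delay, not a documented requirement.

```python
import time
from typing import Callable


def snapshot_portfolio(
    pages: dict[str, str],
    snapshot_fn: Callable[[str, int, str], int],
    limit: int = 30,
    pause_s: float = 5.0,
) -> dict:
    """Snapshot each page in sequence; map slug -> post count or the exception."""
    results = {}
    for index, (slug, url) in enumerate(pages.items()):
        if index:
            time.sleep(pause_s)  # spacing between pages (assumed, tune as needed)
        try:
            results[slug] = snapshot_fn(url, limit, f"snapshots/{slug}")
        except Exception as exc:
            # Record the failure and keep going so other pages still get a snapshot.
            results[slug] = exc
    return results
```

A typical call would pass a mapping such as `{"somebrand": "https://www.facebook.com/SomeBrand"}` together with `snapshot_page`, then alert on any value in the result that is an exception.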
Common Mistakes
- Using a fresh Facebook account for the cookie session. Meta's anti-abuse systems disproportionately rate-limit accounts that are less than thirty days old. Use an account that has been active for at least six months, ideally one with a real history of logins from a stable device. The scraper will technically work with a one-day-old account, but the session will expire faster and the rate limits will bite harder.
- Scraping the same page many times per day. There is little benefit. Facebook updates engagement counts in roughly hourly increments, and Meta's anti-abuse systems treat repeated rapid-fire scrapes of the same page as a strong bot signal. One snapshot per day per page is enough for any longitudinal analysis; two at most if you need finer-grained reaction velocity.
- Treating reaction counts as ground truth in the first hour after publish. Reaction counts on a brand-new post lag the actual engagement by ten to thirty minutes due to Facebook's caching. A post scraped five minutes after it goes live may show zero reactions even when the live page shows fifty. Wait at least an hour from `timestamp` before treating the engagement numbers as accurate.
- Pulling more than a hundred posts at once on the first scrape. The first call from a new session is the most likely to hit Facebook's rate limiter because Meta has not yet established a behavioral baseline for the account. Start with `limit=30` for the first few jobs per session; once those have run cleanly, larger pulls are fine.
- Forgetting that the Cloudflare edge in front of the API closes connections at one hundred seconds. A large `limit` value translates to a longer scrape; jobs above sixty seconds should always be polled with `/api/v1/jobs/{job_id}` rather than waited on inline with the `wait=true` parameter.
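The "wait at least an hour" rule from the list above is easy to enforce in code before any engagement numbers reach a report. A minimal sketch, assuming the ISO-8601 UTC `timestamp` format from the example row; the helper name `is_settled` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

SETTLE_WINDOW = timedelta(hours=1)  # "wait at least an hour" from the list above


def is_settled(post: dict, now: Optional[datetime] = None) -> bool:
    """True once the post is old enough for its counts to be trusted."""
    now = now or datetime.now(timezone.utc)
    # Older Python versions reject a trailing "Z" in fromisoformat().
    published = datetime.fromisoformat(post["timestamp"].replace("Z", "+00:00"))
    return now - published >= SETTLE_WINDOW
```

Filtering a snapshot with `[p for p in posts if is_settled(p)]` before computing velocity keeps freshly published posts from showing up as zero-engagement outliers.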
Legality And Brand-Safety Notes
Public Facebook page content is exactly that — public. Meta operates Facebook pages as a discovery surface, and every post on a public page is indexed by Google and surfaced to non-logged-in visitors who hit the right URL. Scraping that content for competitive monitoring is on settled legal ground in the US and broadly compliant in the EU under GDPR's legitimate-interest basis for B2B competitive intelligence, provided the data stays internal and is not republished as a competing product or used to train a customer-facing model.
The brand-safety side is worth flagging separately from the legal side. Internal stakeholders sometimes flinch at the word "scraping" because it sounds adversarial. The reframe that lands well with legal and brand teams is to call it what it is: competitive monitoring using public web data, the same input a human strategist would gather by manually checking competitor pages each morning, just automated. Frame it as time saved, not stealth, and the conversation usually settles within one meeting.
Get Started
- Sign up at logposervices.com and generate an API key under Tool → API Keys.
- Connect a Facebook account under Accounts → Facebook → Add account by pasting the four session cookies (`c_user`, `xs`, `fr`, `datr`).
- `export LOGPOSE_API_KEY=lp_xxxxxxx LOGPOSE_FB_ACCOUNT_ID=fb_acct_xxxx`
- Pick three competitor pages and run the Python snapshot script above against each on a nightly cron.
Related reading: How to scrape Instagram for content strategy for the matching playbook on the other Meta surface, How to find trending TikTok creators by hashtag and niche for the short-video side of competitive monitoring, and the web scraping API guide for the broader DIY-versus-managed comparison.
External: Meta Ad Library for paid-side visibility, hiQ Labs v. LinkedIn on public-data scraping precedent.