Is it legal to scrape Facebook public pages?

Facebook business pages are public by design: they exist to be discovered by non-logged-in visitors, and posts on public pages can be indexed by Google. In the US, the Ninth Circuit held that scraping publicly accessible web data is unlikely to violate the CFAA (hiQ Labs v. LinkedIn, 9th Cir. 2022), and EU/UK courts have repeatedly treated public commercial content as fair game for competitive intelligence under legitimate interest. Meta's Terms of Service are a separate question: they prohibit collecting data from its products by automated means without prior permission, and republishing content as a competing product. For competitor monitoring (pulling post text and engagement counts into an internal brand dashboard) the scrape itself is the lower-risk part; the downstream use (training a model, reselling the data) is where actual legal review matters.

Why does Facebook scraping need session cookies, and how are they extracted?

Meta's user-facing site detects unauthenticated traffic within a handful of requests and serves a login wall, even for content that is technically public. Reliable scraping therefore requires session cookies from a real logged-in account so the page renders the same HTML a human visitor sees. Auto-login is intentionally not built into the LogPose Facebook endpoint — credential injection violates Meta's terms more directly than a cookie paste, and breaks every time Facebook rotates its login flow. The supported flow is: log into Facebook in a normal browser, open DevTools, copy the `c_user`, `xs`, `fr`, and `datr` cookies, and paste them into the LogPose account-connection UI. The session is then reused across jobs.

What fields come back per Facebook post?

Each post returns the post ID, post URL, author (page name and ID), the full text body, post timestamp, post type (status, photo, video, link, shared), attached media URLs when present, link preview metadata for shared links, reaction count and a per-reaction breakdown (like/love/wow/haha/sad/angry), comment count, share count, and the page-level metadata (page ID, page category, follower count, verified flag). What does not come back: the actual comment thread bodies, the list of who reacted, audience demographics, and any signal of paid promotion — Meta strips those even from logged-in scrapes.

How can post-level engagement be tracked over time?

Engagement on a Facebook post is not static: reactions and comments accumulate for days or weeks after publishing, then plateau. To track campaign performance, scrape the same page nightly and key each row by post ID. Compute deltas: today's reaction count minus yesterday's gives daily engagement velocity, and the ratio of day-1 reactions to day-7 reactions measures how long a post stays alive in the feed (a ratio near 1 means the post peaked on its first day; a low ratio means it kept collecting reactions all week). A 30-day rolling chart of post velocity per page shows which creative directions resonate with the competitor's audience, even without access to their ad spend.
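The delta arithmetic described above can be sketched in a few lines. This is a minimal illustration, assuming the nightly snapshots for one post have already been reduced to a `{date: reactions_total}` mapping and that "day 1" means the snapshot taken one day after publishing; the function names and the sample numbers are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def daily_velocity(totals: dict) -> dict:
    """Reactions gained on each day: that day's total minus the previous day's."""
    days = sorted(totals)
    return {
        cur: totals[cur] - totals[prev]
        for prev, cur in zip(days, days[1:])
        if cur - prev == timedelta(days=1)  # skip gaps left by missed nightly runs
    }


def day1_to_day7_ratio(totals: dict, published: date) -> Optional[float]:
    """Share of the day-7 total already reached on day 1 (None if data is missing)."""
    day1 = totals.get(published + timedelta(days=1))
    day7 = totals.get(published + timedelta(days=7))
    if day1 is None or not day7:
        return None
    return day1 / day7


# Hypothetical reaction totals for one post published on 2026-05-01 (days 0 to 7).
published = date(2026, 5, 1)
totals = {
    published + timedelta(days=n): total
    for n, total in enumerate([0, 600, 800, 900, 950, 980, 995, 1000])
}

print(daily_velocity(totals)[published + timedelta(days=1)])  # 600
print(day1_to_day7_ratio(totals, published))                  # 0.6
```

A ratio close to 1 suggests the post did most of its work on the first day; a lower ratio suggests a longer tail.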

What changed about Facebook scraping in 2024–2026?

Meta progressively locked down the Graph API (`graph.facebook.com`) between 2022 and 2025, removing public-page-feed access for non-approved apps, deprecating the page-search endpoint, and tightening rate limits on what little remained. By 2026, the only practical path for competitive monitoring is to scrape the rendered HTML of facebook.com itself, which is why session cookies became mandatory and why social listening platforms shifted from API integrations to authenticated UI scraping. One side effect to be aware of: Facebook's element names (the friendly_names that identify reaction buttons, comment counters, and share buttons in the DOM) are account-rollout-dependent. Two accounts logged in at the same time may see different DOM structures, so results can vary slightly based on whose cookies are used. Production scraping favors stable, long-aged accounts over freshly created ones.


How to Scrape Facebook Page Posts for Competitor Watch

May 28, 2026 · 11 min read

For any brand operating in a competitive consumer category — DTC, beauty, food and beverage, fashion, software with a marketing-led GTM — what your direct competitors post on Facebook is one of the highest-signal datasets you have access to. It tells you their campaign cadence, the creative directions they are betting on, which posts land and which die in the feed, and how the audience is responding in near-real-time. The catch is that Meta has progressively locked down the Graph API to the point where post-level monitoring of pages you do not own is no longer practical through official channels. The working path in 2026 is to scrape the public web interface of Facebook using session cookies from a real account. This guide walks the full pipeline: the cookie setup, the API call, the per-post fields you get back, and the daily diff loop that turns it into a competitor dashboard.

Why Brand Strategists Watch Competitor Facebook Pages

The honest competitive-intelligence stack for a consumer brand looks like this. Paid-ad transparency through the Meta Ad Library shows you what creative your competitors are actively spending on, but not the organic context around it. SimilarWeb tells you their traffic shape but nothing about the content. Influencer-tracking tools cover creator partnerships but miss owned-channel cadence. Facebook organic posts sit in the gap: they show you the brand's voice and creative cadence on the channel where most consumer brands still maintain their largest owned audience. Even with the platform's organic reach decline, Facebook page activity remains the single most reliable indicator of what a competitor is currently prioritizing.

The other reason Facebook pages matter is engagement transparency. Unlike Instagram (where view counts are gated behind ownership of the post) and unlike TikTok (where view counts are inflated by autoplay), Facebook surfaces reaction count, comment count, and share count on every public post. Those three numbers, scraped consistently across thirty days, paint a clear picture of which creative directions are working for any brand you can name.

What Actually Comes Back Per Post

A /facebook/scrape call against a page URL returns post-level rows. Each row gives you:

| Field | Example |
| --- | --- |
| `post_id` | 9876543210_123456789 |
| `post_url` | https://www.facebook.com/SomeBrand/posts/123456789 |
| `author_name` | Some Brand |
| `author_id` | 9876543210 |
| `text` | "Our spring collection drops Friday. Tap to set a reminder →" |
| `timestamp` | 2026-05-22T14:30:00Z |
| `post_type` | photo |
| `media_urls` | ["https://scontent.xx.fbcdn.net/..."] |
| `link_preview` | `{"url": "...", "title": "...", "domain": "..."}` |
| `reactions_total` | 1284 |
| `reactions_breakdown` | `{"like": 1102, "love": 134, "wow": 21, "haha": 18, "sad": 5, "angry": 4}` |
| `comments_count` | 87 |
| `shares_count` | 42 |
| `page_followers` | 384200 |
| `page_verified` | true |
| `page_category` | Clothing (Brand) |

What it does not include: the actual text of individual comments, the list of users who reacted, demographic breakdowns of the audience, or any signal of whether the post is being boosted as a paid ad. For ad-spend visibility, the Meta Ad Library remains the right tool and is best used alongside this scrape — not as a substitute.

One quirk worth flagging upfront. Facebook's internal element naming (Meta calls these friendly_names inside the React tree) varies by account rollout. The same DOM rendered for two different logged-in users may have different attribute names on the same buttons, because Meta runs continuous A/B experiments on its own UI. In practice this means: a session that worked yesterday may return slightly fewer reaction-breakdown fields tomorrow if Meta rolls the account into a new experiment cohort. Production setups handle this by using stable, long-aged accounts (not fresh ones) and by treating the reaction breakdown as best-effort while keeping the total reaction count as the source of truth.
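One way to apply the "total is the source of truth, breakdown is best-effort" rule when ingesting rows is sketched below. It assumes the field names from the table above; the helper name and the `breakdown_complete` flag are hypothetical and not part of the API response.

```python
REACTION_KEYS = ("like", "love", "wow", "haha", "sad", "angry")


def normalize_reactions(post: dict) -> dict:
    """Keep the total as-is and mark whether the breakdown can be trusted."""
    total = post["reactions_total"]
    raw = post.get("reactions_breakdown") or {}
    # Keep only the reaction types that actually came back as integers.
    breakdown = {k: raw[k] for k in REACTION_KEYS if isinstance(raw.get(k), int)}
    complete = (
        len(breakdown) == len(REACTION_KEYS)
        and sum(breakdown.values()) == total
    )
    return {"total": total, "breakdown": breakdown, "breakdown_complete": complete}


full = {
    "reactions_total": 1284,
    "reactions_breakdown": {
        "like": 1102, "love": 134, "wow": 21, "haha": 18, "sad": 5, "angry": 4,
    },
}
partial = {"reactions_total": 1284, "reactions_breakdown": {"like": 1102}}

print(normalize_reactions(full)["breakdown_complete"])     # True
print(normalize_reactions(partial)["breakdown_complete"])  # False
```

Downstream charts can then use `total` everywhere and only compute ratio metrics on rows where `breakdown_complete` is true.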

Facebook detects unauthenticated traffic almost immediately and serves a login wall, even on pages that are technically public. To get past that, the scrape needs to present itself as a real logged-in browser, which means real session cookies. Auto-login from username and password is intentionally not built — credential injection breaks every time Meta updates its login flow and crosses a clearer line against the terms of service. The supported flow is a one-time cookie paste, then the session is reused across jobs.

Extracting the cookies takes about ninety seconds:

1. Open a fresh browser profile (Chrome or Firefox; either works).
2. Log into facebook.com normally. Use a stable, long-aged account if possible: one that has been active for at least six months, ideally one used as a real account. Meta deprioritizes freshly created accounts in some experiment cohorts.
3. Open DevTools (F12 or Cmd-Opt-I), go to the Application tab (the Storage tab in Firefox) → Cookies → https://www.facebook.com.
4. Copy the values of these four cookies: c_user, xs, fr, datr. The c_user value is the numeric user ID; xs is the session token; fr is Meta's advertising cookie; datr identifies the browser. The four together constitute a complete logged-in session.
5. In the LogPose dashboard, go to Accounts → Facebook → Add account and paste those four values. The platform stores them encrypted and references them by account_id on every subsequent scrape call.

That account_id becomes a query parameter on the scrape request. The session persists until Facebook expires it (typically 60–90 days for an active account, sooner if the account is also being used to log in from other devices in parallel). When the session expires, the scraper returns a clear "session expired" error rather than silently failing, and the cookies need to be re-pasted.
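Before pasting, it can help to check locally that all four cookies were actually copied. The sketch below is not part of LogPose; it assumes the cookies were copied as a single `name=value; name=value` string (for example the Cookie request header shown in DevTools), and the function name is hypothetical. The sample values are placeholders, not real tokens.

```python
REQUIRED_COOKIES = ("c_user", "xs", "fr", "datr")


def extract_session_cookies(cookie_header: str) -> dict:
    """Pick the four session cookies out of a raw cookie string and validate them."""
    jar = {}
    for pair in cookie_header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep and name in REQUIRED_COOKIES:
            jar[name] = value
    missing = [name for name in REQUIRED_COOKIES if not jar.get(name)]
    if missing:
        raise ValueError(f"missing cookies: {', '.join(missing)}")
    if not jar["c_user"].isdigit():
        raise ValueError("c_user should be a numeric user ID")
    return jar


# Placeholder values only; unrelated cookies such as `locale` are ignored.
sample = "datr=aaa111; c_user=1000123; xs=12%3Aabc; fr=0xyz; locale=en_US"
print(sorted(extract_session_cookies(sample)))  # ['c_user', 'datr', 'fr', 'xs']
```

Treat the extracted values like a password: keep them out of logs, shell history, and version control.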

The API Call

The endpoint is asynchronous — submit a job, poll for completion, fetch the result. Three curl calls walk the full flow:

# 1. Submit
curl -G "https://api.logposervices.com/api/v1/social/facebook/scrape" \
  -H "X-API-Key: lp_xxxxxxx" \
  --data-urlencode "url=https://www.facebook.com/SomeBrand" \
  --data-urlencode "limit=30" \
  --data-urlencode "account_id=fb_acct_8a3f..."
# → {"job_id": "fb_2c91..."}

# 2. Poll (or wait inline)
curl -H "X-API-Key: lp_xxxxxxx" \
  "https://api.logposervices.com/api/v1/jobs/fb_2c91?wait=true&timeout=60"

# 3. Fetch result
curl -H "X-API-Key: lp_xxxxxxx" \
  https://api.logposervices.com/api/v1/jobs/fb_2c91/result

Three parameters matter:

url: the Facebook page URL. Either the vanity form (facebook.com/SomeBrand) or the numeric form (facebook.com/profile.php?id=...) works. The scraper resolves both. Post-specific URLs (/posts/...) and watch URLs (fb.watch/...) are also accepted if a single-post scrape is what's needed.
limit: how many posts to pull, starting from the most recent. The endpoint paginates internally; a limit=30 request fetches the latest 30 posts in one job. The default is 30, which covers about a month of activity for a brand posting once a day.
account_id: the encrypted cookie reference from the setup step above. Without it, the job will fail with a 401-equivalent error before it even starts the scrape.

A limit=30 job typically finishes in 30–60 seconds. Larger pulls (limit=100+) scale roughly linearly and should always be polled rather than waited on inline, because the Cloudflare edge in front of the API closes connections at 100 seconds.
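The timing guidance above can be turned into a small rule of thumb for choosing between an inline wait and polling. This is a sketch under stated assumptions: the two-seconds-per-post figure is just the upper end of "limit=30 in 30 to 60 seconds" extrapolated linearly, and the constant and function names are hypothetical.

```python
SECONDS_PER_POST_WORST_CASE = 2.0  # upper end of "limit=30 finishes in 30 to 60 s"
INLINE_BUDGET_S = 60               # stay well under the 100-second edge cutoff


def estimated_duration_s(limit: int) -> float:
    """Worst-case job duration, assuming roughly linear scaling with limit."""
    return limit * SECONDS_PER_POST_WORST_CASE


def should_wait_inline(limit: int) -> bool:
    """True if the job should comfortably finish inside one inline wait."""
    return estimated_duration_s(limit) <= INLINE_BUDGET_S


print(should_wait_inline(30))   # True
print(should_wait_inline(100))  # False
```

Anything that fails this check is a candidate for the submit-then-poll flow rather than a single blocking request.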

A Python Script That Pulls A Page Daily

This is the script most brand teams end up running on a nightly cron — submit the page, wait for completion, write the result to a date-stamped JSON file. Tomorrow's run produces another file, and the diff between them is the dashboard.

import os, time, json, requests
from datetime import date

API_KEY = os.environ["LOGPOSE_API_KEY"]
FB_ACCOUNT_ID = os.environ["LOGPOSE_FB_ACCOUNT_ID"]
BASE = "https://api.logposervices.com/api/v1"
HEADERS = {"X-API-Key": API_KEY}


def submit_and_wait(path: str, params: dict, timeout_s: int = 180) -> dict:
    """Submit a scrape job, poll until it finishes, and return the result payload."""
    r = requests.get(f"{BASE}/{path}", params=params, headers=HEADERS, timeout=30)
    r.raise_for_status()
    job_id = r.json()["job_id"]
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        poll = requests.get(f"{BASE}/jobs/{job_id}", headers=HEADERS, timeout=15)
        poll.raise_for_status()  # surface HTTP errors instead of a confusing KeyError
        s = poll.json()
        if s["status"] == "completed":
            break
        if s["status"] == "failed":
            raise RuntimeError(s.get("error", "unknown failure"))
        time.sleep(3)
    else:
        # while/else: reached only when the deadline passes without a break
        raise TimeoutError(f"job {job_id} did not finish in {timeout_s}s")
    result = requests.get(f"{BASE}/jobs/{job_id}/result", headers=HEADERS, timeout=15)
    result.raise_for_status()
    return result.json()


def snapshot_page(page_url: str, limit: int, out_dir: str) -> int:
    data = submit_and_wait(
        "social/facebook/scrape",
        {"url": page_url, "limit": limit, "account_id": FB_ACCOUNT_ID},
    )
    posts = data["posts"]
    out_path = f"{out_dir}/{date.today().isoformat()}.json"
    os.makedirs(out_dir, exist_ok=True)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(posts, f, ensure_ascii=False, indent=2)
    return len(posts)


if __name__ == "__main__":
    n = snapshot_page(
        "https://www.facebook.com/SomeBrand",
        limit=30,
        out_dir="snapshots/somebrand",
    )
    print(f"snapshotted {n} posts")

Run that nightly across the three to five competitor pages that matter most. The output is a directory of date-keyed JSON files, ready for the diff step.

The Daily Diff Loop

Once two snapshots exist, the interesting work begins. The diff between yesterday and today tells you four things: which posts are new, which posts have been deleted (rare but worth flagging — usually a signal of a campaign mistake), how engagement is moving on existing posts, and how the per-reaction breakdown is shifting.

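import json
from pathlib import Path
from datetime import date, timedelta
from typing import Optional

def load(d: date, page: str) -> dict:
    """Return {post_id: post} for one day's snapshot, or {} if the file is missing."""
    path = Path(f"snapshots/{page}/{d.isoformat()}.json")
    if not path.exists():
        return {}
    posts = json.loads(path.read_text(encoding="utf-8"))
    return {post["post_id"]: post for post in posts}


def diff_one_page(page: str, today: Optional[date] = None):
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    y, t = load(yesterday, page), load(today, page)

    new_posts = [t[k] for k in t if k not in y]
    # A post that simply aged out of the `limit` window is not a deletion, so only
    # flag posts at least as recent as the oldest post still present today.
    # ISO 8601 timestamps in one UTC format compare correctly as plain strings.
    oldest_today = min((p["timestamp"] for p in t.values()), default=None)
    removed = [
        y[k] for k in y
        if k not in t and oldest_today is not None and y[k]["timestamp"] >= oldest_today
    ]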

    velocity = []
    for k in t.keys() & y.keys():
        delta_reactions = t[k]["reactions_total"] - y[k]["reactions_total"]
        delta_comments = t[k]["comments_count"] - y[k]["comments_count"]
        if delta_reactions > 0 or delta_comments > 0:
            velocity.append({
                "post_id": k,
                "text": t[k]["text"][:120],
                "delta_reactions": delta_reactions,
                "delta_comments": delta_comments,
                "total_reactions": t[k]["reactions_total"],
            })

    return {
        "page": page,
        "new_posts": new_posts,
        "removed_posts": removed,
        "velocity": sorted(velocity, key=lambda x: x["delta_reactions"], reverse=True),
    }


if __name__ == "__main__":
    report = diff_one_page("somebrand")
    print(f"new: {len(report['new_posts'])}  removed: {len(report['removed_posts'])}")
    for v in report["velocity"][:5]:
        print(f"  +{v['delta_reactions']} reactions / +{v['delta_comments']} comments — {v['text']}")

That output, piped into a Slack channel or a weekly email digest, is the actual brand-monitoring dashboard. Strategists care about three things from it. First: cadence — is the competitor posting daily, every other day, weekly? A shift in cadence almost always precedes a campaign push. Second: creative direction — does the new-posts list cluster around a theme (sustainability, behind-the-scenes, founder content, UGC)? That tells you what the team is currently betting on. Third: velocity — which posts are gaining the most reactions per day, and what do they have in common? That is your unfiltered read on what is actually resonating with their audience, separate from what they are paying to amplify.

Reading the Reaction Breakdown

The per-reaction breakdown (like, love, wow, haha, sad, angry) is the most underused field in the response. The naive read is to look at total reactions, but the breakdown ratios carry meaningfully different signal.

A high love-to-like ratio (above 15%) signals deep emotional resonance, typically on founder content, mission-driven posts, or customer stories. A high haha-to-like ratio signals successful humor — rare for most brands, valuable when achieved. A non-zero angry count on a brand post is the canary: it usually means the post has been picked up by a hostile community (review bombing, a political backlash, an ad targeting mistake). Track angry over time and you have an early-warning system for competitor crisis moments, which is genuinely useful intelligence when planning your own messaging that week.
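The ratios described above are simple to compute from the `reactions_breakdown` field. The following is a minimal sketch: the 15% love threshold comes from the paragraph above, while the function name and the output keys are hypothetical.

```python
def reaction_signals(breakdown: dict, love_threshold: float = 0.15) -> dict:
    """Derive love-to-like, haha-to-like, and an angry flag from one breakdown."""
    like = breakdown.get("like", 0)
    love = breakdown.get("love", 0)
    haha = breakdown.get("haha", 0)
    angry = breakdown.get("angry", 0)
    # Ratios are undefined when there are no likes to compare against.
    love_ratio = love / like if like else None
    haha_ratio = haha / like if like else None
    return {
        "love_to_like": love_ratio,
        "haha_to_like": haha_ratio,
        "high_love": love_ratio is not None and love_ratio > love_threshold,
        "angry_flag": angry > 0,
    }


# Breakdown taken from the example row in the field table.
example = {"like": 1102, "love": 134, "wow": 21, "haha": 18, "sad": 5, "angry": 4}
signals = reaction_signals(example)
print(round(signals["love_to_like"], 3))  # 0.122
print(signals["high_love"], signals["angry_flag"])  # False True
```

Storing `angry` per post per day, rather than only the flag, is what makes the over-time tracking possible.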

Cross-Referencing With The Meta Ad Library

Organic post engagement on its own can be misleading. A post with three thousand reactions on a page with two hundred thousand followers looks like a hit, until you discover the brand has been actively running it as a sponsored ad for the last fourteen days — at which point the engagement is mostly paid distribution, not organic resonance. The Meta Ad Library publishes every active ad creative for any page, and joining that data against the organic scrape changes the read significantly.

The simplest pattern: pull the page's active ads from the Ad Library (it has its own search URL per page) on the same nightly cadence, and tag each scraped post with an is_boosted flag if the post text or media URL also appears in the active-ads list. Posts that engage well without being boosted are the genuine organic wins worth studying; posts that engage well because they are being boosted tell a different story about budget allocation and creative confidence. Both are useful, but only after they are separated.
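The tagging step might look like the sketch below. Fetching the Ad Library data is not shown, and the shape of `active_ads` (a list of dicts with `text` and `media_urls`) is an assumption made for illustration; the helper names are hypothetical.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences match."""
    return " ".join(text.lower().split())


def tag_boosted(posts: list, active_ads: list) -> list:
    """Return copies of posts with an is_boosted flag based on text or media overlap."""
    ad_texts = {normalize(ad.get("text", "")) for ad in active_ads}
    ad_texts.discard("")  # never match posts on empty text
    ad_media = {url for ad in active_ads for url in ad.get("media_urls", [])}
    tagged = []
    for post in posts:
        text_match = normalize(post.get("text", "")) in ad_texts
        media_match = any(url in ad_media for url in post.get("media_urls", []))
        tagged.append({**post, "is_boosted": text_match or media_match})
    return tagged


posts = [
    {"post_id": "1", "text": "Our spring collection drops Friday.", "media_urls": []},
    {"post_id": "2", "text": "Behind the scenes at the studio", "media_urls": []},
]
ads = [{"text": "our spring collection  drops friday.", "media_urls": []}]

for row in tag_boosted(posts, ads):
    print(row["post_id"], row["is_boosted"])
# 1 True
# 2 False
```

Exact matching is deliberately conservative. Media URLs on the CDN may differ between the ad and the organic post, so the text match is likely to do most of the work.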

This is also where the reactions_breakdown field earns its keep. Paid distribution tends to flatten the breakdown toward like (because cold audiences default to the easiest reaction), while organic distribution to engaged followers produces a much higher love share. A post with eighty percent like and ten percent love is probably boosted; a post with sixty percent like and twenty-five percent love is probably resonating organically. That heuristic is not perfect, but it is consistent enough across categories to be useful as a tiebreaker when the boost flag is ambiguous.
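As a tiebreaker, the heuristic above could be encoded roughly as follows. The thresholds are taken directly from the two examples in the paragraph (80/10 and 60/25) and are illustrative rather than calibrated; the function name and labels are hypothetical.

```python
def distribution_guess(breakdown: dict) -> str:
    """Guess paid vs organic distribution from like and love shares of all reactions."""
    total = sum(breakdown.values())
    if total == 0:
        return "unknown"
    like_share = breakdown.get("like", 0) / total
    love_share = breakdown.get("love", 0) / total
    if like_share >= 0.80 and love_share <= 0.10:
        return "probably boosted"
    if like_share <= 0.60 and love_share >= 0.25:
        return "probably organic"
    return "ambiguous"


print(distribution_guess({"like": 80, "love": 10, "wow": 10}))   # probably boosted
print(distribution_guess({"like": 60, "love": 25, "haha": 15}))  # probably organic
print(distribution_guess({"like": 70, "love": 15, "wow": 15}))   # ambiguous
```

Most real posts will land in "ambiguous", which is the point: the guess only matters when the boost flag from the Ad Library join is missing or unclear.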

Posting Time Patterns

Beyond what a competitor posts, when they post is a signal worth tracking. The timestamp field on every scraped post is the raw material. Three patterns are worth pulling from thirty days of data per page.

First, the time-of-day histogram — most consumer brands cluster posts in two or three windows (morning, midday, evening), and a shift in that distribution typically means a new social manager, a tool change, or a shift in target audience.

Second, the day-of-week histogram. Brands that post on weekends are typically running an editorial calendar with a dedicated content lead; weekday-only brands are typically running through an agency or a marketing-ops tool. Knowing which model a competitor uses tells you how nimble their content team is.

Third, the gap distribution — the time between consecutive posts. A consistent two-day gap means a planned calendar; high variance means reactive posting tied to news cycles. Reactive posters are easier to out-cadence; calendared posters require matching their rhythm to compete in the feed.

All three patterns are derivable from the same scraped JSON snapshots with a few lines of pandas.
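For readers who prefer to see the three patterns spelled out, here is a minimal sketch that uses only the Python standard library rather than pandas. It assumes the `timestamp` format shown in the field table (UTC with a trailing `Z`); the function names and sample posts are hypothetical.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def parse_ts(ts: str) -> datetime:
    # Swap the trailing Z for an explicit offset so fromisoformat accepts it
    # on Python versions before 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def posting_patterns(posts: list) -> dict:
    """Hour-of-day histogram, day-of-week histogram, and gap statistics."""
    times = sorted(parse_ts(p["timestamp"]) for p in posts)
    gaps_h = [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
    return {
        "by_hour_utc": Counter(t.hour for t in times),
        "by_weekday": Counter(WEEKDAYS[t.weekday()] for t in times),
        "mean_gap_hours": mean(gaps_h) if gaps_h else None,
        "gap_stdev_hours": pstdev(gaps_h) if gaps_h else None,
    }


sample_posts = [
    {"timestamp": "2026-05-18T14:30:00Z"},
    {"timestamp": "2026-05-20T14:30:00Z"},
    {"timestamp": "2026-05-22T14:30:00Z"},
]
patterns = posting_patterns(sample_posts)
print(patterns["mean_gap_hours"], patterns["gap_stdev_hours"])  # 48.0 0.0
print(dict(patterns["by_weekday"]))  # {'Monday': 1, 'Wednesday': 1, 'Friday': 1}
```

The hour histogram here is in UTC; converting to the competitor's local timezone first is what makes the morning, midday, and evening windows readable. A low gap standard deviation points to a planned calendar, a high one to reactive posting.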

Scaling To Multiple Competitor Pages

A single competitor page is one watcher; a brand strategist usually wants to watch five to fifteen pages — direct competitors, adjacent-category brands worth learning from, the category leader, and a few up-and-coming challengers. The submit-and-wait pattern above scales fine for that count if you sequence the calls, but for a portfolio of fifteen or more pages, bulk submission cuts the wall-clock time substantially. The bulk endpoint accepts a list of page URLs and schedules them across the available concurrency, finishing the whole portfolio in roughly the time of a single page.

The other scaling consideration is account hygiene. One cookie session per scraper is the simplest setup, but Meta's rate-limit signals are account-keyed, so heavy use of a single account on a portfolio of fifteen pages can eventually trigger soft throttling on that account. Production setups rotate across two or three connected accounts on the dashboard — the platform handles the rotation transparently once multiple accounts are connected.

Common Mistakes

Using a fresh Facebook account for the cookie session. Meta's anti-abuse systems disproportionately rate-limit accounts that are less than thirty days old. Use an account that has been active for at least six months, ideally one with a real history of logins from a stable device. The scraper will technically work with a one-day-old account, but the session will expire faster and the rate limits will bite harder.
Scraping the same page many times per day. There is no benefit beyond one or two snapshots. Facebook updates engagement counts in roughly hourly increments, and Meta's anti-abuse systems treat repeated rapid-fire scrapes of the same page as a strong bot signal. One snapshot per day per page is enough for any longitudinal analysis; two if you need finer-grained reaction velocity.
Treating reaction counts as ground truth in the first hour after publish. Reaction counts on a brand-new post lag the actual engagement by ten to thirty minutes due to Facebook's caching. A post scraped five minutes after it goes live may show zero reactions even when the live page shows fifty. Wait at least an hour from timestamp before treating the engagement numbers as accurate.
Pulling more than a hundred posts at once on the first scrape. The first call from a new session is the most likely to hit Facebook's rate limiter because Meta has not yet established a behavioral baseline for the account. Start with limit=30 for the first few jobs per session; once those have run cleanly, larger pulls are fine.
Forgetting that the Cloudflare edge in front of the API closes connections at one hundred seconds. A large limit value translates to a longer scrape; jobs above sixty seconds should always be polled with /api/v1/jobs/{job_id} rather than waited on inline with the wait=true parameter.
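The first-hour caveat in the list above is easy to enforce in code before computing any engagement metrics. This is a small sketch, assuming the `timestamp` format from the field table; the function name is hypothetical and the one-hour default simply mirrors the advice above.

```python
from datetime import datetime, timedelta, timezone


def settled_posts(posts, now=None, min_age=timedelta(hours=1)):
    """Keep only posts old enough for their engagement counts to be trusted."""
    now = now or datetime.now(timezone.utc)
    return [
        p for p in posts
        if now - datetime.fromisoformat(p["timestamp"].replace("Z", "+00:00")) >= min_age
    ]


fixed_now = datetime(2026, 5, 22, 15, 0, tzinfo=timezone.utc)
recent = [
    {"post_id": "a", "timestamp": "2026-05-22T14:30:00Z"},  # 30 minutes old
    {"post_id": "b", "timestamp": "2026-05-22T13:00:00Z"},  # 2 hours old
]
print([p["post_id"] for p in settled_posts(recent, now=fixed_now)])  # ['b']
```

Posts filtered out this way are picked up by the next nightly run, once their counts have caught up.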

Legality And Brand-Safety Notes

Public Facebook page content is exactly that — public. Meta operates Facebook pages as a discovery surface, and every post on a public page is indexed by Google and surfaced to non-logged-in visitors who hit the right URL. Scraping that content for competitive monitoring is on settled legal ground in the US and broadly compliant in the EU under GDPR's legitimate-interest basis for B2B competitive intelligence, provided the data stays internal and is not republished as a competing product or used to train a customer-facing model.

The brand-safety side is worth flagging separately from the legal side. Internal stakeholders sometimes flinch at the word "scraping" because it sounds adversarial. The reframe that lands well with legal and brand teams is to call it what it is: competitive monitoring using public web data, the same input a human strategist would gather by manually checking competitor pages each morning, just automated. Frame it as time saved, not stealth, and the conversation usually settles within one meeting.

Get Started

1. Sign up at logposervices.com and generate an API key under Tool → API Keys.
2. Connect a Facebook account under Accounts → Facebook → Add account by pasting the four session cookies (c_user, xs, fr, datr).
3. Export the credentials in your shell: export LOGPOSE_API_KEY=lp_xxxxxxx LOGPOSE_FB_ACCOUNT_ID=fb_acct_xxxx
4. Pick three competitor pages and run the Python snapshot script above against each on a nightly cron.

Related reading: How to scrape Instagram for content strategy for the matching playbook on the other Meta surface, How to find trending TikTok creators by hashtag and niche for the short-video side of competitive monitoring, and the web scraping API guide for the broader DIY-versus-managed comparison.

External: Meta Ad Library for paid-side visibility, hiQ Labs v. LinkedIn on public-data scraping precedent.
