Is it legal to use creators' public profile and post data for a shortlist?

The follower counts, like counts, comment counts, captions, and post timestamps you read off a public Instagram or TikTok profile are displayed without authentication to anyone who opens the page. In the United States, the Ninth Circuit held in hiQ Labs v. LinkedIn (9th Cir. 2022) that scraping publicly accessible data likely does not violate the CFAA; that ruling addressed the CFAA only, not contract claims under a platform's terms of service. In the EU and UK, GDPR still applies to personal data even when it is public, so collection there needs a lawful basis. The key distinction for this pipeline: you are collecting public engagement data (followers, likes, and comments on posts the creator chose to make public), not private contact information, DMs, or anything behind a login. There are no email addresses or phone numbers in this dataset, because the platforms do not publish them on the post objects you are reading. The more heavily regulated step is downstream: how you contact a creator (CAN-SPAM, GDPR, the platform's own DM rules) is governed separately from how you assessed their engagement, and that is where most of the compliance work lives.

Why compute engagement rate yourself instead of trusting the follower count?

Follower count is the single most gameable metric on either platform — it is bought, inflated by giveaway loops, and inherited from a one-off viral moment that no longer reflects current reach. A creator with 80k followers and 300 likes per post is worth less to a campaign than one with 18k followers and 2,500 likes per post, but a follower-sorted list ranks them in exactly the wrong order. Engagement rate — recent likes plus comments divided by followers — is the closest public proxy for whether an audience actually sees and reacts to what the creator posts. You compute it from the most recent posts specifically because it is a current signal: an ER averaged over the last dozen posts catches a creator whose audience has gone cold, and a follower count never will.
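To make the comparison concrete, here is a minimal sketch using the two hypothetical creators above. The text gives only like counts, so comments are assumed to be zero, and the names creator_a and creator_b are illustrative.

```python
# Two hypothetical creators from the example above; comment counts are
# not given in the text, so they are assumed to be zero here.
creators = [
    {"name": "creator_a", "followers": 80_000, "avg_likes": 300, "avg_comments": 0},
    {"name": "creator_b", "followers": 18_000, "avg_likes": 2_500, "avg_comments": 0},
]


def er(c):
    """Engagement rate: (avg likes + avg comments) / followers."""
    return (c["avg_likes"] + c["avg_comments"]) / c["followers"]


# The same two rows, ordered by each candidate sort key.
by_followers = sorted(creators, key=lambda c: c["followers"], reverse=True)
by_er = sorted(creators, key=er, reverse=True)

for c in creators:
    print(f"{c['name']}: {c['followers']:>6} followers, ER {er(c):.2%}")
print("follower sort:", [c["name"] for c in by_followers])
print("ER sort:      ", [c["name"] for c in by_er])
```

On these numbers creator_a comes out near 0.4% and creator_b near 13.9%, so the two sort keys produce opposite orderings.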


Building a Niche Creator Shortlist Without Paying for a SaaS Seat

June 23, 2026 · 12 min read

If you are putting together a creator shortlist for a niche campaign — a skincare launch, a tabletop-gaming Kickstarter, a regional coffee brand — the part that actually decides whether the campaign works is selection, and selection happens before any outreach. The standard move is to buy a seat in an influencer-discovery SaaS, search a niche, and sort by follower count. But follower count is the one metric you should trust least, and the per-seat pricing assumes you are running this search every week when most brands run it a few times a quarter.

This guide builds the same shortlist from public data: start from a niche hashtag on both Instagram and TikTok, collect the creators posting under it, pull each one's profile and most recent posts, compute a real engagement rate from those posts, filter to a follower band and an ER threshold, rank, and write a clean shortlist CSV. The worked example is a niche skincare campaign targeting the micro tier (10k–100k followers), but the same code covers any niche on either platform by swapping the hashtags and the follower band.

Why Follower Count Is the Wrong Sort Key

The output every brief asks for is "find me creators in this niche." The output that wins campaigns is "find me creators in this niche whose audience actually engages." Those are different lists, and the gap between them is engagement rate.

Engagement rate is the public proxy for reach quality: how many people like and comment relative to how many follow. The formula is simple —

ER = (avg_likes + avg_comments) / followers

— but the inputs are what make it honest. You want the average over the creator's recent posts, not a lifetime number and not a single pinned hit. A creator who went viral two years ago and coasted on the follower count will have a high count and a collapsed recent ER; a rising micro-creator will have a modest count and an ER several times higher. A follower-sorted list buries the second creator under the first, which is exactly backwards for a campaign that needs current reach, not historical prestige.

So the pipeline is not "search a niche and take the top N by followers." It is: discover candidates from a niche hashtag, pull each candidate's profile and recent posts, compute ER from those posts, and then rank — with the follower count used only as a band filter (to stay in the micro tier), never as the sort key.

Step 1: Discover Candidates from a Niche Hashtag

Discovery starts from the hashtag your target audience already follows. Both platforms expose recent and top posts under a tag publicly, and every one of those posts carries the author's username — that is your candidate pool.

On TikTok, the hashtag search endpoint takes the tag name and a result limit. On Instagram, the hashtags endpoint takes the tag and returns recent/top posts you can read usernames out of. Confirm each one with curl before scripting:

# TikTok hashtag discovery — returns recent posts under the tag
curl -G "https://api.logposervices.com/api/v1/social/tiktok/search" \
  -H "X-API-Key: lp_xxxxxxx" \
  --data-urlencode "tagname=skincareroutine" \
  --data-urlencode "limit=100"
# → {"job_id": "tt_4c1d...", "status": "pending"}

# Instagram hashtag discovery — returns recent/top posts under the tag
curl -G "https://api.logposervices.com/api/v1/social/insta/hashtags" \
  -H "X-API-Key: lp_xxxxxxx" \
  --data-urlencode "tag=skincareroutine"
# → {"job_id": "ig_9a02...", "status": "pending"}

Both calls are asynchronous: each returns the job id immediately, and you poll for the result rather than waiting on the connection. That matters because api.logposervices.com sits behind Cloudflare, which kills any single connection at roughly 90 seconds — a hashtag pull that surfaces a few hundred posts can run past that, so you submit and poll instead of blocking.

Pick two or three hashtags per niche, not one. A single tag skews toward whatever format dominates it; skincareroutine, skintok, and a product-specific tag together give you a broader, less format-biased candidate pool. Cast wide here — the vetting in later steps is what tightens the list, so a noisy discovery pass is fine and a narrow one quietly costs you good creators.

Step 2: Collect Candidate Usernames

Discovery returns posts, but what you want is the distinct set of authors. The same creator often posts several times under a popular tag, and across two or three tags the overlap is real — so collect into a set keyed on username, and remember which platform each came from.

import os, time, requests

API_KEY = os.environ["LOGPOSE_API_KEY"]
BASE = "https://api.logposervices.com/api/v1"
HEADERS = {"X-API-Key": API_KEY}


def submit(path, params):
    r = requests.get(f"{BASE}{path}", params=params, headers=HEADERS, timeout=30)
    r.raise_for_status()
    return r.json()["job_id"]


def wait(job_id, poll_every=5, timeout_s=180):
    """Poll one job id until it finishes; return the result payload."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        s = requests.get(f"{BASE}/jobs/{job_id}", headers=HEADERS, timeout=15).json()
        status = s.get("status")
        if status == "completed":
            return requests.get(f"{BASE}/jobs/{job_id}/result",
                                headers=HEADERS, timeout=30).json()
        if status == "failed":
            print(f"  job {job_id} failed: {s.get('error')}")
            return {}
        time.sleep(poll_every)
    print(f"  job {job_id} timed out")
    return {}


def discover(tiktok_tags, insta_tags, per_tag=100):
    """Return {username: platform} for every author found under the tags."""
    candidates = {}
    for tag in tiktok_tags:
        jid = submit("/social/tiktok/search",
                     {"tagname": tag, "limit": per_tag})
        for post in wait(jid).get("posts", []):
            user = post.get("author") or post.get("username")
            if user:
                candidates.setdefault(user, "tiktok")
    for tag in insta_tags:
        jid = submit("/social/insta/hashtags", {"tag": tag})
        for post in wait(jid).get("posts", []):
            user = post.get("owner_username") or post.get("username")
            if user:
                candidates.setdefault(user, "instagram")
    return candidates


candidates = discover(
    tiktok_tags=["skincareroutine", "skintok"],
    insta_tags=["skincareroutine", "skincaretips"],
)
print(f"{len(candidates)} distinct candidate creators")
# e.g. ~400 posts across 4 tags -> ~180 distinct creators

setdefault does the dedupe for free: the first time a username appears it is recorded with its platform, and every later appearance is ignored. Because discover processes the TikTok tags first, a handle that shows up on both platforms is kept once, as TikTok. If cross-platform presence matters to your brief, key on (username, platform) instead and you will carry both rows through; that also avoids assuming the same handle on two platforms belongs to the same person.
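A minimal sketch of the (username, platform) keying, written as a pure function over already-fetched posts so it can be tried without the API. The helper name collect_authors and the sample data are hypothetical, and the author field names are the same assumptions about the response shape that discover makes.

```python
def collect_authors(posts_by_platform):
    """Return a set of (username, platform) pairs from discovery posts.

    posts_by_platform maps a platform name to a list of post dicts. The
    author field names tried below mirror the ones used in discover()
    and are assumptions about the response shape.
    """
    author_keys = ("author", "owner_username", "username")
    seen = set()
    for platform, posts in posts_by_platform.items():
        for post in posts:
            # Take the first author field that is present and non-empty.
            user = next((post[k] for k in author_keys if post.get(k)), None)
            if user:
                seen.add((user, platform))
    return seen


# Hypothetical sample posts, for illustration only.
sample = {
    "tiktok": [{"author": "glowlab"}, {"author": "glowlab"}, {"author": "dermdaily"}],
    "instagram": [{"owner_username": "glowlab"}, {"username": "skinnotes"}],
}
pairs = collect_authors(sample)
print(sorted(pairs))
```

Here glowlab survives as two rows, one per platform, while the repeated TikTok post still collapses to a single entry.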

Step 3: Pull Each Creator's Profile and Recent Posts

This is the step that turns a username into a vettable row. For each candidate you need two things the discovery feed does not give you: the follower count (to apply the band filter) and the recent posts with their like and comment counts (to compute ER). The endpoints differ by platform.

On Instagram, profile_summary returns the follower count and profile firmographics, and posts returns recent posts with engagement counts. On TikTok, deep_profile returns both the follower count and recent posts in one call. Confirm the Instagram pair with curl:

# Instagram profile summary — follower count + firmographics
curl -G "https://api.logposervices.com/api/v1/social/insta/profile_summary" \
  -H "X-API-Key: lp_xxxxxxx" \
  --data-urlencode "username=some_creator"
# → {"job_id": "ig_2b7e...", "status": "pending"}

# Instagram recent posts — likes + comments per post
curl -G "https://api.logposervices.com/api/v1/social/insta/posts" \
  -H "X-API-Key: lp_xxxxxxx" \
  --data-urlencode "username=some_creator" \
  --data-urlencode "limit=12"
# → {"job_id": "ig_2c40...", "status": "pending"}

# TikTok deep profile — follower count + recent posts in one call
curl -G "https://api.logposervices.com/api/v1/social/tiktok/deep_profile" \
  -H "X-API-Key: lp_xxxxxxx" \
  --data-urlencode "username=some_creator" \
  --data-urlencode "limit=12"
# → {"job_id": "tt_5d18...", "status": "pending"}

limit=12 is a deliberate choice: a dozen recent posts is enough to average out a single anomalous hit or flop while staying recent enough that the ER reflects the audience now. Going much higher pulls in older posts that drag the average toward stale engagement; going much lower lets one viral post distort the whole number.
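The effect of sample size is easy to see with made-up numbers. The sketch below assumes a hypothetical creator whose typical post gets 500 likes and who has one viral post at 20,000, then averages over 3 and over 12 posts.

```python
def avg_likes(like_counts):
    """Mean likes over a list of per-post like counts."""
    return sum(like_counts) / len(like_counts)


# Hypothetical creator: typical posts get 500 likes, one went viral at 20,000.
typical, viral = 500, 20_000

for n in (3, 12):
    # One viral post plus (n - 1) typical posts.
    sample = [viral] + [typical] * (n - 1)
    print(f"{n:>2} posts sampled -> average likes {avg_likes(sample):,.0f}")
```

With three posts the average is 7,000 likes, fourteen times the typical post; with twelve it is 2,125, which is still inflated but far closer to what the audience normally does.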

A pull per creator across ~180 candidates is a lot of jobs. Because every submit returns a job id instantly, the efficient pattern at scale is fire-all-then-poll: submit every job up front, then drain the queue. The function below is the simpler sequential form, where each creator is one logical fetch that submits its platform's jobs and waits for them before moving on.

def fetch_creator(username, platform):
    """Return (followers, posts) for one creator, or (None, []) on failure."""
    try:
        if platform == "tiktok":
            # One call returns both the profile and the recent posts.
            res = wait(submit("/social/tiktok/deep_profile",
                              {"username": username, "limit": 12}))
            prof = res.get("profile", res)
            followers = prof.get("followers") or prof.get("follower_count")
            posts = res.get("posts", [])
        else:
            # Instagram needs two calls: profile summary, then recent posts.
            summary = wait(submit("/social/insta/profile_summary",
                                  {"username": username}))
            followers = summary.get("followers") or summary.get("follower_count")
            posts = wait(submit("/social/insta/posts",
                                {"username": username, "limit": 12})).get("posts", [])
    except requests.RequestException as exc:
        # HTTP errors (404, 429, ...) and network failures count as a clean miss.
        print(f"  {username}: fetch failed ({exc})")
        return None, []
    return followers, posts

If a profile is private, deleted, or rate-limited at fetch time, treat it as a clean miss — return no followers and no posts, and let the scoring step drop it rather than guessing. A shortlist with 150 well-vetted creators beats one with 180 where 30 rows have invented numbers.
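For larger candidate pools, the fire-all-then-poll shape can be sketched as a helper that polls many already-submitted jobs in one loop. This is an illustration under assumptions, not the service's client: drain, get_status, and get_result are hypothetical names, and in the pipeline the two callables would wrap the same /jobs/{id} and /jobs/{id}/result requests that wait uses. The demo runs against in-memory fakes so it needs no network.

```python
import time


def drain(job_ids, get_status, get_result, poll_every=5, timeout_s=600):
    """Poll many already-submitted jobs until each finishes or time runs out.

    job_ids:    {key: job_id}, e.g. {username: job_id}
    get_status: callable(job_id) -> "pending" | "completed" | "failed"
    get_result: callable(job_id) -> result payload (dict)
    Returns {key: payload}; failed or timed-out jobs map to {}.
    """
    results = {}
    pending = dict(job_ids)
    deadline = time.time() + timeout_s
    while pending and time.time() < deadline:
        for key, jid in list(pending.items()):
            status = get_status(jid)
            if status == "completed":
                results[key] = get_result(jid)
                del pending[key]
            elif status == "failed":
                results[key] = {}  # clean miss, same as wait()
                del pending[key]
        if pending:
            time.sleep(poll_every)
    for key in pending:  # whatever is left timed out
        results[key] = {}
    return results


# Demo with in-memory fakes standing in for the HTTP calls.
_polls = {}


def fake_status(jid):
    _polls[jid] = _polls.get(jid, 0) + 1
    if jid == "job_bad":
        return "failed"
    return "completed" if _polls[jid] >= 2 else "pending"


def fake_result(jid):
    return {"job": jid}


out = drain({"alice": "job_a", "bob": "job_bad"}, fake_status, fake_result,
            poll_every=0)
print(out)
```

The design choice is that one slow job never blocks the others: each pass checks every pending job once, and the total wall time is bounded by the slowest job rather than the sum of all of them.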

Step 4: Compute Engagement Rate from Recent Posts

Now the rows become comparable. For each creator with a follower count and at least a few recent posts, average the likes and comments across those posts and divide by followers. Skip creators whose posts you could not read — an ER computed from zero posts is not a low score, it is no score, and the two must not be conflated.

def engagement_rate(followers, posts):
    """ER = (avg_likes + avg_comments) / followers, from recent posts."""
    usable = [p for p in posts
              if p.get("likes") is not None or p.get("comments") is not None]
    if not followers or len(usable) < 3:
        return None  # not enough signal to score honestly
    avg_likes = sum(p.get("likes") or 0 for p in usable) / len(usable)
    avg_comments = sum(p.get("comments") or 0 for p in usable) / len(usable)
    return (avg_likes + avg_comments) / followers


def score_candidates(candidates):
    scored = []
    for username, platform in candidates.items():
        followers, posts = fetch_creator(username, platform)
        er = engagement_rate(followers, posts)
        if er is None:
            continue
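        scored.append({
            "username": username,
            "platform": platform,
            "followers": followers,
            # Count posts with the same rule engagement_rate() uses,
            # so posts_sampled matches the sample behind the score.
            "posts_sampled": len([p for p in posts
                                  if p.get("likes") is not None
                                  or p.get("comments") is not None]),
            "engagement_rate": round(er, 4),
        })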
    return scored


scored = score_candidates(candidates)
print(f"{len(scored)} creators scored with a real ER")

The len(usable) < 3 guard is the honesty check: a creator with one readable post might post a 12% ER off a single lucky video, and that is noise, not signal. Requiring at least three sampled posts before assigning a score keeps a fluke from topping your ranking — and quietly drops the brand-new accounts that have not posted enough to be assessable yet, which is the correct call for a campaign shortlist.

Step 5: Filter to a Band and Rank

With a real ER on every row, the filter is two clean cuts and a sort. Keep creators inside your follower band — the micro tier, 10k–100k here — and above an ER floor, then sort by ER descending. The follower count is now doing the one job it is good at, defining a tier, and ER is doing the ranking.

def shortlist(scored, min_followers=10_000, max_followers=100_000,
              min_er=0.03):
    """Keep micro-tier creators above the ER floor, ranked by ER."""
    kept = [c for c in scored
            if min_followers <= (c["followers"] or 0) <= max_followers
            and c["engagement_rate"] >= min_er]
    return sorted(kept, key=lambda c: c["engagement_rate"], reverse=True)


ranked = shortlist(scored, min_followers=10_000, max_followers=100_000,
                   min_er=0.03)
print(f"{len(ranked)} creators in the micro tier above a 3% ER floor")

The numbers are levers, not gospel. A 3% ER floor is a reasonable micro-tier baseline, but engagement norms differ by platform and niche — TikTok ERs tend to run higher than Instagram's, and a tight, high-intent niche can sustain ERs that would look implausible in a broad one. Run the ranking once unfiltered, eyeball the ER distribution your niche actually produces, then set the floor where the curve bends. Setting it from a generic benchmark instead of your own data either over-cuts or under-cuts every time.
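One way to eyeball the distribution before picking a floor is to summarize the scored rows per platform. The sketch below assumes rows shaped like the output of score_candidates; the helper name er_distribution and the sample rows are hypothetical.

```python
import statistics


def er_distribution(scored):
    """Return {platform: {"n", "p25", "median", "p75"}} of engagement rates."""
    by_platform = {}
    for row in scored:
        by_platform.setdefault(row["platform"], []).append(row["engagement_rate"])
    summary = {}
    for platform, rates in by_platform.items():
        if len(rates) < 2:
            continue  # quartiles are not meaningful for a single data point
        q1, q2, q3 = statistics.quantiles(rates, n=4)
        summary[platform] = {"n": len(rates), "p25": q1, "median": q2, "p75": q3}
    return summary


# Hypothetical scored rows, for illustration only.
demo_rows = [
    {"platform": "tiktok", "engagement_rate": r}
    for r in (0.01, 0.02, 0.03, 0.04, 0.05)
] + [{"platform": "instagram", "engagement_rate": 0.02}]

for platform, stats in er_distribution(demo_rows).items():
    print(f"{platform}: n={stats['n']} p25={stats['p25']:.3f} "
          f"median={stats['median']:.3f} p75={stats['p75']:.3f}")
```

Splitting by platform matters because a single floor applied across both would cut one side harder than the other whenever their typical rates differ.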

Step 6: Write the Shortlist CSV

The last step turns the ranked list into a CSV your team or your campaign tool can open directly — one row per creator, sorted best-first, with a rank column so the ordering survives a re-sort in a spreadsheet.

import csv


def write_csv(ranked, out_path):
    fields = ["rank", "username", "platform", "followers",
              "engagement_rate", "engagement_pct", "posts_sampled", "profile_url"]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        w.writeheader()
        for i, c in enumerate(ranked, start=1):
            if c["platform"] == "tiktok":
                url = f"https://www.tiktok.com/@{c['username']}"
            else:
                url = f"https://www.instagram.com/{c['username']}"
            w.writerow({
                "rank": i,
                "username": c["username"],
                "platform": c["platform"],
                "followers": c["followers"],
                "engagement_rate": c["engagement_rate"],
                "engagement_pct": f"{c['engagement_rate'] * 100:.1f}%",
                "posts_sampled": c["posts_sampled"],
                "profile_url": url,
            })
    return len(ranked)


n = write_csv(ranked, "skincare_micro_shortlist.csv")
print(f"wrote {n} ranked creators to the shortlist")

Two columns earn their place. engagement_pct is the human-readable version your campaign lead will actually scan, while the raw engagement_rate stays for sorting and thresholds. And posts_sampled is the trust column: a creator scored off twelve posts is a firmer pick than one scored off the minimum three, and surfacing that lets a reviewer weight the borderline rows by hand. The result is a tight, deduped, engagement-vetted shortlist — ranked by the metric that actually predicts campaign reach, ready to hand to whoever runs outreach.

Scaling Across Niches and Refreshing on a Cadence

The pipeline above is one niche, one pass. The real influencer-marketing shape is several niches refreshed over time — because the entire value of a micro-creator shortlist is catching creators while they are still micro. A creator at 30k followers and a 9% ER today is a creator at 120k and a market rate in six months; the shortlist is only worth what it was the day you ran it, and a stale one quietly hands the rising creators to whoever re-ran more recently.

Two things make recurring refresh practical. First, the niche is just data — the hashtag lists and follower band are parameters, so running a new niche is the same discover / score_candidates / shortlist / write_csv functions with different strings. Second, when the same niche runs on a cadence, what you care about is the diff: which creators are newly above your ER floor, and which rising stars cleared the band since last time. The differentiator that makes this a standing pipeline rather than a one-off is the combination of scheduled re-runs and export — re-pull the niche on a schedule, export each cycle's shortlist, and diff against the last. LogPose can run the discovery and profile jobs on a schedule and export the results, so the refresh is a recurring job rather than something you babysit; it also exposes a monitor primitive (with email, webhook, Telegram, Slack, or Discord notifications) if you want a ping when a watched creator crosses a threshold, but for shortlist-building the scheduled re-run plus export is the piece that matters, not the alert.
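The cycle-over-cycle diff can be sketched directly against two exported shortlist CSVs. This assumes both files carry the username and platform columns that write_csv emits; the function names and the file names in the usage comment are hypothetical.

```python
import csv


def load_shortlist(path):
    """Read a shortlist CSV into {(username, platform): row}."""
    with open(path, newline="", encoding="utf-8") as f:
        return {(r["username"], r["platform"]): r for r in csv.DictReader(f)}


def diff_shortlists(prev_path, curr_path):
    """Compare two cycles: who is new, who dropped off, who stayed."""
    prev, curr = load_shortlist(prev_path), load_shortlist(curr_path)
    return {
        "new": sorted(curr.keys() - prev.keys()),
        "dropped": sorted(prev.keys() - curr.keys()),
        "kept": sorted(curr.keys() & prev.keys()),
    }


# Usage (hypothetical file names):
# changes = diff_shortlists("shortlist_prev.csv", "shortlist_curr.csv")
# print(changes["new"])
```

The "new" list is the one to act on first, since it holds the creators who cleared the band and the ER floor since the last run.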

The Honest Fit

This approach fits when you are sourcing and vetting micro and mid-tier creators for a niche campaign from public engagement signals, and you want a ranked, ER-vetted shortlist without paying for a per-seat discovery SaaS you only open a few times a quarter. The hashtag-discovery entry point, the recent-posts ER computation, and the band-plus-floor ranking are the three primitives that make the shortlist reflect current reach rather than gameable follower counts.

Where it is not the right tool: this is discovery and vetting, not a managed creator CRM. It does not give you verified creator emails, audience-demographics breakdowns (age, gender, location of the followers), brand-safety scores, or rate cards — that data is not public on the post objects you are reading, and a platform like Modash or an agency roster is built to supply it. And it is not outreach automation: the CSV is the input to your outreach, not the outreach itself. For sourcing and ranking from public engagement, though, this is exactly the right trade — you control the ER definition, the band, and the cadence, instead of trusting a vendor's opaque "influencer score."

Get Started

Sign up at logposervices.com and generate an API key under Tool → API Keys.
Set the key in your environment: export LOGPOSE_API_KEY=lp_xxxxxxx
Confirm discovery on one hashtag, then run the pipeline:

curl -G "https://api.logposervices.com/api/v1/social/tiktok/search" \
  -H "X-API-Key: lp_xxxxxxx" \
  --data-urlencode "tagname=skincareroutine" \
  --data-urlencode "limit=100"

Then run discover over two or three niche hashtags per platform, fetch each candidate's profile and recent posts, compute the engagement rate, filter to your follower band and ER floor, and write the ranked shortlist CSV.

Related reading: Find trending TikTok creators by hashtag in your niche for the discovery step in depth, A practical Instagram scraping guide for the profile and posts fundamentals, and Modash alternatives: build your own influencer database for the tooling landscape.

External: Instagram, TikTok, hiQ Labs v. LinkedIn.
