How we dropped LinkedIn scraper false positives from 62% to zero
I run a LinkedIn employees scraper on Apify. In its first version, 62% of the profiles it returned were false positives. Here is how we fixed it.
The false positive problem
The first version was a Google dork. Query site:linkedin.com/in/ "TargetCompany" and parse the SERP. Fast, cheap, and broken.
Google indexes profiles that mention a company anywhere in the page. That includes:
- People who left the company three years ago
- People quoted in the company's press releases
- People whose recent post mentioned the company as a competitor
A paying user ran the actor on 414 companies and got 3,128 profiles back. Manual spot check: 62% were not current employees. Unusable for outreach, because "I saw your company is hiring" to someone who left two years ago kills sender credibility.
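For reference, the first-pass approach can be sketched in Go as below. This is a minimal illustration, not the actor's actual code: the function names dorkQuery and profileSlug are hypothetical, and the sketch assumes the SERP results have already been reduced to a list of result URLs.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// dorkQuery builds the search string for one company name.
// (Hypothetical helper; the name is not from the original actor.)
func dorkQuery(company string) string {
	return fmt.Sprintf(`site:linkedin.com/in/ "%s"`, company)
}

// profileSlug extracts the slug from a LinkedIn profile URL such as
// https://www.linkedin.com/in/some-slug/. It reports false for any URL
// that is not a /in/ profile link on a linkedin.com host.
func profileSlug(raw string) (string, bool) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", false
	}
	host := strings.ToLower(u.Hostname())
	if host != "linkedin.com" && !strings.HasSuffix(host, ".linkedin.com") {
		return "", false
	}
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	if len(parts) < 2 || parts[0] != "in" || parts[1] == "" {
		return "", false
	}
	return strings.ToLower(parts[1]), true
}

func main() {
	fmt.Println(dorkQuery("TargetCompany"))
	for _, link := range []string{
		"https://www.linkedin.com/in/example-person/",
		"https://www.linkedin.com/company/targetcompany/",
	} {
		slug, ok := profileSlug(link)
		fmt.Println(link, slug, ok)
	}
}
```

Nothing in this step looks at the profile itself, which is why a name match in the SERP says so little about current employment.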
The fix: a second verification pass
LinkedIn's public profile HTML includes a JSON-LD Person schema with a worksFor array listing the current employers.
    {
      "@type": "Person",
      "name": "Patrick Collison",
      "worksFor": [
        {
          "@type": "Organization",
          "name": "Stripe",
          "url": "https://www.linkedin.com/company/stripe/"
        }
      ]
    }
The verification step: for every SERP candidate, fetch the public profile, parse the JSON-LD, and emit the profile only if the company slug in one of its worksFor[].url values matches the target company's slug. If nothing matches, the profile is dropped.
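A minimal Go sketch of that check, under a few assumptions: the JSON-LD sits in a script tag of type application/ld+json, the Person object is flat as in the example above (no @graph wrapper), and worksFor is always an array. The names worksAt and companySlug are hypothetical, and the regular expression is a shortcut where production code might prefer a real HTML parser.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// person mirrors only the JSON-LD fields the check needs.
type person struct {
	Type     string `json:"@type"`
	WorksFor []struct {
		Name string `json:"name"`
		URL  string `json:"url"`
	} `json:"worksFor"`
}

// ldJSON captures the body of every application/ld+json script tag.
var ldJSON = regexp.MustCompile(
	`(?is)<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>`)

// companySlug returns "stripe" for https://www.linkedin.com/company/stripe/
// and "" for anything that is not a /company/ URL.
func companySlug(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return ""
	}
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	if len(parts) >= 2 && parts[0] == "company" {
		return strings.ToLower(parts[1])
	}
	return ""
}

// worksAt reports whether any Person block in the page lists targetSlug
// among its worksFor URLs. Blocks that fail to parse are skipped, so an
// unexpected shape results in a drop rather than a false positive.
func worksAt(html, targetSlug string) bool {
	target := strings.ToLower(targetSlug)
	for _, m := range ldJSON.FindAllStringSubmatch(html, -1) {
		var p person
		if err := json.Unmarshal([]byte(m[1]), &p); err != nil || p.Type != "Person" {
			continue
		}
		for _, org := range p.WorksFor {
			if s := companySlug(org.URL); s != "" && s == target {
				return true
			}
		}
	}
	return false
}

const samplePage = `<html><head><script type="application/ld+json">
{"@type":"Person","name":"Example Person","worksFor":[
  {"@type":"Organization","name":"Stripe","url":"https://www.linkedin.com/company/stripe/"}]}
</script></head></html>`

func main() {
	fmt.Println("stripe:", worksAt(samplePage, "stripe"))
	fmt.Println("acme:", worksAt(samplePage, "acme"))
}
```

Matching on the URL slug rather than the display name avoids collisions between companies that share a name.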
Why not use a real browser?
We tried. Puppeteer with the stealth plugin takes ~3 seconds per page on a Contabo VPS, leaves fingerprint traces that both Cloudflare and LinkedIn detect, and the Chromium binary adds ~450 MB to the Apify actor's Docker image.
Instead we run a small Go service on the VPS that uses github.com/bogdanfinn/tls-client, a Go HTTP client that reproduces Chrome's TLS handshake, including the extension order and the resulting JA4 fingerprint, along with Chrome's HTTP/2 settings frame. At the TLS and HTTP/2 layers, the connection looks to LinkedIn like real Chrome 124.
Architecture
Apify actor (Node.js)
  |
  +-- Phase A: Google SERP -> Apify GOOGLE_SERP proxy -> candidate slugs
  |
  +-- Phase B: verify each candidate
        |
        +-- POST /tls/fetch on VPS (Go service)
        |     |
        |     +-- Apify RESIDENTIAL proxy -> LinkedIn /in/{slug}
        |
        +-- parse JSON-LD -> worksFor match?
              |
              +-- yes -> emit (confidence: high)
              +-- no  -> drop (counted as rejected)
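The Go service exposes /tls/fetch, backed by a session pool (one persistent cookie jar per session_id). It also maintains a burn ledger: any session that receives an HTTP 999 response, the status LinkedIn returns when it blocks a request, is taken out of rotation for 30 minutes.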
Detection is holistic, not just TLS
A common mistake (we made it early on) is assuming TLS fingerprint parity is enough. It is not. Modern detection combines:
- TLS fingerprint (JA3 / JA4): baseline
- HTTP/2 settings frame + header order
- User-Agent consistency with the browser version implied by the TLS fingerprint
- IP reputation (datacenter vs residential)
- Request cadence (human-like spacing)
- Cookie and session continuity
Chrome 124 JA4 parity only covers the first item on that list. For the rest, we send the full Sec-Ch-Ua, Sec-Fetch-*, and Upgrade-Insecure-Requests header set, lock the User-Agent to the same Chrome version as the TLS profile, add 800 to 2500 ms of human-like jitter between requests on the same session, and route through Apify residential proxies for IP reputation.
The counterintuitive insight: do NOT randomize TLS profiles to evade detection. Real Chrome does not randomize, so rotating through profiles makes you more visible, not less. Pick one current profile and reproduce it exactly.
Results
Smoke test on Stripe this morning: 5 requested, 5 verified current Stripe employees (Patrick Collison, John Collison, Juliet Simpson, JR Farr, Karl Durrance), 0 false positives. Duration about 3 minutes end to end.
More than 299 paying users ran this scraper last month. The failure rate is in the low single digits, and every failure traces back to a LinkedIn 999 block, which the burn ledger handles correctly.
Try it
Apify Store: https://apify.com/george.the.developer/linkedin-company-employees-scraper
Priced per verified profile. Input: a list of company URLs or names. Output: verified current employees only, no false positives.


