The data moat

78 billion data points.
Six years.
Five platforms.

The longest-running continuous ecommerce intelligence dataset in Southeast Asia. Built on infrastructure that cracks Shopee's anti-bot defenses — and has been doing it since 2020.

78bn data points since 2020
10M products tracked
2M merchants tracked
18k+ brands tracked
6 yrs continuous coverage

A one-month scrape tells you what's selling today.

Six years tells you how categories move across peak seasons, how price wars resolve, how new entrants grow or fail, and which market share gains are structural versus promotional.

Magpie's taxonomy has been maintained continuously — a query about Baby Care in 2024 maps correctly to Baby Care in 2020, even through two Shopee category restructures.

Platform coverage added by year
2020: Shopee ID
2021: + Lazada
2022: + Tokopedia
2023: + TikTok Shop
Now: 6 countries

Bar = relative historical depth. Shopee Indonesia has the deepest archive.

Shopee is the most aggressively protected platform in SEA. We've been inside it since 2020.

Anti-bot defenses

Dynamic rate limiting, IP rotation detection, behavioural fingerprinting, and frequent API schema changes. Most competitors' scrapers break within days. Ours has run continuously for six years.

Unbroken since 2020

No gaps. No resets. The taxonomy has been maintained through two Shopee category restructures — enabling true six-year trend analysis without manual reconciliation.

Multi-language signals

Review sentiment in Bahasa Indonesia, Filipino, Thai, and Vietnamese. Owl detects terms such as 'palsu', 'kw', and 'peke' in counterfeit listings: real local signals, not proxies.
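As a rough illustration of keyword-level signal detection, the sketch below flags the three terms named above in listing text. It is a minimal example under stated assumptions, not Owl's actual method: the function name `counterfeit_terms` and the term list are hypothetical, and a production system would likely need far more context than a word match.

```python
import re

# Assumption: only these three terms come from the text above; a real list would be longer.
COUNTERFEIT_TERMS = ("palsu", "kw", "peke")

# Word boundaries keep short terms such as "kw" from matching inside longer tokens.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in COUNTERFEIT_TERMS) + r")\b",
    re.IGNORECASE,
)


def counterfeit_terms(text: str) -> list[str]:
    """Return distinct matched terms, lowercased, in order of first appearance."""
    found: list[str] = []
    for match in _PATTERN.finditer(text):
        term = match.group(1).lower()
        if term not in found:
            found.append(term)
    return found
```

A plain word match will also fire on unrelated uses, for example "kW" as a power unit in an appliance listing, so a hit is best treated as a signal for review rather than a verdict.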

Stable taxonomy layer

10 million product SKUs mapped to a human-maintained taxonomy. The same FMCG category in 2020 is the same category today — enabling cross-year analysis without data science overhead.

From raw marketplace data to clean, joined dataset

Four stages. Fully automated. Running continuously since 2020.

01 · Collect (Scrape)

Automated scraping across five platforms. Anti-bot handling, rate limits, proxy rotation. Monthly cycles across all markets.

02 · Clean

Prices normalised across promotional mechanics. Sold counts reconciled against baselines. Duplicates removed. Flash sale flags attached.

03 · Classify

AI-assisted taxonomy with human review. 10 million product SKUs mapped to stable categories. Cross-platform SKU matching applied.

04 · Power

Joined and warehoused in BigQuery. Powers Farsight, Owl, Nest API, and Looker dashboards.
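To make the Clean stage more concrete, here is a minimal sketch of deduplication with a price normalised into a base price plus discount, and the flash sale flag carried through. The `Listing` fields and the `clean` function are assumptions made for illustration; the real pipeline's schema and rules are not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    # Hypothetical raw record; field names are illustrative only.
    platform: str
    sku: str
    list_price: float
    sale_price: float
    is_flash_sale: bool


def clean(listings: list[Listing]) -> list[dict]:
    """Drop repeated (platform, sku) rows, keeping the first, and express price as base plus discount."""
    seen: set[tuple[str, str]] = set()
    cleaned: list[dict] = []
    for item in listings:
        key = (item.platform, item.sku)
        if key in seen:
            continue  # Duplicate of a row already kept.
        seen.add(key)
        discount = 0.0
        if item.list_price > 0:
            discount = round(1 - item.sale_price / item.list_price, 4)
        cleaned.append({
            "platform": item.platform,
            "sku": item.sku,
            "base_price": item.list_price,
            "discount": discount,
            "flash_sale": item.is_flash_sale,
        })
    return cleaned
```

Keeping the base price and the discount as separate fields means a flash sale shows up as a promotion rather than as a price change.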

What we track, where, and how far back

| Platform | Countries | Key fields | Since | Refresh |
|---|---|---|---|---|
| Shopee | ID, MY, PH, TH, VN, SG | SKU · price · stock · sold · reviews · seller · category · rank · promotions | 2020 | Monthly |
| Lazada | ID, MY, PH, TH, VN, SG | SKU · price · stock · sold · reviews · seller · category | 2021 | Monthly |
| TikTok Shop | ID, TH, VN, PH | SKU · price · sold · reviews · seller · category | 2023 | Monthly |
| Tokopedia | ID | SKU · price · stock · sold · reviews · seller · category | 2022 | Monthly |
| Blibli | ID | SKU · price · stock · sold · category | 2022 | Monthly |

Questions about the data

Who has the best Shopee data in Southeast Asia?

Magpie IQ has operated a continuous Shopee scraping pipeline since 2020 — longer than any comparable SEA-native intelligence provider. The dataset now contains 78 billion data points across five platforms, with Indonesia as the primary and deepest market.

How does Magpie handle Shopee's anti-bot defenses?

Through six years of continuous engineering investment: proxy management, behavioural mimicry, adapting to rate limits, and rapid response to platform changes. This is maintained as an ongoing engineering function, not a one-time build.

How accurate is the sold count data?

Shopee's sold count updates when buyers confirm receipt, which creates a lag between the transaction and the recorded sale. Magpie reconciles these counts against historical baselines and flags anomalies. Data notes surface any caveats in Farsight answers.
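For a sense of what checking against a baseline can look like, the sketch below compares the latest monthly sold increment with the median of earlier months. This is a simplified illustration, not Magpie's reconciliation logic: the `flag_sold_anomaly` name, the median baseline, and the 3x tolerance are all assumptions chosen for the example.

```python
from statistics import median


def flag_sold_anomaly(history: list[int], latest: int, tolerance: float = 3.0) -> bool:
    """Flag the latest monthly sold increment if it looks implausible against past months.

    history: earlier monthly increments for the same SKU (assumed input shape).
    latest: the newest monthly increment.
    tolerance: hypothetical multiplier over the median baseline.
    """
    if latest < 0:
        return True  # A cumulative sold count should not go backwards.
    if not history:
        return False  # No baseline yet, so nothing to compare against.
    baseline = median(history)
    return latest > tolerance * baseline
```

For example, with past increments of 100, 120, and 110, a new increment of 115 passes, while 900 is flagged because it exceeds three times the median of 110.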

How do I access the dataset?

Three access points: Farsight (natural-language AI interface), Nest API (direct REST access), or managed Looker Studio dashboards. Email sales@magpieiq.com to discuss.

78 billion data points.
Three ways to use them.

Ask questions in plain language via Farsight. Pull raw data via Nest API. Or get a managed Looker dashboard — all on the same pipeline.