The data moat

78 billion data points.
Six years.
Five platforms.

The longest-running continuous ecommerce intelligence dataset in Southeast Asia. Built on infrastructure that cracks Shopee's anti-bot defenses — and has been doing it since 2020.

78bn data points since 2020
10M products tracked
2M merchants tracked
18k+ brands tracked
6 yrs continuous coverage

A one-month scrape tells you what's selling today.

Six years tells you how categories move across peak seasons, how price wars resolve, how new entrants grow or fail, and which market share gains are structural versus promotional.

Magpie's taxonomy has been maintained continuously — a query about Baby Care in 2024 maps correctly to Baby Care in 2020, even through two Shopee category restructures.

Platform coverage added by year

2020: Shopee ID, Tokopedia
2021: + Blibli, Shopee/Lazada SEA
2022: + Lazada ID
2025: + TikTok Shop
Now: 6 countries

Shopee Indonesia has the deepest archive.

Shopee is the most aggressively protected platform in SEA. We've been inside it since 2020.

Anti-bot defenses

Dynamic rate limiting, IP rotation detection, behavioural fingerprinting, and frequent API schema changes. Most competitors break within days. We've maintained continuous operation for six years.

Unbroken since 2020

No gaps. No resets. The taxonomy has been maintained through two Shopee category restructures — enabling true six-year trend analysis without manual reconciliation.

Multi-language signals

Review sentiment in Bahasa Indonesia, Filipino, Thai, and Vietnamese. This is how Owl detects 'palsu', 'kw', and 'peke' in counterfeit listings: real local signals, not proxies.
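As a rough illustration of the kind of local-language signal described above, here is a minimal sketch of whole-word term matching. It is not Owl's actual method: the function name `counterfeit_terms` and the three-item term list are assumptions for illustration, and a real term list would be much longer.

```python
import re
from typing import List

# Illustrative term list taken from the text above (assumption: a real list is far larger).
COUNTERFEIT_TERMS = ("palsu", "kw", "peke")

# Whole-word, case-insensitive match, so "kw" does not fire inside tokens such as "20kWh".
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in COUNTERFEIT_TERMS) + r")\b",
    re.IGNORECASE,
)


def counterfeit_terms(text: str) -> List[str]:
    """Return the distinct flagged terms found in a title or review, in order of appearance."""
    found: List[str] = []
    for match in _PATTERN.finditer(text):
        term = match.group(1).lower()
        if term not in found:
            found.append(term)
    return found
```

A hit is only a signal for review, not proof: a listing that says a product is "bukan palsu" still contains the term, so context matters.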

Stable taxonomy layer

10 million product SKUs mapped to a human-maintained taxonomy. The same FMCG category in 2020 is the same category today — enabling cross-year analysis without data science overhead.
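One way to picture a stable taxonomy layer is a versioned lookup: each platform taxonomy version maps its own category IDs to the same stable category name. The sketch below is hypothetical throughout. The version start months, the category IDs, and the `stable_category` helper are invented for illustration and do not describe Magpie's schema.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical data: (version start month, {platform_category_id: stable_category}).
# Three versions stand in for the original taxonomy plus two restructures.
VERSIONS = [
    ("2020-01", {"1001": "Baby Care"}),
    ("2022-03", {"2040": "Baby Care"}),
    ("2024-01", {"3107": "Baby Care"}),
]


def stable_category(platform_category_id: str, month: str) -> Optional[str]:
    """Resolve a platform category ID seen in `month` (YYYY-MM) to its stable category."""
    starts = [start for start, _ in VERSIONS]
    # YYYY-MM strings sort chronologically, so bisect finds the version in force.
    idx = bisect_right(starts, month) - 1
    if idx < 0:
        return None  # month predates the first known taxonomy version
    return VERSIONS[idx][1].get(platform_category_id)
```

With a lookup like this, a 2020 record and a 2024 record resolve to the same category name even though the platform's own IDs changed in between.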

From raw marketplace data to clean, joined dataset

Four stages. Fully automated. Running continuously since 2020.

01 · Collect
Scrape

Automated scraping across five platforms. Anti-bot handling, rate limits, proxy rotation. Monthly cycles across all markets.

02 · Clean

Prices normalised across promotional mechanics. Sold counts reconciled against baselines. Duplicates removed. Flash sale flags attached.

03 · Classify

AI-assisted taxonomy with human review. 10 million product SKUs mapped to stable categories. Cross-platform SKU matching applied.

04 · Power

Joined, warehoused in BigQuery. Powers Farsight, Owl, Nest API, and Looker dashboards.
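To make the Clean stage above more concrete, here is a minimal sketch of deduplication, price normalisation, and carrying a flash sale flag through. Everything in it is an assumption for illustration: the `Observation` fields, the dedupe key, and the simple "percentage discount then flat voucher" price formula are not taken from Magpie's pipeline.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List


@dataclass(frozen=True)
class Observation:
    # Hypothetical record shape for one SKU in one monthly cycle.
    platform: str
    sku_id: str
    month: str               # YYYY-MM
    list_price: float
    discount_pct: float      # 0-100, discount shown on the listing
    voucher_amount: float    # flat voucher, local currency
    in_flash_sale: bool      # flag carried through unchanged
    effective_price: float = 0.0


def clean(rows: Iterable[Observation]) -> List[Observation]:
    """Drop duplicate (platform, sku, month) rows and attach a normalised price."""
    seen = set()
    cleaned: List[Observation] = []
    for row in rows:
        key = (row.platform, row.sku_id, row.month)
        if key in seen:
            continue  # keep the first observation per key
        seen.add(key)
        # Assumed promo model: percentage discount first, then flat voucher.
        price = row.list_price * (1 - row.discount_pct / 100) - row.voucher_amount
        cleaned.append(replace(row, effective_price=max(price, 0.0)))
    return cleaned
```

Keeping the list price alongside the effective price means promotional and structural price moves can still be separated later.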

What we track, where, and how far back

Platform Country Beauty Electronics F&B Groceries Health Household Mom & Baby Pet Tobacco
Shopee Indonesia 2021-01 2020-10 2021-06 2021-06 2020-10 2021-05 2020-03 2022-01 2023-01
Malaysia 2021-08 2025-09 2025-09 2025-09 2025-09
Philippines 2021-08 2025-09 2025-09 2025-09 2025-09
Singapore 2025-06 2021-08 2025-09 2025-09 2025-06 2025-06 2025-06 2025-06
Thailand 2025-06 2021-08 2025-09 2025-09 2025-06 2025-06 2025-06 2025-06
Vietnam 2021-08 2025-09 2025-09 2025-09
Lazada Indonesia 2023-03 2023-04 2023-04 2023-04 2023-04 2022-04 2023-01 2022-02 2023-01
MY, PH, SG, TH, VN 2021-08
Tokopedia Indonesia 2021-02 2020-04 2021-06 2021-06 2020-10 2021-05 2020-05 2022-01 2023-01
Blibli Indonesia 2022-04 2021-03 2022-01 2022-01 2022-02 2022-01 2021-03 2022-02 2023-01
TikTok Shop Indonesia 2025-03 2025-07 2025-10 2025-03 2025-02

Dates show the first month of available data (YYYY-MM). — means that category is not yet available for that marketplace and country, or the platform does not operate there. All platforms refresh monthly, with higher-frequency capture for actively monitored categories. Fields captured per SKU include price, stock, units sold, reviews, seller type, category, rank, and promotions (varies by platform).

Explore the data by market

Free monthly snapshots from the universe — market share, GMV trends and category movers. Exact figures and full history are in Farsight.

Questions about the data

Who has the best Shopee data in Southeast Asia?

Magpie IQ has operated a continuous Shopee scraping pipeline since 2020 — longer than any comparable SEA-native intelligence provider. The dataset now contains 78 billion data points across five platforms, with Indonesia as the primary and deepest market.

How does Magpie handle Shopee's anti-bot defenses?

Through six years of continuous engineering investment — proxy management, behavioural mimicry, rate limit negotiation, and rapid response to platform changes. This is maintained as an ongoing engineering function, not a one-time build.

How accurate is the sold count data?

Shopee's sold count updates when buyers confirm receipt, which creates a lag between the transaction and the recorded sold count. Magpie reconciles these counts against historical baselines and flags anomalies. Data notes surface any caveats in Farsight answers.
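A baseline check of this kind can be sketched as follows. This is an illustration, not Magpie's reconciliation logic: it assumes the sold count is a cumulative figure captured once per cycle, and the `flag_sold_anomalies` name, the spike factor of 5, and the three-cycle minimum history are all invented defaults.

```python
from statistics import median
from typing import List, Tuple


def flag_sold_anomalies(
    cumulative_sold: List[int],
    spike_factor: float = 5.0,
    min_history: int = 3,
) -> List[Tuple[int, str]]:
    """Flag cycles where the cumulative sold count drops or jumps far above baseline.

    Returns (index, reason) pairs; index refers to the position in cumulative_sold.
    """
    flags: List[Tuple[int, str]] = []
    baseline: List[int] = []  # per-cycle increases accepted as normal so far
    for i in range(1, len(cumulative_sold)):
        delta = cumulative_sold[i] - cumulative_sold[i - 1]
        if delta < 0:
            flags.append((i, "decrease"))  # a cumulative count should not fall
        elif len(baseline) >= min_history and delta > spike_factor * max(median(baseline), 1):
            flags.append((i, "spike"))
        else:
            baseline.append(delta)
    return flags
```

For the series `[100, 120, 145, 165, 190, 900, 880]`, the sketch flags index 5 as a spike and index 6 as a decrease. Flagged cycles are kept out of the baseline so one anomaly does not mask the next.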

How do I access the dataset?

Three access points: Farsight (natural-language AI interface), Nest API (direct REST access), or managed Looker Studio dashboards. Email sales@magpieiq.com to discuss.

78 billion data points.
Three ways to use them.

Ask questions in plain language via Farsight. Pull raw data via Nest API. Or get a managed Looker dashboard — all on the same pipeline.