RhinoSpider: User-Owned Web Data for AI
Accelerated by Quantum Leap Labs, RhinoSpider is a fully on-chain network that turns unused internet bandwidth and idle computing power into a resource marketplace for AI. Anyone with a computer can earn tokens by sharing bandwidth through our Chrome browser extension.
Install our browser extension now to join the movement (you just need Internet Identity).
Two sides of the platform:
- Data supply: Individuals worldwide earn rewards by sharing surplus bandwidth or computing.
- Data demand: AI labs, enterprises, and developers that need fresh, real-time, public web data at scale.
Real-time web data indexing:
AI models need massive volumes of up-to-the-minute, contextualized public web data. Today, access to this data is controlled by a few centralized players who often tap into residential connections without compensation or transparency. This allows for unfair value capture, data monopolies, and quality issues such as data poisoning.
Our solution:
RhinoSpider creates a user-owned, internet-scale web crawl. Contributors run lightweight nodes that sell unused bandwidth for real-time web indexing and scraping. Enterprises access this network for data ingestion, live context retrieval, and AI model training, benefiting from lower costs, broader geographic reach, and improved data integrity.
What makes us different:
- Combines bandwidth sharing + computational mining in one platform. Right now, we’ve started with bandwidth sharing.
- Focus on live context retrieval, not just raw scraping, providing AI models with the most valuable and freshest data.
- Full transparency on how user-contributed data is used.
- Built on ICP for speed, scalability, and verifiable on-chain stats via our “RhinoScan” explorer.
- RhinoSpider is building a fairer, more transparent data economy where everyday internet users share in the value they help create.
How RhinoSpider Beats Web2 Substitutes:
Web2 data collection platforms, whether commercial web scrapers, cloud compute providers, or enterprise data vendors, operate under centralized control. They monopolize data access, hide sourcing methods, and capture all the economic value, leaving contributors (end users) uncompensated and without visibility into where or how their resources are used.
How we’re better:
- User ownership + fair compensation: In Web2, your resources (bandwidth) may be harvested silently or undervalued. In RhinoSpider, every contribution is verifiable on-chain and rewarded transparently through tokenized incentives.
- Transparency + traceability: Web2 data providers offer no proof of data lineage. RhinoSpider’s upcoming Sovereign Data Rollup will ensure contributors and enterprises can trace exactly where data came from and how it was used, reducing risks of data poisoning.
- Censorship resistance: Centralized scraping services can be shut down or blocked at the provider level. RhinoSpider’s globally distributed, residential-IP network is harder to block, more resilient to takedowns, and jurisdiction-agnostic.
- Decentralized marketplace: Instead of being locked into one vendor’s pricing and policies, AI labs and developers can lease resources from a competitive, open marketplace where pricing adjusts dynamically based on demand and supply.
- Dual-resource model: Unlike Web2 companies that specialize in either bandwidth or compute, RhinoSpider will unify both, allowing seamless scaling from passive bandwidth sharing to full computational mining for AI workloads.
TL;DR: Web2 models treat contributors as invisible infrastructure. RhinoSpider turns them into stakeholders with transparency, control, and a share in the upside.
Architecture: How RhinoSpider is built
1. Canisters
- Storage canister: hhaip-uiaaa-aaaao-a4khq-cai (stores indexed data and global network metrics)
- Consumer canister: t3pjp-kqaaa-aaaao-a4ooq-cai (serves tasks to SpiderNodes and handles execution logic)
- Admin backend canister: wvset-niaaa-aaaao-a4osa-cai (used for task creation, configuration, and on-chain routing rules)
- Admin frontend canister: sxsvc-aqaaa-aaaaj-az4ta-cai (hosts the React dashboard UI for admin control)
- Auth canister: rdmx6-jaaaa-aaaaa-aadq-cai (powers secure web3-based login and wallet integration)
2. Client + web components
- Browser extension: Built in JavaScript/TypeScript, tailored for Chrome and Chromium-based browsers. Uses ICP agent.js to fetch tasks from the Consumer canister and route task results to the Storage canister (a minimal sketch of this call pattern follows this list). Background scripts manage scraping workloads, locally cache minimal metadata, and keep UI friction low. Authentication is via Internet Identity, connecting through the Auth canister for seamless onboarding.
- Web dashboard: Developed using React.js + TypeScript, the dashboard connects with ICP canisters to show contribution stats and reward points; admin views, served via the Admin frontend canister, handle task creation and routing configuration.
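Below is a minimal sketch of that call pattern, assuming hypothetical Candid method names (get_task, submit_result) and record fields; the actual RhinoSpider interfaces may differ.

```typescript
// Minimal sketch of the extension's task loop. The method names and record
// fields below are assumptions for illustration, not RhinoSpider's real
// Candid interface.
import { Actor, HttpAgent } from "@dfinity/agent";
import { IDL } from "@dfinity/candid";

const CONSUMER_ID = "t3pjp-kqaaa-aaaao-a4ooq-cai"; // Consumer canister
const STORAGE_ID = "hhaip-uiaaa-aaaao-a4khq-cai";  // Storage canister

// Assumed task interface: the Consumer canister hands out at most one task.
const consumerIdl: IDL.InterfaceFactory = ({ IDL }) =>
  IDL.Service({
    get_task: IDL.Func([], [IDL.Opt(IDL.Record({ id: IDL.Text, url: IDL.Text }))], []),
  });

// Assumed result interface: the Storage canister accepts raw result bytes.
const storageIdl: IDL.InterfaceFactory = ({ IDL }) =>
  IDL.Service({
    submit_result: IDL.Func(
      [IDL.Record({ taskId: IDL.Text, payload: IDL.Vec(IDL.Nat8) })],
      [IDL.Bool],
      [],
    ),
  });

export async function runOneTask(): Promise<void> {
  const agent = new HttpAgent({ host: "https://icp-api.io" });
  const consumer = Actor.createActor(consumerIdl, { agent, canisterId: CONSUMER_ID });
  const storage = Actor.createActor(storageIdl, { agent, canisterId: STORAGE_ID });

  // Candid `opt` decodes to [] or [value]; pull out the task if one was routed to us.
  const [task] = (await consumer.get_task()) as [{ id: string; url: string }?];
  if (!task) return;

  // Fetch the public page and hand the bytes back to the Storage canister.
  const response = await fetch(task.url);
  const payload = new Uint8Array(await response.arrayBuffer());
  await storage.submit_result({ taskId: task.id, payload });
}
```

In the real extension this kind of loop would run from the background service worker and use the Internet Identity delegation rather than an anonymous agent.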
3. GitHub insights
The “rhinospider_extension” repo handles the extension’s logic, communication with canisters, and reward tracking, while “webapp_react” manages the user-facing dashboard, state management, and wallet connectivity, ensuring clean layering. Our clear code structure makes it easy for contributors to dive in, customize behavior, or build on new features (e.g., adding compute-node support later). We’ve also used standard front-end patterns. Overall, our codebase is ready for open contributions: clear README, modular pattern, obvious extension points.
4. Design decisions
- Bandwidth-first launch: The lightweight extension lowers the technical barrier to entry, with no extra downloads or dedicated compute required, making early user acquisition easier.
- On-chain smart task routing: Admins define geolocation filters, task splits, and randomization rules via the Admin backend canister (a sketch of what such a rule might look like follows this list). Everything is transparent, and execution logic lives on-chain.
- Privacy-first: We store no personal data, only IP-based geolocation for task matching. Identity remains pseudonymous via Internet Identity.
- Future-ready modularity: The stack is designed to extend beyond bandwidth sharing to support our future development plans.
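For illustration, a routing rule defined through the Admin backend canister might look roughly like the following; the field names and values are assumptions, not the canister’s actual schema.

```typescript
// Illustrative routing-rule shape only; field names are assumptions,
// not the Admin backend canister's real schema.
interface RoutingRule {
  taskTemplateId: string;                         // which task definition this rule governs
  allowedRegions: string[];                       // country codes derived from IP geolocation
  splits: { region: string; percent: number }[];  // share of tasks routed to each region
  randomizeWithinRegion: boolean;                 // shuffle eligible SpiderNodes before assigning
}

const exampleRule: RoutingRule = {
  taskTemplateId: "news-index-eu",
  allowedRegions: ["DE", "FR", "NL"],
  splits: [
    { region: "DE", percent: 50 },
    { region: "FR", percent: 30 },
    { region: "NL", percent: 20 },
  ],
  randomizeWithinRegion: true,
};
```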
Architectural diagram:
Internet Computer superpowers:
Everything we’ve built runs on ICP canisters: frontends, admin, routing, rewards, and storage; user onboarding is via Internet Identity. That combo gave us a Web2-smooth UX with Web3 guarantees. Here’s what specifically moved the needle for RhinoSpider:
What ICP made easier for us:
- Single-stack hosting on canisters: Admin frontend (sxsvc-aqaaa-aaaaj-az4ta-cai), admin backend (wvset-niaaa-aaaao-a4osa-cai), consumer (t3pjp-kqaaa-aaaao-a4ooq-cai), and storage (hhaip-uiaaa-aaaao-a4khq-cai) are all canisters. We serve the web app directly from an asset canister, then hit our logic canisters over the same trust boundary.
- Internet Identity for 1-click sign-in, no seed phrases: passwordless, phishing-resistant authentication.
- Reverse-gas UX (cycles): We, not end users, pay for execution. That means no “approve/spend gas” popups, which keeps the experience as simple as installing an extension and clicking start.
- Verified data for RhinoScan: We expose metrics directly from the Storage canister, such as Total Nodes, Total Data Volume Indexed, and Countries/Regions.
- Canister-to-canister routing: Admins define tasks; the Router/Consumer canister assigns them using on-chain rules (geo filters, randomization, % splits). The whole assignment path is verifiable and replayable, creating an audit trail by design; a sketch of this deterministic assignment follows this list.
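As a sketch of why this is replayable, assume the routing canister applies a rule like the one above with a seeded shuffle: the same task batch, node list, splits, and seed always reproduce the same assignment. The names and the PRNG choice below are illustrative, not the canister’s actual code.

```typescript
// Sketch of deterministic task assignment; not the canister's actual code.
// Same inputs (tasks, nodes, splits, seed) always replay to the same output,
// which is what makes the on-chain audit trail meaningful.
interface SpiderNode { id: string; region: string; }
interface Split { region: string; percent: number; }

// mulberry32: tiny seeded PRNG so "randomization" stays reproducible.
function mulberry32(seed: number): () => number {
  return () => {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

export function assignTasks(
  taskIds: string[],
  nodes: SpiderNode[],
  splits: Split[],
  seed: number,
): Map<string, string> {
  const rand = mulberry32(seed);
  const assignment = new Map<string, string>(); // taskId -> nodeId
  let cursor = 0;
  for (const { region, percent } of splits) {
    // Geo filter: only nodes in the target region are eligible.
    const eligible = nodes.filter((n) => n.region === region);
    if (eligible.length === 0) continue;
    // Crude seeded shuffle; a production canister would use something more principled.
    eligible.sort(() => rand() - 0.5);
    // Percentage split: this region receives its share of the task batch.
    const share = Math.round((percent / 100) * taskIds.length);
    for (let i = 0; i < share && cursor < taskIds.length; i++, cursor++) {
      assignment.set(taskIds[cursor], eligible[i % eligible.length].id);
    }
  }
  return assignment;
}
```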
Other benefits of ICP to RhinoSpider:
- Everything on canisters, not clouds, meaning fewer moving parts and a single security model. When we say “verifiable,” we point to the canister state, not a private database.
- On-chain routing, off-chain heavy bytes: We store hashes, manifests, and rollup pointers on-chain; we chunk and compress payloads in the Storage canister (a sketch of this chunk-and-hash flow follows this list).
- Transparent rewards: Reward logic and metering run in canisters, so contributors can verify what they’ve earned.
- Progressive upgrades: We started with bandwidth-only via the browser extension; the same canister mesh could cleanly extend to compute nodes with minimal or no architectural rework.
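A rough sketch of that chunk-and-hash flow, assuming a 256 KiB chunk size and a simple manifest format (both placeholders, not RhinoSpider’s actual storage layout):

```typescript
// Illustrative chunk-and-hash flow; chunk size and manifest fields are
// placeholders, not RhinoSpider's actual storage format.
const CHUNK_SIZE = 256 * 1024; // 256 KiB per chunk (arbitrary choice)

interface ChunkManifest {
  totalBytes: number;
  chunkHashes: string[]; // hex-encoded SHA-256 of each chunk, in order
}

async function sha256Hex(bytes: Uint8Array): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

// The manifest (small) is the part that would live on-chain; the chunks
// (the heavy bytes) go to the Storage canister, optionally compressed first.
export async function buildManifest(payload: Uint8Array): Promise<ChunkManifest> {
  const chunkHashes: string[] = [];
  for (let offset = 0; offset < payload.length; offset += CHUNK_SIZE) {
    const chunk = payload.slice(offset, offset + CHUNK_SIZE);
    chunkHashes.push(await sha256Hex(chunk));
  }
  return { totalBytes: payload.length, chunkHashes };
}
```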
What this means for our users and customers:
- For our users (contributors): one-tap login, start earning, see verifiable stats. No wallets or gas.
- For data buyers: signed, provenance-rich metrics and deterministic task routing, meaning cleaner compliance and less vendor risk.
ICP lets us host the whole product as code that users can verify, keep the UX lightweight with Internet Identity, and still ship a real-time experience. This helps us make a user-owned web crawl feel effortless.
Go-To-Market strategy:
Our go-to-market is ICP-first, low-cost, and fast-learning. We start by activating extension downloads through a referral blitz with ICP Hub leaders, where each gets a unique code, a public leaderboard, and a plug-and-play promo kit so that local communities can rally and compete. In parallel, we list RhinoSpider on airdrop calendars and quest platforms with proof-of-run tasks (install, run X hours, invite friends) and launch a Tweet-native ICP Blink for marketing activation inside Twitter.
RhinoScan’s live map becomes our billboard: as node counts and indexed data tick up, we trigger geo-gated multipliers, short referral races, streak bonuses, and “power hours” to nudge supply where our growth demands it. The path to the first 10k users (in ~60–90 days) leans almost entirely on referrals, quests, and ambassadors; we watch CAC, retention, and uptime, then double down on what converts.
The next leap from 10k to 100k repeats the winners and layers new loops. At this stage, our goal is simple: keep friction low, make progress visible, reward quality, and let the community’s competitive spirit do the heavy lifting.
Monetization:
We’re building RhinoSpider as a sustainable, profit-driven network that also pays its contributors. We monetize in three primary ways:
- Resource marketplace: Enterprises subscribe to real-time, geo-targeted web access from our extensions, i.e., SpiderNodes (priced by GB, successful fetch, domain class, freshness SLA, and region). Pricing here covers bandwidth leasing plus datasets; a placeholder pricing sketch follows this list.
- Enterprise APIs and access to our Sovereign Data Rollup: Clean, contextualized, provenance-rich streams billed by API calls/rows/GB, with premium add-ons like geofencing, guaranteed freshness windows, and compliance reporting.
- Compute leasing, a pipeline feature, will open another revenue stream once launched; there, we’d price by compute minutes.
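To make those pricing dimensions concrete, here is a placeholder calculation; every rate and multiplier below is invented for illustration and is not RhinoSpider’s actual pricing.

```typescript
// Placeholder pricing sketch; all rates and multipliers are invented.
interface UsageInvoice {
  gbTransferred: number;                 // bandwidth leased, in GB
  successfulFetches: number;             // only successful fetches are billed
  domainClass: "standard" | "premium";   // e.g. harder-to-reach domain classes
  freshnessSlaHours: number;             // tighter freshness SLAs cost more
  geoTargeted: boolean;                  // region-specific routing costs more
}

const BASE_PER_GB = 0.5;      // USD per GB (placeholder)
const BASE_PER_FETCH = 0.001; // USD per successful fetch (placeholder)

export function quote(u: UsageInvoice): number {
  const domainFactor = u.domainClass === "premium" ? 1.5 : 1.0;
  const freshnessFactor = u.freshnessSlaHours <= 1 ? 1.4 : 1.0;
  const regionFactor = u.geoTargeted ? 1.25 : 1.0;
  const bandwidth = u.gbTransferred * BASE_PER_GB;
  const fetches = u.successfulFetches * BASE_PER_FETCH;
  return (bandwidth + fetches) * domainFactor * freshnessFactor * regionFactor;
}

// Example: 100 GB, 50,000 fetches, premium domains, 1-hour SLA, geo-targeted:
// (50 + 50) * 1.5 * 1.4 * 1.25 = 262.50 (placeholder USD)
```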
Besides the general resource marketplace, we plan to add a layer of sponsored data quests where data buyers post bounties. Every quest would pay the network, with the majority of the revenue going to node operators (extension users) and a transparent protocol fee accruing to the treasury. Because all logic and metering run in ICP canisters, we can expose verifiable usage stats.
Governance will eventually decentralize. After initial traction (quantified by extension download count), we plan to propose an SNS, so the community can control the fee curve, reward weights, data-quality policies, treasury grants, and upgrade keys for the admin canisters. The token (post-compliance and community approval) would power governance, serve as the incentive layer, and offer fee rebates to buyers who settle in the ecosystem. All revenues earned by the protocol will be reinvested in the token through repurchases.
Current status of RhinoSpider:
MVP is complete: the Chrome extension and admin panel are live in our internal environment. We’re pre-launch with no public users yet, and we’re moving into a tightly scoped private beta to validate activation and retention before opening the funnel.
What’s working (technical):
- Chrome extension: Core flows implemented. Background service worker + content scripts wired. Minimal permissions.
- Admin panel: Auth in place, role-based access ready.
- Backend: Stable endpoints, input validation, error handling, and rate limits.
- Data layer: Structured storage with clear schemas.
We’ve built the end-to-end path: extension → API → admin panel, and cut non-critical features to speed launch. Public traction hasn’t started yet.
Today we’re tracking internal metrics:
- Build pass rate, test coverage snapshots, and error budget.
- Extension install success rate in internal tests.
- Median response times for key endpoints.
- Crash-free sessions during dog-fooding.
- Bug backlog burn-down per week.
Resources:
Website: rhinospider(dot)com
Twitter: rhino_spider
Github: Rhinospider
Future Plans:
Our near-term plans involve scaling user count through achievable milestones, upgrading our platform to release pipeline features like the Sovereign Data Rollup, and getting to an SNS.
In the nearest term, we will build on the finished Chrome extension and admin panel and focus on the moments that win users. Go-to-market moves in lockstep: a private beta of 100 targeted users, fast message tests (three value props, three headlines, three paywall frames), and a visible referral hook. We line up partner channels (global ICP Hubs). We aim for 35%+ install-to-activation within 24 hours, D7 retention ≥ 25% and D30 retention ≥ 15%, p95 API latency under 500 ms, and crash-free sessions at 99.5%+. We also plan to showcase RhinoSpider at Token2049 and other relevant events.
Install our Chrome extension and join the largest web crawl on the internet!
