As AI and data surge, infrastructure becomes the biggest obstacle: who will break the deadlock?

2025-07-10 20:00:05


Why Data Is Becoming the Most Valuable Resource, and Why Whoever Unlocks It Will Define the Next Decade

The history of technology is punctuated by moments when innovation outpaces the infrastructure that supports it. We lived through the congested dial-up internet of the Web1 era, watched video streaming rapidly displace traditional cable TV, and saw cloud computing overturn how software is developed and deployed.

Today, infrastructure is lagging behind again. This time, the protagonist is data.

Data: A Trillion-Dollar Asset Still Lying Dormant

Across AI, intellectual property (IP), and a wide range of Web3 applications, data is becoming the core resource driving the global economy. It is an asset class, a means of production, and a new form of economic organization. Morgan Stanley predicts that the market for high-quality AI training data will exceed $17 billion by 2032, while the overall data market has already surpassed $3 trillion.

Ironically, most of this enormous value remains dormant today:

Trapped in closed platforms, where it cannot be discovered or used;
Scattered across systems with incompatible structures, so it cannot be combined or reused;
Lacking effective market mechanisms for licensing, pricing, and circulation.

The situation resembles oil in the early 20th century: the "black gold" was everywhere, yet there were no refineries, gas stations, or logistics networks to turn it into circulating economic value.

The Bottleneck of AI Lies Not in Algorithms, But in Data

AI models increasingly demand high-quality, structured data, yet the most valuable data is controlled by a handful of tech giants: approximately 95% of the world's training data is held by five companies. Open data, by contrast, is mostly scraped from the web, which makes it noisy, repetitive, and increasingly fraught with legal risk.

This not only limits the performance of AI models but also traps the entire industry in a "bad money drives out good" dilemma:

Open-source models are forced to rely on low-quality training data, whose accuracy is hard to verify and which often contains biases; scaling AI on such datasets is nearly impossible;
Data producers lack incentives, further exacerbating data scarcity;
Unclear copyright leads to frequent legal disputes, posing significant risks for AI companies. These lawsuits arise because some of the data used to train large AI models was obtained without permission; most people do not even know their data is being used to train AI. That data holds immense value, yet it is extracted and monopolized by large tech companies.

On-Chain Storage Solutions: "Too Many Patches, Too Few Systems"

Many projects have tried to fix these gaps in data infrastructure, but most remain short-term patches that lack a systemic, integrated, and sustainable design. For example:

Ethereum blob space (EIP-4844): Provides only about 18 days of temporary storage, and its capacity is likely to be exhausted by 2025;
Celestia: Delivers a data availability layer but does not support composing or storing structured data long term;
Filecoin: Slow data retrieval, non-permanent storage, and smart contracts cannot directly access the data it stores;
Arweave: Highly volatile storage prices, weak performance and verification, and a computation layer (AO) that relies on centralized bridging;
IP projects such as Story Protocol: Focus on managing IP assets on-chain but lack deep integration with data networks and do not support building other kinds of applications;
Walrus: Runs on top of another blockchain, with high costs, limited functionality, non-permanent storage, and weak adaptability.

Each solves part of the problem, but none provides a structured, natively composable, and executable data infrastructure.
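The "18 days" cited for Ethereum blob space comes from the consensus layer's retention window: nodes are only required to serve blob sidecars for a fixed number of epochs, after which they may prune them. A quick check of that arithmetic, assuming the Deneb mainnet parameters of 4,096 epochs, 32 slots per epoch, and 12-second slots:

```python
# Mainnet consensus-layer parameters (Deneb) governing how long blobs must be kept.
MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS = 4096
SLOTS_PER_EPOCH = 32
SECONDS_PER_SLOT = 12
SECONDS_PER_DAY = 86_400


def blob_retention_days(epochs: int = MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS,
                        slots_per_epoch: int = SLOTS_PER_EPOCH,
                        seconds_per_slot: int = SECONDS_PER_SLOT) -> float:
    """Return the minimum blob retention window, in days."""
    seconds = epochs * slots_per_epoch * seconds_per_slot
    return seconds / SECONDS_PER_DAY


print(f"{blob_retention_days():.1f} days")
```

This works out to 1,572,864 seconds, or roughly 18.2 days. Anything that must outlive that window has to be stored somewhere else.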

Irys offers a different answer:

Smarter incentives: Controlled partitioning balances storage supply against miner rewards, keeping the network stable and sustainable over the long term.
More flexible storage: Supports both short-term and permanent storage, with flexible pay-as-you-go options.
Data-native smart contracts: The only network where smart contracts can directly read and write on-chain data, with data and contracts natively integrated.
Fast, reliable data access: Instant access with strong availability guarantees.
Predictable low-cost pricing: Pricing is anchored to hard-drive costs, currently around $2.50/GB, with permanent storage expected to fall as low as $0.03/GB.

A Network Born for Data and Execution Synergy

Irys is designed as a next-generation on-chain data network that addresses the fundamental problems described above.

Integrated Architecture: Native from Storage to Execution

On-chain permanent storage: Once uploaded, data is retained long term and can be retrieved, indexed, and reused;
Native smart contract access to data: Contracts read and write data directly, without relying on external bridges or oracles;

Irys is the first project to build a native, low-cost path from the storage layer to the execution layer at the blockchain level. Its execution layer is neither a layer-2 (L2) deployed on Irys nor an application running on Irys; it is a core function built directly into the mainnet protocol. No other network has achieved this.

Optimized for AI and IP: Programmable data can carry licensing terms, transaction rules, and automation instructions. AI models can access training sets natively, and creators can embed copyright and revenue-sharing logic; beyond that, the possibilities are limited only by developers' creativity.
Ecosystem synergy: Database protocols structure data, search protocols index it, and monetization protocols handle licensing and paid access. Because all of these tools run on the same base network, applications can combine and reuse one another's work.

In essence, protocols on Irys can share and discover one another's data, fostering an ecosystem in which data fuels more sophisticated ways for creators to monetize it. Applications can build on each other's data to create value, and that value ultimately flows back into the mainnet, forming a virtuous cycle.

An Example: What Happens When Data Becomes Truly Usable?

A user uploads a set of data (e.g., original images or articles);
A database protocol structures the data into an AI-readable format;
An AI company obtains access through a monetization protocol and pays for it, with revenue shared automatically;
The creator is paid in real time, and as the data is called repeatedly it reaches a larger market;
More developers build new services on top of it, expanding the ecosystem.

This process does not rely on centralized platforms or external systems; everything happens natively on Irys.
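To make the licensing and revenue-sharing steps above concrete, here is a minimal in-memory sketch. It is not Irys code and does not use any Irys API: `DataAsset`, `Ledger`, `license_access`, the per-call fee, and the 90/10 creator/protocol split are all hypothetical, chosen only to show how a dataset can carry its own license terms and pay its creator on every access.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """Hypothetical uploaded dataset that carries its own license terms."""
    asset_id: str
    creator: str
    fee_per_call: int        # price per access, in the smallest currency unit
    creator_share_bps: int   # creator's share in basis points (10_000 = 100%)
    calls: int = 0


@dataclass
class Ledger:
    """Hypothetical balance sheet kept by a monetization protocol."""
    balances: dict[str, int] = field(default_factory=dict)

    def credit(self, account: str, amount: int) -> None:
        self.balances[account] = self.balances.get(account, 0) + amount


def license_access(asset: DataAsset, buyer: str, ledger: Ledger,
                   protocol: str = "monetization-protocol") -> None:
    """Charge the buyer one access fee and split it between creator and protocol."""
    creator_cut = asset.fee_per_call * asset.creator_share_bps // 10_000
    ledger.credit(asset.creator, creator_cut)
    ledger.credit(protocol, asset.fee_per_call - creator_cut)
    ledger.credit(buyer, -asset.fee_per_call)  # the buyer pays the full fee
    asset.calls += 1


# Usage: an AI company calls a creator's dataset three times.
ledger = Ledger()
photos = DataAsset("photos-001", creator="alice", fee_per_call=1_000,
                   creator_share_bps=9_000)
for _ in range(3):
    license_access(photos, buyer="ai-co", ledger=ledger)
print(ledger.balances)
```

Integer amounts and basis points avoid floating-point rounding in payouts, and every charge is balanced by credits, so the ledger always sums to zero.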

The Market Is Sending Signals: A Gap Has Emerged, and Demand Is Urgent

Technology trends can often be read from capital flows and resource bottlenecks. Several market events already show how urgent the data problem has become:

Celestia raised $100 million: Data availability has become the biggest bottleneck for rollups;
Story Protocol raised $140 million: On-chain IP is arriving;
Storing data on Ethereum still costs as much as $900,000/GB: Unsustainable;
Demand for AI training data has surged, and supply persistently lags behind;
AI content infringement cases have risen by more than 200% year over year: Creators lack mechanisms to protect their rights.

This points to a significant technological gap opening up: a trillion-dollar vacancy in data infrastructure, waiting for someone who can truly solve the problem.
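A per-gigabyte cost for Ethereum depends heavily on gas price, ETH price, and how the data is written, so any such figure is a snapshot. The back-of-the-envelope estimator below is a sketch under assumed inputs, not the source of the figure cited above: it prices data posted as calldata at 16 gas per non-zero byte (the EIP-2028 price; later upgrades raise the floor for data-heavy transactions, so treat it as a lower bound) and contract storage at 20,000 gas per new 32-byte slot, with an assumed 20 gwei gas price and $2,500 ETH. It ignores block gas limits and cold-access surcharges.

```python
GWEI_IN_ETH = 1e-9          # 1 gwei expressed in ETH
BYTES_PER_GB = 1024 ** 3


def onchain_cost_usd(num_bytes: int, gas_per_byte: float,
                     gas_price_gwei: float, eth_usd: float) -> float:
    """Estimate the USD cost of writing num_bytes to Ethereum."""
    gas = num_bytes * gas_per_byte
    eth = gas * gas_price_gwei * GWEI_IN_ETH
    return eth * eth_usd


# Assumed market inputs, for illustration only.
GAS_PRICE_GWEI = 20
ETH_USD = 2_500

# Calldata: 16 gas per non-zero byte (EIP-2028).
calldata = onchain_cost_usd(BYTES_PER_GB, 16, GAS_PRICE_GWEI, ETH_USD)
# Contract storage: 20,000 gas to set each new 32-byte slot via SSTORE.
sstore = onchain_cost_usd(BYTES_PER_GB, 20_000 / 32, GAS_PRICE_GWEI, ETH_USD)

print(f"calldata: ${calldata:,.0f}/GB")
print(f"SSTORE:   ${sstore:,.0f}/GB")
```

With these assumptions, calldata comes to roughly $859,000 per GB, the same order of magnitude as the cited figure, while contract storage is about $33.6 million per GB. Doubling the gas price doubles both numbers.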

Web3's AWS: Building an On-Chain Data Flywheel

AWS did not succeed through leadership in any single technology. Its key was unifying compute, databases, and applications into a self-reinforcing flywheel.

Irys is replicating this logic on-chain:

User-uploaded data → Database protocol indexing → Authorization protocol monetization → Application invocation → Feedback to more data production → Network value enhancement → Attracting more developers to build
Each new protocol strengthens the role of the previous one, and each data call enhances the value of the entire network.

This is not just a tool for "storing data" on-chain, but a complete, scalable, and composable data operating system.

Data Is the Most Important Asset of the Future, and Infrastructure Will Determine Who Wins It

We are on the eve of a rebuild of data infrastructure. On one hand, data is rapidly becoming the most critical asset driving AI, the content economy, and smart contract ecosystems; on the other, legacy systems can no longer support this shift.

What Irys is building is not just a faster or cheaper data storage system, but a truly future-oriented infrastructure network:

Native data storage and invocation;
Automated licensing and revenue distribution;
Support for the data needs of AI, IP, Web3 applications, and more;
Easy access and reuse for developers, creators, and enterprises.

If AWS seized the historic opportunity of "computing as a service" in the cloud era, Irys stands at the starting point of the massive shift to "data as a service." The door to the data economy is open, and the real question is who will build the network that supports all of this. The answer is becoming clear.
