Bruno Sánchez-Andrade Nuño

January 30, 2024

Why AI for Earth is Different

AI is changing the world of text, images, audio, ... but not Earth data. Yes, it's hard to work with, but not only are we dropping the [globe] ball, AI for Earth also has outsized benefits (impact and profits alike), especially if done fully in the open.

We have amazing breakthroughs in AI with text, images, video, and audio — but not Earth data. This is deeply disappointing considering the massive global challenges we face related to nature, climate change, and sustainability. I think part of the reason for this gap in AI is that AI and geospatial skills are the bottleneck. Earth data is very difficult to store, process, and work with. So AI+Geospatial is an extremely niche set of skills. 

But here is why I am extremely bullish on using AI for Earth: once AI is harnessed to decipher Earth data, it reveals a mind-blowing attribute, one that means this is not an open-ended quest for ever larger models to chase to infinity. This race has the finish line clearly in sight, and the data we need is, and will remain, completely free.

So what is that unique attribute? There is only one Earth. Large, wonderfully diverse and constantly evolving, but just one. 

I can pick any location and time on Earth and guess with fair certainty its possible past and future. Yes, I might need rather complex climate, biodiversity and geology expertise, but my answer will be much more accurate than predicting 100 pages ahead from a single sentence in a random book within the vast library of all possible books past, present or future. Predicting the next words is the essence of how most language models are trained. On this one Earth, Madrid will not pop up in the ocean, nor will a lush forest grow by tomorrow in Houston. Forest fires don’t create mountains, and the coast is always between water and land.

Even climate change doesn’t break this realization. It is true that the past climate is not a good predictor of future climate, but it is also true that the pathways of change are few and roughly known, and that most of the future effects of climate change are already current realities somewhere else. We can predict that San Francisco in 2050 might be like Lisbon today. We don’t say that San Francisco in 2050 will be purple fluffy blobs of hair. The reality of Earth, past, present and future, occupies an extremely small part of the space of all mathematical possibilities. This principle mirrors the core insight powering breakthrough AI tools like Stable Diffusion, where AI is trained to navigate and prioritize certain realms of mathematical possibilities. But unlike text and images, the possibilities of Earth are much more restricted.


The Nature of Nature
To quote Ralph Waldo Emerson, “Nature is an endless combination and repetition of a very few laws”. The more of nature you travel, the better you understand places both new and old.

I can see how the first AI models of Earth will be bad, clumsy and naive, but I cannot see how progress will not quickly accelerate and then, crucially, flatten as the AI finds fewer surprises once we train on a representative enough sample of Earth. There will always be room to improve, but we will only see giant improvements in the early models.

It reminds me of Google Maps. Before Google Maps, most of us thought of Earth as an unmeasurable infinite canvas. Once it came out, we realized that it’s limited. Large, but limited. The first versions of Google Maps were outdated, incomplete and too basic. But however much they might be spending on improving Google Maps today, no improvement will ever match the jump in value of its first year or so, when it came alive.

For AI for Earth, I do not know how much data is enough data to reach the tipping point. It is certainly more than the whole Earth once, but it is also certain that changes over time can be largely informed by changes in space. Earth is highly redundant in space and time.

Do we have the data? Oh yes. Since the 70s we have collected, and continue to collect, complete surveys of the entire Earth, repeated every few days, and all that data is readily available in piles and piles of complex pixelated files thanks to programs like Landsat from NASA/USGS and Sentinel from ESA. We have petabytes of Earth data, orders of magnitude more than the text held by the Library of Congress or the text used to train ChatGPT. Moreover, when higher resolution or cadence is required, commercial entities step in, offering vast datasets and creating rightfully lucrative spaces for innovation and profit.


Building in the open
Building in the open serves as the lynchpin, not just orchestrating but also catalyzing the momentum towards a comprehensive understanding of Earth. Open-source AI applied to open Earth data is the breakthrough that converts piles of data into searchable concepts: the ultimate librarian of our Library of Alexandria, one who has read all the books. This is what will enable anyone anywhere to learn about forests, fires, deforestation, and paved roads as easily as they Google or chat with ChatGPT. They will be able to own the code and the model, to tweak it for their communities and interests, to test it, to improve it, to sell it, to merge it with their data or other commercial data.

Building in the open is not the counterculture punk alternative to commercial. This is not Linux versus Windows in the 90s. Building open-source AI for Earth is the mechanism to maximize impact, enable profitable services, and ensure a thriving ecosystem. It lowers the high barriers to entry, in skills and budget, to understanding Earth. It lifts the baseline expectations of what is free, open and available. It also sets up much easier and cheaper on-ramps for going much further and building commercial services on top.

For commercial companies, embracing open-source AI for Earth becomes an evident strategic choice, allowing them to leverage the vast corpus of open Earth data without the redundancy of deciphering the common data pool from scratch. It is fiduciarily obvious that any commercial player will leverage that corpus, and it makes no sense for each of them to repeat the very same process of understanding what the common pool of data says, only then to start adding the details customers are ready to pay for.

This is why we started Clay as a nonprofit project. We are now a growing team and community, having raised $4M and just released v0. We are the change, but not because we are original. We join a growing ecosystem of passionate people trying different ways to get to our shared destination: understanding Earth. We have designed Clay not to compete, but to support all these efforts: we take the harder path of raising nonprofit funds to orchestrate existing open source, add the missing parts, pull all those petabytes of data, train those models, and take them to the frontline. Test them, improve them, and give them all for free to nonprofits and for-profits alike. It is the only way to truly harness the promise of AI for Earth. We take all that raw data, all that hard code, and we make Clay, so that anyone can build with it.



About Bruno Sánchez-Andrade Nuño

Scientist. Impact Architect. Intellectually promiscuous. Stoic optimist… all that you need when working on tech innovation for climate change, socioeconomic development and biodiversity. By training PhD Astrophysics and rocket scientist. By way of #PlanetaryComputer 
Saepe cadendo. Dad to Sela, @emmyagsmith husband