wildflow vision (part 1)

Created with DALL·E, an AI system by OpenAI

Since I left Google X over a year ago, I've been exploring different opportunities in the "ocean + biodiversity + AI/data/tech" space while building my own startup, wildflow.ai. It's been an incredible journey! I've learned so much from amazing people around the world, from OceanMBA at MIT to spending over a month working with Pelagios Kakunja, a shark conservation project in Mexico. My head and Notion are exploding from everything I've learned while trying to hold the whole landscape of opportunities. Finally, my vision has crystallised: building a digital nervous system for the planet, focusing on foundation models for biodiversity. Many people ask me what I do for work, so I decided to write a small one-pager explaining it. I was trying to be extremely concise, vigorously deleting less important things. 25 pages later, it's done! This doc is intended for internal use and I still need to extract a one-pager out of it, but I decided to publish it anyway. Here are the first few pages. Also see part 2.

Let's go!

We can’t live without nature

Natural ecosystem services provide benefits of $125-140T per year (global GDP is ~$100T). The oceans represent over 99% of the living space on Earth (NASA, UNESCO). They are vast! It’s unclear how many species live in the oceans exactly (NOAA), but we know it’s at least 245k and more to be discovered. Oceans hold 78% of animal biomass (and only 1% of total biomass due to plants and fungi). They are vital for the planet's health, with an asset value exceeding $24 trillion (WWF, BCG) through fisheries, seagrass, mangroves, transportation, coastlines, and CO2 removal capabilities. Oceans produce at least 50% of all oxygen (WHOI, UN), capture 31% of all CO2 (NOAA study), and are a vital source of protein for over a billion people (UN).

The ocean is dying

Monitored wildlife populations have declined by 69% on average over the past 50 years. We kill ~100M sharks per year. Over 300k whales and dolphins die each year just from entanglement in fishing nets (WWF). The Great Pacific Garbage Patch, full of plastic, covers 1.6M km^2 (more than France + Italy + Spain combined). Bottom trawlers wipe out 8M km^2 per year (ScienceDaily, SeaSpiracy), comparable to the size of Brazil. What we do is devastating!

While technological advancements and economic growth are essential and invaluable, we must align our goals and stop planetary-level self-destructive behaviour to survive and thrive. To establish a healthy balance between humans and nature, we must conserve/restore more and destroy less.

Our limited understanding of the incredibly complex natural world is one of the biggest challenges. We need to deeply understand how nature functions, how we impact nature, and how nature impacts us to align our goals properly. We must increase our communication bandwidth with nature and give nature/ocean ecosystems a voice. And we need to do this faster.

Sci-fi Vision

Imagine all the sensing solutions we have in the world measuring nature (like underwater cameras, bioacoustics devices, eDNA, remote sensing, etc). The data from these sensors is analysed and understood (aka modelled), e.g. someone runs ecological niche modelling, forecasts a harmful algal bloom, etc. Then, based on this information, someone acts in the real world (e.g. creates a marine protected area, installs an offshore wind farm, restores a coral reef, etc).

You can imagine these pathways of how the data flows from e.g. a video camera of some BRUVs technician into a CSV of a data analyst, some other system, and a PDF report in the hands of a policy maker, etc.

Each pathway usually follows measure -> model -> act steps known as a cybernetic loop. E.g. if you touch a hot pan and instantly snatch your hand away, that’s you following precisely the same cybernetic loop steps. The difference is that for you, it takes milliseconds, but our environmental response takes years.
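The measure -> model -> act loop can be sketched in a few lines of Python. This is only an illustrative toy, not wildflow code: the function names, the water temperature readings and the 30 °C threshold are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Reading:
    """One hypothetical sensor observation (e.g. from a temperature logger)."""
    site: str
    water_temp_c: float


def measure(raw: Iterable[Tuple[str, float]]) -> List[Reading]:
    """Measure: turn raw (site, temperature) tuples into structured readings."""
    return [Reading(site, temp) for site, temp in raw]


def model(readings: List[Reading], threshold_c: float = 30.0) -> List[str]:
    """Model: flag sites whose temperature exceeds an illustrative threshold."""
    return sorted({r.site for r in readings if r.water_temp_c > threshold_c})


def act(flagged_sites: List[str]) -> List[str]:
    """Act: produce one recommended action per flagged site."""
    return [f"inspect reef at {site}" for site in flagged_sites]


def cybernetic_loop(raw: Iterable[Tuple[str, float]]) -> List[str]:
    """Run one full pass of the loop: measure, then model, then act."""
    return act(model(measure(raw)))
```

Calling `cybernetic_loop([("A", 31.2), ("B", 28.0)])` flags only site A. The point of the sketch is that each step is a separate stage with a clear input and output, so any one of them can be sped up or swapped out without touching the others.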

All these pathways/cybernetic loops form a massive digital nervous system of our planet, connecting sensing solutions to real-world actions, where the modelling part is responsible for making decisions. We need to accelerate these feedback loops and strengthen this digital nervous system. If terrible things happen to turtles, we react straightaway, not in 5 years!

Imagine we (humanity) accelerated research to the point where the modelling part of the digital nervous system became a mega AI brain (which could just be a distributed collection of models). This brain comprehensively understands complex ecosystems with all their intricate dynamics and interactions happening under the hood. It can model population and predator-prey dynamics and predict phenological events like coral spawning. It can provide a detailed analysis of all ecosystem services and understand ecosystem functions. It quantifies the ecosystem's health and can tell us where it “hurts” the most and which activities cause the most “pain”, and it suggests small changes that strengthen ecosystems and have a profound positive impact, growing our shared pie of the global economy.

This system is integrated everywhere we interact with nature, from conservation organisations and academic institutions to industries and policymakers. It’s primarily based on open-source and open-data approaches, providing transparent mechanisms showing who affects nature and how.

Mission

Our mission is to enable a comprehensive understanding of nature and align human activities with it, starting with oceans.

Impact

Our core focus is pioneering large AI models tailored for biodiversity, i.e. foundation models (FM) for biodiversity (see FM for ecology and the MIT course on FM). There’s no single “biodiversity modelling” or “data tools for biodiversity” market to target, so we need to create it. However, almost everything is downstream of nature! Moving towards our vision will profoundly impact many things. See examples below. Note: not all of them are the focus of wildflow.

  • conservation: creation and monitoring of marine protected areas (MPAs) (working prototype), forecast coral bleaching events (exploring)
  • restoration: oyster reefs, seagrass meadows, mangroves (strong leads)
  • aquaculture: optimal farm placement, growth monitoring, and yield optimisation for industries like seaweed farming (exploring)
  • invasive species: early detection and management strategies (previous job experience)
  • offshore energy: biodiversity assessments for offshore wind farms (lead)
  • mining: environmental impact mitigation in e.g. lithium extraction (lead)
  • fisheries: sustainable fish stocks management, reducing bycatch (todo)
  • fishery byproducts: sustainable sourcing and traceability for industries like pet food manufacturing (exploring)
  • biodiversity credits and monitoring, reporting and verification (MRV) (leads)
  • climate change: carbon sequestration assessment and monitoring of blue carbon ecosystems (todo)
  • biotech/pharmaceuticals: discovering and monitoring marine genetic resources (todo)
  • shipping: environmental impact minimisation, port sustainability (todo)
  • governments: policy-making, environmental regulation compliance, and sustainable resource management (todo)

We won’t have to do all the applications ourselves! Similar to how everyone is now building things on top of ChatGPT (or integrating OpenAI’s APIs), our models will create enormous opportunities.

Multiple businesses will integrate our models and tools, making industries like energy extraction more sustainable (improving the “act” part of cybernetic loops).

Similarly, businesses specialising in sensing solutions would have more users (improving the “measure” part of cybernetic loops). This will empower local tech providers (e.g. Gérard Zinzindohoué) or drone freelancers as they would have a place to sell their data.

Why now?

The crisis keeps getting bigger. Also:

  • Significant breakthroughs in AI research and maturity of data infra.
  • Growing public demand and more policies appearing. E.g. the UN High Seas Treaty was signed (protecting 30% of the oceans by 2030).

Path there

On the path towards a comprehensive understanding of nature and aligning human activities with it, we face these obstacles:

  1. Limited understanding of nature itself (top priority) – We will pioneer foundation models for biodiversity to help us understand nature better (and utilise data better). We’ll start collecting simple statistical models in one place and making them accessible. We don’t want to be a bottleneck for biodiversity researchers and tell them what to do. Instead, we will empower them with cutting-edge AI and data tools.
  2. Slow response and research – we will be accelerating research and response to environmental threats by optimising existing processes and doing heavy data lifting for ecologists (so that we create marine protected areas in months, not in 5 years, etc). Building custom solutions for businesses and optimising their data flow is our beachhead market (initial source of revenue before we get to comprehensive models).
  3. Aligning human activities – having a mega AI model that understands nature deeply but not using it would be a problem. We’ll be actively engaged with businesses operationalising our models and tools, always focusing on transparency!
  4. Data fragmentation and accessibility – hindering 1-3 above. We need to work on it to unlock the best models.
  5. Data wrestling for ecologists – hindering 1-3 above. This is aligned with our mission and is our initial market (doing heavy data lifting for ecologists).

To drive our mission forward:

  • We focus on foundation models for biodiversity. We bring existing cutting-edge AI and data tools and apply them to biodiversity, building a digital nervous system for the planet with the AI brain that deeply understands nature (our North Star).
  • To do that, we need a lot of biodiversity data. We will focus on unifying biodiversity data (which is currently fragmented and not easily accessible) through partnerships, providing B2B services (exchanging data/money for services) and, in parallel, building a horizontal open-data platform and a marketplace for biodiversity data (targeting different user incentives). If someone has a lot of data, we’d happily join forces and focus on modelling (our main objective). See the “getting the data” chapter for more info.
  • Building a horizontal data platform and gathering data takes time and effort. A marketplace alone might not take off quickly enough and be profitable. Businesses in our verticals might not need simple AI models (it takes data to get to comprehensive models). We need a parallel source of revenue.
  • That’s OK! In this case, we are falling back on solving business needs (for each vertical) around data management and optimising processes by building tools and services (see more in the “Product” chapter). This aligns with our mission and contributes towards solving multiple key problems. Also, we will be building tools and infra for training models anyway. We just need to keep in mind our “North Star” and not over-fit into the tooling only.
  • All the generalisable building blocks (tools and models) for biodiversity (from what we do for businesses) will be open-sourced and made available in our horizontal platform, creating more traction. This data platform (or one cloud for biodiversity) will empower biodiversity researchers with transparent, open-source tools and models.

It’s very challenging to pull this off. This seems to be our best shot at mitigating risks and creating a sustainable business model that helps biodiversity researchers, from academia and small NGOs to large corporations, protect our planet better!

Problems

Limited understanding

The primary focus of wildflow is accelerating biodiversity research towards a comprehensive understanding of nature (aka ecosystem intelligence augmentation). A lack of in-depth knowledge of nature’s complex ecosystem dynamics leads to poor decision-making.

  • Ecosystem services valuation: challenging to quantify the benefits that ocean ecosystems provide, like water purification, climate regulation, food production, and coastal protection.
  • Environmental impact assessments of industrial activities (like offshore energy extraction) are challenging.

Here’s an inspiring example of what understanding nature well makes possible:

Wolves in Yellowstone: In the 1930s, Yellowstone lost its wolves, throwing the ecosystem out of balance. In 1995, scientists reintroduced them, sparking a remarkable chain reaction. By hunting elk, the wolves reduced overgrazing, allowing aspen forests, willow stands, and other vegetation to flourish. This created new microclimates, cooler and shadier, where streams began to flow again. Thriving in this rejuvenated environment, beavers built dams, reviving rivers that had been dry for several decades. The ripple effects extended beyond the waterways, creating new habitats and benefiting over 70 species, including bears and songbirds. This swift transformation is a powerful example of predator-prey dynamics and the potential for ecosystem restoration (more in Decade of the Wolf).

Similar inspiring stories about otters and urchins in Alaska, whales and krill, oyster populations, coral reefs, seagrass and mangroves exist. What if we (as humanity) could scale similar approaches 100x? At the moment, things could be faster and more efficient.

Slow response

Ecosystem monitoring, protection of endangered species and their habitats, early detection and management of invasive species (lionfish), restoration efforts (seagrass, oyster reefs), and response to environmental threats (like harmful algae blooms) are not efficient enough at the moment.

Example: I’ve spent over a month in Mexico working with Pelagios Kakunja - a shark conservation NGO. They protect whale sharks, hammerhead sharks and dozens of other species. They are amazing! They use multiple sensing solutions (acoustic tags, satellite tags, stereo BRUVs, remote sensing and more) to conduct research and protect these species by creating MPAs (marine protected areas).

They told me it takes at least five years to create one MPA. They have all the data collected, but the data analysis alone takes at least one year. Yet you need to iterate with the government very quickly, and the climate also changes (El Niño), so you may need to create the MPA in a different place. Could the data analysis part be done in a month instead of years? Or in one day? Or in milliseconds? (What law of physics forbids it from being faster?)

We must accelerate biodiversity research to speed up conservation and restoration efforts and help guide industrial activities.

The next two problems are critical to why we have limited understanding and slow response.

Data fragmentation and accessibility

Crucial information about the ocean is scattered and siloed across different organisations. It’s not easily accessible and usable (different formats and systems). Businesses often collect data that remain underutilised or inaccessible to others who could benefit from it. This problem hinders research, conservation, and informed decision-making for industrial activities.

“Petabytes of ocean data are under the control of government agencies, researchers and private companies, such as those in oil and shipping. This information must be made available – fast – to enable sustainable management of marine resources.” (Nature)

Example: a PhD researcher from the University of Ghent was investigating the impact of offshore wind farms on the surrounding ecosystem, in particular how the introduced artificial structures (wind turbines) are rapidly colonised by biofouling organisms, attracting higher trophic levels and leading to the formation of the artificial reef effect. It took many months to access the necessary observational data from three different organisations (commercial and academic). The next big challenge was to clean this data and transform it into one format appropriate for training ML models and analysis. Only then could they focus on doing the actual research.

There aren’t that many conceptually different data types in biodiversity. Time-series geospatial biodiversity data (when, where, who, etc) is one of the most common. Imagine there’s a place with standardised data where you could train your model straightaway, not against just three datasets obtained over months, but against 100s and instantly.

Data wrestling for ecologists

If you’re training an ML model, usually 80% of the time is spent on preparing the data. In AI research it’s done by dedicated engineers. In today's landscape, biodiversity researchers (wildlife biologists, data analysts, etc) must not only excel in understanding nature and addressing the biodiversity crisis but also master big data management. This includes building data pipelines, aggregating data from various sources, cleaning data, handling extensive big data operations and deploying novel models for multi-modal data formats.

“We currently spend hours if not weeks in front of the screen trying to figure out what’s going on with the data, which translates to thousands and thousands of dollars” (Sophie Locke, lead researcher at Blue Marine Foundation)

Example: a biodiversity researcher in Mexico (a marine biology expert, proficient in R) working for an NGO wants to run a simple ecological niche analysis (what conditions are best for sharks) against 30MB of their new shark observational data. They need to sample data from 100GB of environmental data (water temperature, salinity, chlorophyll levels, etc) from ESA Copernicus. Copernicus doesn’t allow downloading more than 2GB at once, so you must manually select a bounding box and other parameters each time (unless you are proficient with scripting languages). Even if you could download the 100GB dataset in one go, it would take almost 4.5 hours with their 52 Mbps internet. Once it is downloaded, the 100GB dataset doesn’t fit into RAM (they have 16GB) when using R or even Python pandas. It takes many days to perform this simple analysis.
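The numbers in this example are easy to sanity-check. Below is a back-of-the-envelope sketch that assumes decimal units (1 GB = 1000 MB), a link that sustains its full 52 Mbps, and no protocol overhead. The helper names are made up for this illustration.

```python
import math


def download_hours(size_gb: float, speed_mbps: float) -> float:
    """Hours needed to transfer size_gb gigabytes at speed_mbps megabits per second."""
    size_megabits = size_gb * 1000 * 8  # 1 GB = 1000 MB, 1 byte = 8 bits
    return size_megabits / speed_mbps / 3600


def min_chunks(size_gb: float, chunk_gb: float) -> int:
    """Smallest number of pieces of at most chunk_gb needed to cover size_gb."""
    return math.ceil(size_gb / chunk_gb)


hours = download_hours(100, 52)      # time to pull the whole dataset
requests_needed = min_chunks(100, 2)  # downloads under a 2GB-per-request cap
ram_batches = min_chunks(100, 16)     # passes if each batch must fit in 16GB of RAM
```

Under these assumptions the download alone takes about 4.3 hours (closer to 4.6 hours if the 100GB is read as binary gigabytes), it has to be split into at least 50 separate requests, and the data cannot be processed in fewer than 7 batches on a 16GB machine, before counting any memory overhead from R or pandas.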

We downloaded the dataset from Copernicus into Google Cloud (with a simple Python script) in a matter of minutes (limited by Copernicus’s free egress quota). We then ingested it into Google BigQuery and performed a simple join against their shark observations. This took 40 seconds and returned a 50MB file with the shark observations and the environmental conditions needed for ecological niche modelling.
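To give a feel for what "joining observations against environmental data" means, here is a minimal sketch in plain Python. It is not the BigQuery query itself: it swaps in a simple in-memory hash join on toy data, and the 0.25° grid, the field names and the values are assumptions made up for the illustration.

```python
import math
from datetime import date

GRID_DEG = 0.25  # assumed grid resolution of the environmental data, in degrees


def grid_cell(lat, lon, day):
    """Map a position and date to the (row, column, date) key of its grid cell."""
    return (math.floor(lat / GRID_DEG), math.floor(lon / GRID_DEG), day)


def join_observations(observations, environment):
    """Attach environmental values to each observation in the same cell and day.

    Observations without a matching environmental record are dropped (inner join).
    """
    env_by_cell = {grid_cell(e["lat"], e["lon"], e["day"]): e for e in environment}
    joined = []
    for obs in observations:
        env = env_by_cell.get(grid_cell(obs["lat"], obs["lon"], obs["day"]))
        if env is not None:
            joined.append({**obs, "temp_c": env["temp_c"],
                           "salinity_psu": env["salinity_psu"]})
    return joined


# Toy data: one environmental grid cell and two sightings (values are invented).
environment = [
    {"lat": 24.25, "lon": -110.25, "day": date(2023, 5, 1),
     "temp_c": 24.1, "salinity_psu": 35.0},
]
observations = [
    {"species": "Sphyrna lewini", "lat": 24.30, "lon": -110.20, "day": date(2023, 5, 1)},
    {"species": "Rhincodon typus", "lat": 10.00, "lon": -100.00, "day": date(2023, 5, 1)},
]
joined = join_observations(observations, environment)
```

Only the first sighting falls into a cell with environmental data, so `joined` contains one enriched row. A cloud data warehouse does conceptually the same matching, but distributes it over many machines, which is why 100GB stops being a problem.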

What if they had cloud-based tools tailored for biodiversity to manage big data and run different analyses (as has been done in BigTech, AI research, and Finance for a while)?

This applies both to ecologists working in conservation and to ecologists working for businesses running industrial activities.

Business goal alignment problem

Businesses must align their short-term goals with the long-term health of nature and humanity. All of the above challenges apply to businesses.

In addition, businesses lack transparent mechanisms for monitoring, reporting, verifying biodiversity metrics, and evaluating and communicating biodiversity impact. E.g. it’s often unclear where to collect new data, what type of data is needed, where they should publish this data, and who could process it further and tell if they are doing the right thing.

Luckily we see regulatory frameworks like:

  • CSRD: Corporate Sustainability Reporting Directive,
  • TNFD: Taskforce on Nature-related Financial Disclosures,
  • GRI: Global Reporting Initiative,
  • SBTN: Science Based Targets Network,
  • UK Biodiversity Net Gain

starting to appear. Someone needs to help guide businesses.

Solution

The solution would be to give ocean ecosystems a voice by strengthening the planetary digital nervous system and brain. We should broaden slow measure -> model -> act pathways making them fast highways so that we can understand, protect and restore nature faster.

1. Enable comprehensive modelling (top priority)

Enable comprehensive modelling of nature. Leverage ML and cutting-edge AI (e.g. foundation models for biodiversity) to extract actionable insights for an in-depth understanding of ecosystems, better data utilisation and more informed decision-making. This would change everything if one could model various “what if” scenarios regarding ocean biodiversity dynamics and human interventions!

2. Operationalise models into businesses

We will create models to solve specific business use cases and ensure they are properly integrated. Our primary revenue source will be from B2B business-specific models and analytics tools. This will help us build better models and have the most impact.

We don’t want to end up in a situation where a great AI-driven tool for detecting coral bleaching events (or modelling ecosystem services and impacts) exists, but its key user base isn't aware of it, can't use it, or doesn't have the time/money to use it.

However, our core focus is on comprehensive modelling itself. This will create great opportunities for others to create businesses around operationalising our models (like Midjourney uses Stability AI's Stable Diffusion).

3. Organise biodiversity data

The biggest bottleneck for us in training new models is having a lot of biodiversity data in one place. We need one centralised place for scattered multi-modal (tabular, video, audio…) biodiversity data, where it’s standardised and easily accessible (data platform). This solves the data fragmentation problem. Creating a data platform will accelerate research, conservation and decision-making on its own (as one can quickly access any data), and will also enable an in-depth understanding of nature, as we can utilise data better for analysis (as one can learn from 100s of datasets, not just 3).

We’ll follow these two calls: “First: federated data networks to connect disparate ocean databases. Second: new incentives and business models for data sharing. These can create an open, actionable and equitable digital ecosystem for the sustainable future ocean” (Nature)

We’ll try to cover all major incentives for data sharing. E.g. offshore wind farms are ready to provide their data to independent 3rd parties for transparency, which is a competitive advantage for them when bidding for government contracts. Academic researchers often have to publish data when publishing a research paper anyway, and they want their datasets to have more citations. NGOs sometimes buy data from neighbouring NGOs (i.e. a marketplace could exist). Also, many organisations are ready to share their data in exchange for analytics and modelling. See the “getting the data” chapter below.

4. Provide efficient tools

A way to eliminate data wrestling for ecologists is to provide them with cloud-based tools and infrastructure. They need convenient tools to manage, analyse and visualise the data, and also create reports (business intelligence). Example tool: combine observational and environmental data (everyone needs it). Also, ecologists need collaboration tools to share data and insights between people easily. We would build tools and infrastructure to train and deploy our models anyway. We will make them open-source and available for everyone. This will help ecologists, drive traction to our data platform/models, and accelerate overall progress.

Finally

  • In the next posts (post 2) I'll explain our customer segments, ideal customer profile, describe the product, our strategy and roadmap, case studies and more. Stay tuned!
  • A huge thank you to the many friends who helped review the doc! 🙏

Personal motivation

  • We are destroying nature at unprecedented speed, to the point where it becomes an existential crisis for human civilisation. It’s an urgent and important problem to work on, and it matters to me.
  • However, the reason above is a sad one. It's hard to think only about problems in life. I'd like to have reasons that make me excited to wake up every morning and follow my dreams! These exciting reasons are below:
  • Nature is incredibly complex and has evolved for hundreds of millions of years. It’s a beautiful complexity where everything is how it is for a reason. Nature has inspired so many technologies (see biomimicry), from aeroplanes to modern medicines. There are so many more secrets we don’t know yet and will never know if we keep destroying nature.
  • If we understand ecosystems deeply, we could use this technology beyond conservation. In the future, we could make a jungle from a desert and terraform Mars (or Venus). Along the way, we’ll encounter so many interesting unknown unknowns! Engineering all of these things is super exciting!
  • “Life is the universe developing a memory” (@leecronin). We need to help the universe. Humans and other living things want to live and expand. We should give nature a chance to live. We should understand the secrets of the universe together.
  • Lastly, I love adventure and I love to stay as close to reality as possible, present and grounded. Imagine how much fun it is to dive with penguins under the ice or with humpback whales in Tonga, and with a great purpose. And also to meet unique, kind and passionate people along the way!

Asks

  • Now that you know what I do, how would you pitch this to others? (elevator pitch) I still can't wrap my head around how to deliver that "wolves in Yellowstone" feeling, explain foundation models, cybernetic loops, the digital nervous system and all the steps before that, and show how we can accelerate biodiversity research and align human activities using cutting-edge tech, all in just a few sentences.
  • We are looking for passionate people around biodiversity tech who could benefit from our solutions (clients). We are also planning to expand the team and looking for funding (mainly grants) to bootstrap the business.
  • If this resonates with you or you have any ideas/suggestions, please let me know!
  • Give us a star on Github. Follow us on LinkedIn.
    Be the first to join our Discord and WhatsApp groups.

Please help me improve my writing 🙏