wildflow vision (part 2)

Created with DALL·E, an AI system by OpenAI

Since I left Google X over a year ago, I've been exploring different opportunities in the "ocean + biodiversity + AI/data/tech" space and building my own startup, wildflow.ai. In the previous article, I explained our company vision in detail: what problems we are solving and how (recap below), and what drives me personally to do that. Huge thank you to all the wonderful people who reached out after the first post. That matters to me a lot, and I’ll reply to everyone as quickly as possible! In this article, I’ll go through our strategy and explain our high-level customer segments and the product we're building. To keep this read short, I’ll publish our roadmap, case studies and risk mitigation ideas in the next posts. The doc is mainly for internal use, but I decided to publish it anyway. I hope it's helpful for someone. Also, the Internet is really good at telling you why you're an idiot, so I thought, why not learn faster 😀

Recap

Model biosphere. Our mission is to enable a comprehensive understanding of nature and align human activities with it, starting with oceans. We’re building a digital nervous system of the planet connecting sensors measuring the biosphere (eDNA, bioacoustics, computer vision, remote sensing, manual observations, etc) to actions (where to create marine protected areas, where to restore oyster reefs, where to install offshore wind farms, etc). At its core is a brain deeply understanding nature, how we impact nature, and how nature impacts us. It can forecast coral bleaching events, predict harmful algae blooms, estimate aquaculture crop yield, and more. It’s a collection of AI models, from simple statistical models to multi-modal foundation models for biodiversity, that we are pioneering. It comes as one cloud for biodiversity, an open data platform with open-source tools for data analysis and management, eliminating data wrestling for ecologists. By democratising access to AI models, biodiversity data and tools, we accelerate biodiversity research and conservation and align industrial activities with transparent mechanisms.

Customer Segments

We must establish a healthy balance between humans and nature.

⬆️[more protection] Accelerate research, conservation and restoration efforts.

⬇️[less destruction] Decrease the destruction we’re causing by helping businesses align their short-term goals with the long-term goals of humanity/planet.

If you want to be rich, you need to earn more and spend less (usually earning more is better; however, in nature, destroying less is better). To be rich, we need to cut the biggest expenses and focus on the biggest income streams. Similarly, we focus on the most impactful research/conservation projects and the most destructive industrial activities to align them. But in the beginning, we start somewhere in the middle, as it's hard to bridge this gap.

1. Research, conservation and restoration

Academia, conservation organisations, and ecosystem restoration efforts.

While doing the most protection, they drastically lack resources, are often understaffed, struggle with data wrestling and suffer from data fragmentation. They are enthusiasts passionate about saving our planet. They need help, but they can’t pay much.

They know state-of-the-art ways of protecting nature and understanding ecosystems. They also provide credibility. If they use our solutions, we’re on the right track. Example use-cases:

  • Creation and monitoring of marine protected areas (MPAs)
  • Coral bleaching events prediction
  • Harmful algae bloom forecasts
  • Early detection and management of invasive species
  • Oyster reef, seagrass meadows restoration

2. Businesses impacting nature

This is where the money is. These businesses also usually cause the most harm to nature. They are often “box tickers” pushed by regulations to do something. They have a lot of resources but are often unsure how best to apply them (e.g. what data to collect and where to collect it).

Examples: aquaculture, offshore wind, mining, shipping, pet food, marine biotech.

Why do both segments?

  • If we focus only on (1), we wouldn’t have a big enough market (at least now), which is OK; we could do a non-profit. However, industrial activities cause the most harm to nature, so to establish a balance, we must work on (2).
  • Focusing only on (2) makes it easy to be detached from reality and best conservation practices and slide into greenwashing.
  • We will build science-first, transparent, open-source and open-data solutions that work for both conservation and businesses. It’s a longer path, but it’s the right thing to do. Instead of adapting to new regulations each time, we could drive them.

Ideal Customer Profile (ICP)

Here’s what we’re looking for when onboarding a new client.

  • Cares about protecting nature!
  • Needs:
    • Ecosystem modelling: forecasting, impact on the ecosystem, ecosystem services valuation, etc.
    • Making decisions and running analytics on their data. E.g. tools to find where to restore habitats or install structures like wind farms.
    • Computing biodiversity metrics (e.g. credits, indexes).
    • Standardising and accelerating data workflows.
    • Storing or managing their data.
    • Transparent mechanisms, showing their impact and how ecosystem health changes.
    • Complying with regulations.
  • Already has a lot of data (needed for modelling).
  • Working with researchers (e.g. has in-house ecologists).
  • Ability to pay us.

If you match only a few conditions, that’s totally OK! Please feel free to reach out. For example, we don’t expect academia or NGOs to pay us anything, but there are a lot of interesting problems to solve there.

Product

One cloud for biodiversity, where ecologists (conservation, restoration or businesses) could quickly ingest their data, connect to existing data sources, run different analyses (like ecological niche modelling) and generate reports.

Data Layer

  • A scalable data lakehouse (a data warehouse that can also work with raw data), where anyone can quickly ingest their data (video, audio, eDNA, time-series geospatial biodiversity data) to run analysis. Long-term sensors could stream data directly.
  • Easily connect external data sources (e.g. chlorophyll levels from Copernicus, weather data from NOAA, satellite imagery from Earth Engine, observational data from GBIF, IBAT, etc).
  • Run arbitrary analytics queries to get results quickly. E.g. if you have shark detections from BRUVs and data from acoustic tags, and you would like to run ecological niche modelling against them, you can do all of that in the cloud in just a few minutes.
  • Data pipelines that can quickly merge your observational data with environmental data. You can also create your own data workflows.
  • You can publish or quickly find open data (e.g. under a Creative Commons licence), or sell/buy data in a private marketplace.
  • If your internet is slow, ship us your hard drives and we’ll ingest and analyse the data.
  • Compute species distribution and other indexes (e.g. for biodiversity credits).
  • Collaboration tools. You can share your data with colleagues or see their data.
  • Visualisation tools. E.g. a heatmap of some species distribution.

We’re not reinventing the wheel here. We just take existing scalable cloud solutions (like Google BigQuery/GCS/Earth Engine, Databricks or Snowflake) and build a thin layer on top, tailored for biodiversity. Today, everyone builds all of this from scratch, creating unnecessary overhead for researchers. Our data layer is open-source under the MIT licence (anyone can copy-paste and use it as they want).
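As a rough illustration of the pipeline idea above (merging observational data with environmental data), here is a minimal Python sketch. The record types, the field names (`site`, `sea_temp_c`), the six-hour tolerance and the sample values are all assumptions made up for illustration, not wildflow's actual schema or API.

```python
from bisect import bisect_left
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Observation:
    site: str
    time: datetime
    species: str
    count: int


@dataclass(frozen=True)
class EnvReading:
    site: str
    time: datetime
    sea_temp_c: float  # hypothetical environmental variable


def attach_environment(observations, readings, tolerance=timedelta(hours=6)):
    """Pair each observation with the nearest-in-time reading at the same site.

    Returns a list of (observation, reading_or_None) tuples. The reading is
    None when no reading at that site falls within `tolerance`.
    """
    # Index readings by site, sorted by time, so we can binary-search them.
    by_site = {}
    for reading in sorted(readings, key=lambda r: r.time):
        by_site.setdefault(reading.site, []).append(reading)

    merged = []
    for obs in observations:
        site_readings = by_site.get(obs.site, [])
        times = [r.time for r in site_readings]
        i = bisect_left(times, obs.time)
        # Only the readings just before and just after can be the nearest.
        candidates = site_readings[max(i - 1, 0): i + 1]
        best = min(candidates, key=lambda r: abs(r.time - obs.time), default=None)
        if best is not None and abs(best.time - obs.time) > tolerance:
            best = None
        merged.append((obs, best))
    return merged


observations = [
    Observation("reef_a", datetime(2024, 1, 1, 10), "grey reef shark", 3),
    Observation("reef_b", datetime(2024, 1, 1, 10), "lionfish", 1),
]
readings = [
    EnvReading("reef_a", datetime(2024, 1, 1, 8), 27.5),
    EnvReading("reef_a", datetime(2024, 1, 1, 13), 28.1),
    EnvReading("reef_b", datetime(2024, 1, 3, 0), 26.0),
]
merged = attach_environment(observations, readings)
```

In a real deployment this join would more likely run as a query inside the warehouse. The point is only that "nearest reading in time at the same place" is the kind of building block ecologists should not have to rewrite for every project.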

AI Layer

Here we have a collection of AI models, from open-source simple statistical models to foundation models for biodiversity. Anyone can upload models and collaborate on models. Anyone can run models against the data in the data layer. This is a thin layer on top of HuggingFace/other platforms. We’ll focus on gathering models in one place and making them easy to use. And, of course, developing our custom models for different applications.

Comprehensive modelling

Collection of models on how ecosystems function, how we impact them, and how they impact us. Including predictive modelling. Examples:

  • Impact of ecosystems on us (ecosystem services valuation). E.g. oyster reefs clean water, improve fishing, and take out nitrogen. We’ll have models to quantify all of that.
  • How we impact ecosystems. E.g. assessing how offshore wind farms impact seabed biodiversity.
  • Ecosystem functions and how they work.
  • Ecosystem health and resilience modelling. Monitoring the health and sustainability of ecosystems, such as coral reefs, over time.
  • Population dynamics modelling. Examining how specific species populations (like oysters, algae, lionfish) change over time and under different conditions.
  • Habitat suitability and niche modelling. Determining the ideal conditions or locations for different species or ecosystems (where to restore coral or oyster reefs or place mangroves).
  • Event and phenomena modelling. Predicting or analysing ecological events like reef spawning, migration patterns, or seasonal changes.

We’ll have climate change impact, invasive species impact, trophic interaction and food web, landscape connectivity and many more types of modelling.
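To make "population dynamics modelling" a bit more concrete, here is about the simplest possible example: discrete-time logistic growth, where a population grows quickly at first and then levels off at the carrying capacity of its habitat. This is a textbook toy model, not one of our models, and the parameter values are invented for illustration.

```python
def simulate_logistic_growth(initial, growth_rate, carrying_capacity, steps):
    """Discrete-time logistic growth.

    Update rule: N[t+1] = N[t] + r * N[t] * (1 - N[t] / K)
    where r is the per-step growth rate and K the carrying capacity.
    """
    population = [float(initial)]
    for _ in range(steps):
        n = population[-1]
        population.append(n + growth_rate * n * (1 - n / carrying_capacity))
    return population


# Hypothetical oyster population: 100 individuals, habitat supports 1000.
trajectory = simulate_logistic_growth(
    initial=100, growth_rate=0.3, carrying_capacity=1000, steps=40
)
```

Real models add environmental drivers (temperature, nutrients, predation) and are fitted to observed data, but the structure is the same: a state, an update rule, and parameters estimated from measurements.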

AI making things easier

  • Detect fish species in video footage from BRUVs and estimate biomass.
  • Detect anomalies, clean data, and identify trends.
  • A custom GPT/AI assistant running on top of our data layer. E.g. you can just ask it to transform custom data and visualise things, using all our APIs.

Application Layer

Custom applications for your organisation are built on top of our platform.

  • Create specific reports and tools for your organisation.
  • Be a transparent 3rd party – e.g., offshore wind can give us data (e.g., many underwater videos). We can extract info from these videos, store it and make it all available to see how much they hurt the seabed. They are interested in this to win governmental grants/tenders.
  • Help to comply with regulations/frameworks such as CSRD, TNFD, SBTN, GRI, UK Biodiversity Net Gain, etc.
  • Compute biodiversity credits (using the framework of your choice).
  • Help with data collection. E.g. where to collect data, in what way, etc.
  • Monitor key indicator species to estimate ecosystem health (example) once we have enough data.
  • E.g. you have a seaweed farm. We can help stream data to our platform, set up dashboards to monitor everything, estimate crop yield, and generate reports to show your environmental impact.

Strategy and Implementation

Focus on oceans

The same tech we build can scale beyond the ocean (e.g. forests, agriculture), and non-ocean people can still use our tools. But we focus on oceans for now:

  • Personal: love diving and admire oceans – main reason 😉
  • Importance: oceans are vast in scale (about 2/3 of the planet’s surface and 90% of its habitable space), and problems there are less visible.
  • Less competition.
  • The data is usually less sensitive.

Multiple B2B verticals and horizontal B2C

  • Just go into verticals (e.g. oyster reef restoration, offshore wind) and build a solution for one company (almost like a consultancy providing services and tools).
    • Then, scale this solution to other companies and improve the service.
    • In parallel, put every generalisable building block into an open-source horizontal platform available to everyone.
  • The horizontal part will help all biodiversity researchers (e.g. NGOs who can’t pay for tailored solutions) and drive traction to our platform.

Modelling is a company priority

We have a lot on our plate. Understanding ecosystems (or modelling) on a deep level is one of the hardest things and unlocks many possibilities. We have expertise in modelling nature. This is our main focus for the company.

This presentation (UN ICP23) highlights that intermediaries (like wildflow) in the ocean observing and services market are the fastest-growing piece of the value chain. Prioritising modelling, while hard and costly, is the right intuition long term.

Vehicle to unlock models

We will do everything else (data platform, applying models, guiding where and how to collect data) but on a level just enough to enable the best models possible. Driving modelling creates opportunities for others.

Multimodality

Single modality is limiting. E.g. using only video cameras is limiting because sometimes the water is murky and you don’t see far. Using only microphones is limiting because not all living things produce sounds. Using only eDNA is limiting because you don’t know how many animals were there (only relative abundance) and how long ago (could be a month ago). The key to great models is to utilise multi-modal data.

Our technology sits downstream from companies that use only one or a few sensing solutions. We get data from these organisations and then fuse it together to gain new insights that were impossible to obtain otherwise.

Fusing multi-modal data

How do we fuse multi-modal data? We’ll support both ways:

  1. Traditional way. We’ll support common ways to represent data about ecosystems. E.g. one way is time-series geospatial biodiversity data, where each data point stores location, time, which animal was there and how many (aka abundance, or other metrics like presence/absence). Upstream, we get this data from all the sensors. E.g. we extract information about animals and abundance from the video footage, and we know when and where it was taken. Downstream, we run an analysis against this dataset. E.g. predicting the population dynamics of the species.
  2. End-to-end deep learning. We want to pioneer foundation models for biodiversity. The underlying technology is mostly already here (GPT, Gemini, and people building foundation models for geospatial data, physics, etc). These systems could “internalise” all the intermediate manual steps. They would utilise data way better than we could think of.
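The "traditional way" in (1) can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Occurrence` record, its field names and the `monthly_summary` helper are hypothetical, and the sample detections are made up. The idea is that every sensor is reduced upstream to the same kind of record (where, when, which species, how many), so downstream analysis does not care which sensor produced it.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Occurrence:
    lat: float
    lon: float
    time: datetime
    species: str
    source: str                # e.g. "video", "acoustic", "edna"
    abundance: Optional[int]   # None when the sensor only gives presence


def monthly_summary(records):
    """Group records by (species, year-month).

    For each group, sum the counted abundance and collect which sensor
    types detected the species. Presence-only records (abundance None)
    add their source but no count.
    """
    summary = defaultdict(lambda: {"abundance": 0, "sources": set()})
    for rec in records:
        key = (rec.species, rec.time.strftime("%Y-%m"))
        summary[key]["sources"].add(rec.source)
        if rec.abundance is not None:
            summary[key]["abundance"] += rec.abundance
    return dict(summary)


records = [
    # Counted on BRUV video footage.
    Occurrence(-8.34, 116.03, datetime(2024, 3, 2, 9), "lionfish", "video", 4),
    # eDNA sample: tells us the species was present, not how many.
    Occurrence(-8.35, 116.04, datetime(2024, 3, 20, 14), "lionfish", "edna", None),
    # Acoustic tag detection of a single tagged animal.
    Occurrence(-8.34, 116.03, datetime(2024, 3, 5, 6), "grey reef shark", "acoustic", 1),
]
summary = monthly_summary(records)
```

Even this toy version shows why multi-modality helps: the eDNA record confirms lionfish presence later in the month, when no camera was in the water.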

We need talent, compute and data to achieve (2). The biggest bottleneck is having a lot of multi-modal biodiversity data accessible for training models.

Getting the data

Here’s how we can access it (priority order right now):

  1. Partnerships – with organisations with a lot of data. Build tools or AI models for them in exchange for the ability to train models on their data.
  2. Open data platform – researchers and organisations upload their data (e.g. in exchange for AI insights or good tooling). We focus on the unification of all the siloed data. We’ll explore the most common incentives for sharing data, like being a transparent 3rd party for offshore wind.
  3. Data marketplace – allows people to buy and sell their biodiversity data; it could bring better resolution, freshness and utility than an open data approach.
  4. Buy data if you can.
  5. Measure it ourselves – overhead in managing your hardware, and can’t measure the past.

Also, we can focus on:

  • Narrow (top-down) approach: acquire specific multi-modal data for a particular goal (e.g., creating a marine protected area or predicting coral bleaching).
  • Broad (bottom-up) approach: get whatever biodiversity data we can find and then see what insights can be derived from it (once you surpass some critical point, exciting things will be unlocked from that data).

In the short term, we will follow a top-down approach with every client, helping them with their data. But we’ll also be open for a broad approach (with an open data platform, and potentially a marketplace).

Multiple ecosystems

One could say we’re spreading too thin by simultaneously looking at too many verticals. We should focus on one niche, e.g. offshore wind farms, excel at that, and expand later. And it’s better to create a fantastic product for a small niche than a terrible product for multiple.

That’s right! We’re fully on board with this thinking. It’s just that our niche is different and cuts across multiple industries. Our niche is foundation models for biodiversity. And at the beginning, even more specific niches like population dynamics or ecological niche modelling.

There’s also an advantage in modelling multiple ecosystems at once. We’ve seen multiple examples of this, e.g. language models becoming better at poetry when trained to generate music, as they get a sense of rhythm. A fine-tuned acoustics model initially trained to recognise birds performs better on coral reefs than if you trained it on coral reefs alone. So, it might be beneficial to create one model for forests and coral reefs simultaneously. We’ll be mindful of our resources to stay focused. However, it’s helpful to work on a few ecosystems at once.

Being a glue

Big forces: NOAA, Google/Amazon, BP/Shell, UN, DARPA, academic institutions, etc. Each of them can do some things and can’t do others. We're bridging the gaps, kind of a glue between them.

Example: at the OCEANS conference last year, NOAA (and other gov agencies) talked about how they are actively seeking solutions that add value to their efforts as producers of ocean information, as they need to justify their existence and their budgets. While they provide a lot of foundational information, they can't effectively serve the long tail of use-cases.

Finally

  • I’ll publish our roadmap, case studies and risk mitigation ideas in the next posts. Stay tuned!
  • If this resonates with you or you have any ideas/suggestions, please let me know!
  • Huge thank you to the many friends who helped review the doc! 🙏

Asks

  • We’re looking for passionate people in biodiversity tech who could benefit from our solutions (clients and partners). We’re also planning to expand the team and looking for funding (mainly grants) to bootstrap the business.
  • If you know of any grants to apply for, that would be super helpful, as funding is the biggest problem right now.
  • Give us a star on GitHub. Follow us on LinkedIn.
    Join our Discord and WhatsApp communities if you're interested.

Please help me improve my writing 🙏