Case Study

Inari unlocks the full potential of seeds with AI-powered platform

Category
Data & AI

SEEDesign company Inari Agriculture chose Crayon to design and build a Data and Machine Learning platform that imposes end-to-end consistency on its genome experiments and data governance. Researchers as different as maize geneticists, ML engineers, plant breeders, and data scientists can have their preconfigured and governed environment ready in a matter of minutes. The platform is speeding up ML projects, enabling better science through shared notebooks for reporting and analysis, and bearing down on the cost of compute.

Challenges

  • Problems of data lineage and data transformation were hampering research
  • Machine Learning experiments were not run in a systematic way
  • Experiments lacked operational visibility, making it difficult for scientists to share or compare results
  • Fragmented ML methodologies increased the cost of compute and slowed down results

Project Summary

  • Inari chose Crayon as its partner to build the Data and ML platform on AWS
  • Crayon structured how the data was stored, as well as access and permissions
  • Crayon set up data pipelines, implementing desired data transformation logic
  • Crayon implemented Airflow to orchestrate data and ML workflows
  • Three Inari projects were run through the new data and ML platform
  • Crayon produced manuals to enable Inari scientists to onboard their experiments independently

Business Benefits

  • ML experiments run up to 10 times quicker – from 24 hours to 20 minutes in the case of one analysis
  • Internal Inari projects can be provided with a governed and isolated environment in a matter of minutes
  • Results can be benchmarked more easily because the platform has imposed operational consistency
  • Results can be validated more quickly for intellectual property filings
  • Cost of compute is down significantly because the Crayon platform auto-scales

 

A game-changer for agricultural sustainability

Inari Agriculture is one of the world’s most exciting and promising companies in agricultural tech. It was recently named 2023 Overall AgTech Company of the Year by the AgTech Breakthrough Awards.

Inari’s overarching aim is to enable growers of the world’s largest crop types to withstand the shocks of a changing climate and help transform the sustainability of the global food system.

Combining AI-powered predictive design with multiplex gene editing, the company’s goal is to boost yields of wheat, soy, and corn by as much as 20% while reducing nitrogen fertilizer and water use in corn by 40% – in other words, producing more while using significantly less land, water, and nitrogen. These targets illustrate the scale of Inari’s ambition – and the magnitude of its task.

Why Crayon?

Inari talked to a number of AI services providers but opted for Crayon because of its unparalleled expertise and immediate grasp of the challenges ahead. Project management of this complex build was also exemplary, as Crayon built trust by breaking down technical decisions into detailed options and recommendations. The process from initial conversation to a platform in production took no more than nine months. Having ‘road-tested’ the platform by running three Inari projects, Crayon prepared an extensive manual to allow scientists to self-start and steer their projects, something they had not been able to do before.

Overcoming challenges: Unifying machine learning and taming complex data

Inari uses Machine Learning to “experiment” on plants through the company’s proprietary SEEDesign™ technology platform.

“But here is the first difficulty,” says Alex Frieden, Director of Engineering at Inari. “We are a very diverse organization with software engineers, machine learning scientists, maize geneticists, plant breeders, and many other experts in their field all using data science and ML for different ends. One of the challenges we faced was a lack of central knowledge or a shared and consistent way of doing our ML experiments.”

The second and related challenge was the data: its lineage, complexity, formatting – and sheer volume. Plant genetics are astoundingly complex. For example, the wheat genome alone is five times the size of the human genome.

“The roles were fragmented, but so was the data,” says Frieden. “Everyone had their own way of setting up experiments using different tools and data sets. Data came from many places, with wildly different information around it – our own research, university labs, public data, bought data, and so on. We used a tool called Quilt to track data lineage but it wasn’t used systematically.

“In short, we needed to impose a more uniform and automated way of doing our data science and ML research. The way we were conducting it was time-consuming and the inefficiency of the compute drove up costs. But, most importantly, it was beginning to stand in the way of the science, because the experiments were not repeatable, and lacked operational visibility, which meant results were difficult to compare.”

Crayon partnership: Accelerating research, improving science, and fostering innovation

Inari talked to several potential partners but opted for Crayon. “We liked our conversations with the Crayon AI team. They really convinced us they had the deepest knowledge,” says Frieden. “They got it. They got us.”

The task? To design a platform that would establish end-to-end governance of data sets, impose unified workflow tooling across Inari, and democratize Machine Learning.

“From our earliest conversations to an ML platform in production took no more than nine months.”

Inari’s only non-negotiable was that the platform be built on AWS. This meant that fine-grained permissions across AWS would be set and managed by AWS Identity and Access Management (IAM). Autoscaling cluster technologies such as Elastic Kubernetes Service and the Databricks platform support a cost-effective "pay-as-you-go" pricing model.

“But apart from the wish to use AWS, everything else was on the table,” says Frieden.

“The way Crayon structured its proposals and recommendations helped build trust,” he adds. “They didn’t just say: ‘Here is our recommendation’, but would lay out options, pros and cons, and then give their advice. For example, for the Machine Learning functionality Crayon listed three options: MLFlow on Databricks, SageMaker – which is the Amazon ML platform – and Kubeflow.

“Despite being potentially the most expensive option, the overriding advantage of MLFlow on Databricks is that it lets you have a single unified stack for ML and data processing. After many detailed discussions informed by the information Crayon provided, that’s what we selected.”

We do not have the scope here to analyze the platform in all its complexity but will limit ourselves to a high-level discussion of the benefits.

Data from public data sets, Inari phenotyping, or other labs, as well as the associated metadata, were structured in different buckets prior to “first pass” primary analysis in preparation for ML computation. “Here we are essentially transforming raw DNA sequencing data into the other types of files that can be used for Machine Learning,” says Frieden. “This is part of the process where a lot of data scientists needed the support of the engineering team.

“With the Crayon platform they can now initiate and self-steer a project, especially as the secondary and tertiary analysis has been automated.”
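
The staged flow described above (raw sequencing data, then primary, secondary, and tertiary analysis, then ML) can be pictured as a small dependency graph that an orchestrator walks in order. The sketch below is only an illustration using Python's standard library, not Airflow and not Inari's actual pipeline; the stage names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names. Each stage maps to the stages it depends on.
STAGES = {
    "ingest_raw_sequencing": set(),
    "primary_analysis": {"ingest_raw_sequencing"},
    "secondary_analysis": {"primary_analysis"},
    "tertiary_analysis": {"secondary_analysis"},
    "train_model": {"tertiary_analysis"},
}


def execution_order(stages):
    """Return stage names so that every stage follows its dependencies."""
    return list(TopologicalSorter(stages).static_order())


def run_pipeline(stages, action):
    """Call action(stage) once per stage, in dependency order."""
    completed = []
    for stage in execution_order(stages):
        action(stage)
        completed.append(stage)
    return completed


if __name__ == "__main__":
    run_pipeline(STAGES, lambda stage: print(f"running {stage}"))
```

A workflow orchestrator such as Airflow adds scheduling, retries, and monitoring on top of this basic idea of running tasks in dependency order.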

At the core of the platform architecture is Databricks, alongside tools such as Quilt for metadata and lineage, Airflow for workflow orchestration, and MLFlow for MLOps. “Databricks brings a lot of important benefits,” explains Frieden. “It is a scalable platform that empowers best practices, yielding a massive reduction of workflow execution times.”

Other benefits of Databricks flagged up by Frieden are:

  • It provides a distributed cluster environment that is more resilient
  • It offers a development mode familiar to bioinformaticians (notebooks), with the benefits of integrated version control and collaborative authoring features. For more on notebooks see below.
  • Infrastructure is maintained in a fully consistent way
  • Single place for working efficiently with ML models according to best practices

Transformative impact: Achieving faster, more cost-efficient research and better science

“The data transformation that used to take up so much time and compute now happens at the flick of a switch, you could say. The complexity for the Inari scientists is much reduced, and ML analysis runs much, much faster,” says Frieden.

As part of the implementation, Crayon ran three Inari projects on the new platform. “It was difficult to compare the Crayon results with former outcomes because we did not run our projects in a uniform way. The problem of standardized comparisons was actually one of the reasons why we needed the Crayon platform in the first place!”

“However, there is no doubt that we are seeing extraordinary gains in speed,” Frieden adds. “One ‘clean’ like-for-like analysis saw the Crayon platform complete an experiment that used to take us 24 hours in as little as 20 minutes.

“Not only can we go faster, it’s at a considerably lower cost because the platform auto-scales: it allocates resources based on required processing needs, thus optimizing resource utilization. Pre-Crayon, you’d have someone spin up a server, perform a task, spin down the server, go over to the ML platform, take what they had, and put it there. And every step racked up the cost of the compute. Today we're able to have one pipeline that runs between all these steps.”
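
To see why allocating resources to match demand lowers cost, consider a toy comparison between a cluster sized for peak load around the clock and one whose node count follows the workload hour by hour. This is a rough sketch with made-up numbers; the demand profile, the price, and both function names are hypothetical and do not reflect Inari's workloads or any AWS or Databricks pricing.

```python
def fixed_cluster_cost(hourly_demand, price_per_node_hour):
    """Cost when the cluster stays at peak size for every hour."""
    peak_nodes = max(hourly_demand)
    return peak_nodes * len(hourly_demand) * price_per_node_hour


def autoscaled_cost(hourly_demand, price_per_node_hour):
    """Cost when the node count tracks demand hour by hour."""
    return sum(hourly_demand) * price_per_node_hour


# Hypothetical workload: nodes needed in each of six consecutive hours.
DEMAND = [0, 0, 8, 8, 2, 0]
PRICE = 2  # hypothetical cost units per node-hour

if __name__ == "__main__":
    print("fixed:", fixed_cluster_cost(DEMAND, PRICE))
    print("autoscaled:", autoscaled_cost(DEMAND, PRICE))
```

In this toy case the fixed cluster pays for 48 node-hours while the autoscaled one pays for 18, because idle hours cost nothing. Real autoscalers react with some delay and often keep a minimum number of nodes running, so actual savings would be smaller.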

The gains in speed may not help Inari produce its step-change crop seeds any faster – in agriculture, time-to-market is measured in years – but they do help speed up validation of ideas. “That is very relevant for us because it would mean we can file intellectual property around those ideas – and lock them in for Inari.”

Frieden believes the Crayon platform has delivered not only faster and less expensive research but also better science. “Previously, our experiments lacked visibility. And because they were run inconsistently, with data that wasn’t properly traceable, analyzing the results could be frustrating,” he says.

“What we have now is shared notebooks. Databricks is really a big data processing platform that supports the well-known Jupyter notebook style of interaction. What’s in the notebooks? Comments about what you did, the analysis or plotting of the work that happened, so it becomes possible to talk about it.

“That’s where the science really happens in Inari – in these notebooks.”

Inari scientists are currently self-starting projects on the Crayon platform.

“There’s been a learning curve, but overall we have been able to work through the needed changes and win supporters for the platform,” says Frieden.

“Recently, a scientist who really did not want to change the way they worked did a complete 180° after running one of their own projects on the Crayon platform. They are now a huge supporter.”

In conclusion, Frieden highlights the easy collaboration with Crayon. “Arguably we could have done this in-house,” he says. “But we had maybe one data engineer and one ML engineer to do it. With Crayon, I got five experts overnight!

“The Crayon AI team has made a real contribution to our work in pursuit of a more sustainable future for our farmers, and our planet. That’s the thing about Machine Learning – ultimately, it’s about the real world, and how we can change it for the better.”

About Inari Agriculture

Inari Agriculture is a SEEDesign company based out of Cambridge, Massachusetts. Its ambition is to develop next-generation seeds that help transform the sustainability of the global food system. The company has set itself the bold and explicit targets of achieving increases in corn, wheat and soybean yields of as much as 10-20%, while reducing nitrogen fertilizer and water use in corn by 40%. The company’s technology platform taps into the immense biological complexity of plants through the latest innovations in genomics, multiplex gene editing, and artificial intelligence. However, its Machine Learning analysis was cumbersome, fragmented, and costly. The company collaborated with Crayon to create a Data and Machine Learning platform to run controlled, reproducible data pipelines and ML experiments that yield ML models as reusable artifacts.