Friday, March 6, 2026

From the Field to the Cloud: How I Built SeedOps Savant for Corteva Agriscience on Azure AI Foundry

The Podcast Moment That Started It All

I recently had the privilege of joining Matthew Calder and Charles Maxson on the Microsoft Dev Radio podcast — one of the most exciting conversations I've had about enterprise AI in agriculture. We dug deep into a solution I architected called SeedOps Savant, an Azure AI Foundry–powered platform built for Corteva Agriscience, one of the world's leading agricultural science companies. If you haven't watched it yet, check out the full live stream here and come back — this post gives you the full behind-the-scenes story.


What Is SeedOps Savant?

SeedOps Savant is an enterprise-grade AI solution designed to bring intelligent, conversational access to seed operations data at Corteva. In an industry where seed production decisions can hinge on real-time field intelligence, agronomic research, and supply chain data, having a simple chat interface that synthesizes all of that context into an actionable answer is a game changer.

The name says it all: Seed Operations meet Savant — a system smart enough to serve agronomists, sales reps, and operational teams with fast, precise, grounded answers without digging through endless reports, spreadsheets, or documents.


Why Azure AI Foundry?

Azure AI Foundry is Microsoft's integrated platform for building, orchestrating, and managing enterprise AI solutions — from model selection and fine-tuning to deployment and observability. For SeedOps Savant, it was the natural choice for several reasons:

  • End-to-end AI lifecycle management — I could go from model selection to deployment without stitching together disparate services

  • Enterprise security & governance — Corteva's data required strict access controls and data residency compliance that Azure AI Foundry handles natively

  • Seamless integration with the Azure ecosystem — connecting Azure AI Search for RAG, Azure OpenAI for generation, and Databricks for the data lakehouse was straightforward

  • Observability and monitoring — production-grade telemetry came built-in, critical for an enterprise rollout at scale

Agriculture is increasingly a data-intensive industry, and Corteva is no exception — the company has built AI systems that process millions of data points across seeds, soil, weather, and genetics. SeedOps Savant needed to sit on top of that complexity and make it accessible.


The Architecture: RAG Meets AgriData

At its core, SeedOps Savant is a Retrieval-Augmented Generation (RAG) solution. Here's how the key layers fit together:

1. Data Foundation
Seed operations data — production schedules, agronomic research, product specifications, field trial outcomes — lives across multiple systems. We unified it on the Azure platform, bringing unstructured data together in a form that can be indexed and retrieved with vector search.

2. Intelligent Indexing with Azure AI Search
The heart of any RAG solution is its index. Azure AI Search provides hybrid search (keyword + vector), semantic ranking, and the ability to incorporate Corteva's proprietary data in a secure, governed way. This means that when a user asks a question, the retrieval step pulls back the most relevant context — not just keyword-matched documents.
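Hybrid retrieval is easy to picture with a toy example. Azure AI Search fuses its keyword and vector rankings with Reciprocal Rank Fusion (RRF); the sketch below is not Azure code, just an illustration of why a document that ranks well on either signal surfaces near the top of the fused list (the document IDs are made up):

```python
# Toy Reciprocal Rank Fusion (RRF): each ranking contributes 1/(k + rank)
# per document, so documents favored by either signal rise in the fused list.
def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["trial-2024-07", "spec-B12", "yield-memo"]      # BM25-style ranking
vector_hits  = ["yield-memo", "trial-2024-07", "agronomy-faq"]  # embedding ranking
print(rrf([keyword_hits, vector_hits]))
# "trial-2024-07" wins the fusion: it ranks highly in both lists
```

The constant k (60 is the value commonly cited for RRF) damps the influence of any single top rank, which is exactly why hybrid search is robust to one signal being noisy for a given query.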

3. Generative Answers via Azure OpenAI
Once the right context is retrieved, Azure OpenAI generates a grounded, human-readable response. The key here is grounded — SeedOps Savant doesn't hallucinate answers from its training data. Every response is anchored to Corteva's actual operational data.

4. Orchestration via Azure AI Foundry
Azure AI Foundry ties the prompt flow, model routing, and agent logic together, allowing the solution to handle complex multi-step queries — the kind that seed ops teams actually ask in the real world.


The Real-World Impact

The agricultural AI space is exploding. SeedOps Savant brings that momentum to Corteva's seed operations specifically — giving the teams closest to production decisions fast access to enterprise knowledge.

For sales reps and agronomists in the field, having a system that can synthesize research, product data, and operational context into a single conversational interface isn't just a convenience — it's a competitive differentiator.


Lessons Learned: Building Enterprise AI in AgriTech

A few key takeaways from building SeedOps Savant that I shared on the podcast:

  • Data quality is the foundation — no matter how powerful your LLM or search index, garbage in means garbage out. Invest early in data curation and governance.

  • Domain specificity matters — generic AI models need to be grounded in domain-specific data to be genuinely useful to agronomists and seed ops professionals.

  • Security and access control aren't optional — in enterprise agriculture, data is highly proprietary. Azure AI Foundry's built-in governance and role-based access made it possible to deploy with confidence.

  • Start with the user's workflow — the most impactful RAG solutions I've built are designed around how people actually work, not how the technology wants them to work.

  • Hybrid search wins — pure vector search is not enough for enterprise RAG. Combining semantic vector search with keyword search and re-ranking delivers meaningfully better results for domain-specific queries.


Watch the Full Podcast

If you want to hear me walk through the full story — the architecture decisions, the challenges of enterprise-scale RAG, and what's next for AI in agricultural operations — watch the Microsoft Dev Radio episode live below:

🎥 Watch on YouTube →


What's Next?

SeedOps Savant is one chapter in a much larger story about how Azure AI Foundry is enabling enterprise-grade AI solutions across industries that were previously underserved by technology. I'm actively documenting patterns, architectures, and implementation strategies like this in my upcoming book on Enterprise RAG with Azure technologies.

If you're building something similar — whether in agriculture, manufacturing, or any data-intensive enterprise — I'd love to connect. Drop a comment below or reach out directly.


Mehul Bhuva is a Senior Enterprise Architect, Microsoft Azure Developer Influencer, and author of an upcoming book on Enterprise RAG. He writes at sharepointfix.com.


Tuesday, December 16, 2025

Databricks Delta Sharing (D2O) with Open Delta Sharing – A Practical, Step‑by‑Step Guide for Data Engineers

Data products only create value when they can be shared and consumed easily, securely, and at scale. Delta Sharing was designed exactly for that: an open, cross-platform protocol that lets you share live data from your Databricks lakehouse with any downstream tool or platform over HTTPS, without copies or custom integrations.

In this blog post, I walk through Databricks‑to‑Open (D2O) Delta Sharing using Open Delta Sharing in a practical, step‑by‑step way. The focus is on helping data teams move from theory to a concrete implementation pattern that works in real projects.

What the article covers:

  • How Delta Sharing fits into a modern data collaboration strategy and when to choose Open Sharing (D2O) over Databricks‑to‑Databricks (D2D).
  • The core workflow: creating recipients, configuring authentication (bearer token or federated/OIDC), defining shares in Unity Catalog, and granting access to tables and views.
  • How external consumers can connect using open connectors (Python/pandas, Apache Spark, Power BI, Tableau, Excel and others) without needing a Databricks workspace.
  • Security, governance, and operational considerations such as token TTL, auditing activity, and avoiding data duplication by sharing from your existing Delta Lake and Parquet data.

Whether you are building a data‑as‑a‑service offering, exposing governed data products to partners, or just trying to simplify ad‑hoc external access, D2O can significantly reduce friction and integration work.
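To make the consumer side concrete, here is a minimal sketch of reading a D2O share from plain Python with the open-source delta-sharing client, with no Databricks workspace involved. The profile file name and the sales_share.gold.orders table are hypothetical placeholders:

```python
# Minimal D2O consumer sketch using the open `delta-sharing` package
# (pip install delta-sharing). Share/schema/table names are placeholders.

def shared_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    # Delta Sharing addresses a table as <profile>#<share>.<schema>.<table>
    return f"{profile_path}#{share}.{schema}.{table}"

def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    # Needs the credential file the provider's activation link gives you
    import delta_sharing
    return delta_sharing.load_as_pandas(
        shared_table_url(profile_path, share, schema, table))

print(shared_table_url("config.share", "sales_share", "gold", "orders"))
# df = load_shared_table("config.share", "sales_share", "gold", "orders")
```

The same credential file also works with the Spark, Power BI, and Tableau connectors, which is the point of the open protocol: one governed share, many consumers.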

Here is a step-by-step guide to Databricks Delta Sharing using Open Delta Sharing (D2O).

1. Create Recipient

2. Create Delta Share and assign Recipients


You can configure either OIDC federation authentication or token-based authentication for your recipients.

Note that tables with row-level security (RLS) or column masks cannot be shared using Delta Sharing.



Select the recipient you created earlier.




Thursday, December 11, 2025

Databricks Training Notes - Compute

All-purpose compute - R/W/X - more expensive

Serverless version of all purpose compute

All purpose is also known as Classic Compute.

Classic Compute - VMs, Databricks Consumption DBU/hr.

Job Compute - R/X - Cheaper

Serverless version of Job Compute

You can't run Scala/R on Serverless compute.

Serverless DBU cost is higher because the VM cost is built into the DBU rate.

RDD - Resilient Distributed Dataset

If a worker dies, Spark can recreate the lost data partitions and keep running; RDDs keep extra RAM available for this.

Vector Search - Word embeddings. Array of floats. Specialized engine to build index of those numbers.
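A tiny, self-contained illustration of that idea: embeddings are just arrays of floats, and retrieval boils down to comparing their directions (the numbers below are made up for illustration):

```python
import math

def cosine(a, b):
    # Cosine similarity: how closely two embedding vectors point the same way
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

seed  = [0.9, 0.1, 0.3]   # toy embedding for "seed"
grain = [0.8, 0.2, 0.4]   # toy embedding for "grain"
car   = [0.1, 0.9, 0.2]   # toy embedding for "car"

# A vector index would rank "grain" above "car" for the query "seed"
print(cosine(seed, grain) > cosine(seed, car))  # True
```

A real vector search engine builds a specialized index (e.g. HNSW) over millions of such vectors so it can find the nearest ones without scanning them all.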

Pools - a pool of pre-warmed VMs that you keep paying for to cut cluster startup time (classic compute scenario). With serverless, pools have largely gone away.

Serverless Compute - standard (cheaper) version

Serverless Compute - performance-optimized version - clusters start in about 5 seconds

Cluster - Drivers and Worker Nodes. Single node cluster - driver is the worker. SkLearn, Pandas consume driver memory.

Use Job or Serverless clusters in production. Avoid interactive clusters in prod. Enable Photon for faster and cheaper execution. Reuse clusters to reduce startup time and cost.

Serverless - Photon engine, Predictive IO, Intelligent Workload Management

Pro - Photon, Predictive IO

Classic - Photon engine

Performance considerations: SKEW/SPILL/STORAGE/SHUFFLE/SERIALIZATION 

Adaptive Query Execution helps code optimization

Row Filter:

-- Row filter UDF: admins see every row; everyone else only device_id < 30
CREATE OR REPLACE FUNCTION device_filter(device_id INT)
  RETURN IF(IS_ACCOUNT_GROUP_MEMBER('admin'), true, device_id < 30);

-- Attach the filter to the table
ALTER TABLE silver
SET ROW FILTER device_filter ON (device_id);

-- Queries now return only the rows the current user is allowed to see
SELECT *
FROM silver
ORDER BY device_id DESC;


-- Read raw Parquet files and convert the microsecond epoch timestamp to a date
SELECT
  *,
  cast(from_unixtime(user_first_touch_timestamp/1000000) AS DATE) AS first_touch_date
FROM read_files(
  "/Volumes/dbacademy_ecommerce/v01/raw/users-historical",
  format => 'parquet')
LIMIT 10;

Thursday, April 17, 2025

Setting Up AI Foundry with ChatGPT and RAG-Based Chat: A Comprehensive Guide

Introduction

In the rapidly evolving landscape of artificial intelligence, setting up an efficient and scalable AI system is crucial for businesses looking to leverage the power of AI. This blog post will guide you through the process of setting up AI Foundry using the ChatGPT model and implementing a Retrieval-Augmented Generation (RAG) based chat approach.

What is AI Foundry?

AI Foundry is a comprehensive platform provided by Azure that allows you to design, customize, and manage AI applications at scale. It offers a unified SDK, access to over 200 Azure services, and more than 1,800 models, making it a powerful tool for building AI-driven applications.

Understanding ChatGPT

ChatGPT, developed by OpenAI, is a conversational AI model that interacts in a dialogue format. It can answer follow-up questions, admit mistakes, and reject inappropriate requests. This model is trained using Reinforcement Learning from Human Feedback (RLHF), making it highly effective for generating coherent and contextually relevant responses.

What is RAG-Based Chat?

Retrieval-Augmented Generation (RAG) is an architecture that enhances the capabilities of a Large Language Model (LLM) like ChatGPT by integrating an information retrieval system. This system provides grounding data, ensuring that the AI's responses are accurate and relevant. RAG is particularly useful for enterprise solutions, as it allows the AI to access and utilize proprietary content.

Step-by-Step Guide to Setting Up AI Foundry with ChatGPT and RAG

  1. Prerequisites

    • Azure account with access to AI Foundry.
    • OpenAI API key for ChatGPT.
    • Basic understanding of Python and Azure services.
  2. Setting Up AI Foundry

    • Sign In: Log into your Azure account and navigate to AI Foundry.
    • Create a New Project: Start a new project and select the necessary services and models.
    • Configure SDK: Install the AI Foundry SDK and set up your development environment.
    pip install azure-ai-projects  # the Azure AI Foundry SDK package
     
  3. Integrating ChatGPT

    • API Access: Obtain your OpenAI API key and integrate it into your project.
    • Model Configuration: Configure the ChatGPT model within AI Foundry.
    from openai import OpenAI

    client = OpenAI(api_key="your-api-key")  # current openai>=1.0 client API
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "How can I set up AI Foundry?"}
        ]
    )
    print(response.choices[0].message.content)

  4. Implementing RAG-Based Chat

    • Data Retrieval System: Set up Azure AI Search to index and retrieve relevant data.
    • Integration with ChatGPT: Combine the retrieval system with ChatGPT to enhance response accuracy.

      from openai import OpenAI
      from azure.core.credentials import AzureKeyCredential
      from azure.search.documents import SearchClient

      # SearchClient needs the index name as well as the endpoint and key
      search_client = SearchClient(
          endpoint="your-search-endpoint",
          index_name="your-index-name",
          credential=AzureKeyCredential("your-key"))
      openai_client = OpenAI(api_key="your-api-key")

      def retrieve_data(query):
          # Flatten the top matches into one context string
          # (assumes the index documents have a "content" field)
          results = search_client.search(query, top=5)
          return "\n".join(doc["content"] for doc in results)

      def generate_response(query):
          context = retrieve_data(query)
          response = openai_client.chat.completions.create(
              model="gpt-4",
              messages=[
                  {"role": "system", "content": "You are a helpful assistant. "
                      "Answer using only the provided context.\n\n" + context},
                  {"role": "user", "content": query}
              ]
          )
          return response.choices[0].message.content

      print(generate_response("Tell me about AI Foundry"))

  5. Testing and Deployment

    • Evaluation: Test the system using ground truth data to ensure coherence and relevance.
    • Deployment: Deploy your AI application using Azure's scalable infrastructure.

Conclusion

Setting up AI Foundry with ChatGPT and implementing a RAG-based chat approach can significantly enhance the capabilities of your AI applications. By following this guide, you can create a robust and scalable AI system that leverages the latest advancements in AI technology.

Saturday, January 11, 2025

How I Successfully Passed the DP-203 Azure Data Engineer Associate Certification

The journey to obtaining the DP-203 Azure Data Engineer Associate certification is both challenging and rewarding. I'd like to share my experience and preparation strategy that led me to success.

Understanding the Certification

Before diving into the study material, it's essential to understand the certification itself. The DP-203 certification focuses on implementing and designing data solutions on Microsoft Azure, including Azure Synapse Analytics, Azure Data Lake Gen2, Azure Data Factory, Azure Databricks, Azure Stream Analytics and Azure Event Hubs among others. It's crucial to be familiar with the core services and tools offered by Azure to effectively prepare for the exam.

My Preparation Material

  1. Microsoft Learning: Microsoft Learning offers a range of resources, including free online modules, videos, and documentation that cover the exam's topics comprehensively. I found these resources extremely helpful to build a strong foundation and understand the concepts deeply.

  2. Practice Exams: Taking practice exams was one of the most crucial parts of my preparation. They not only helped me assess my knowledge but also familiarized me with the exam's format and types of questions. It’s a great way to identify areas that need improvement and get comfortable with the timing.

  3. Pluralsight Course: I enrolled in the Pluralsight course "DP-203: Processing Data in Azure Using Batch Solutions" (https://www.pluralsight.com/cloud-guru/courses/dp-203-processing-in-azure-using-batch-solutions). This course provided in-depth coverage of the exam's topics, with practical examples and hands-on labs. The interactive approach and clear explanations made complex concepts easier to understand.

Study Plan and Tips

Creating a study plan and sticking to it is essential. Here's what worked for me:

  • Set Clear Goals: Break down the topics and set weekly goals to cover specific modules or sections. This approach makes the vast syllabus more manageable.

  • Practice Regularly: Consistently work on practice questions and labs to reinforce your learning. Practical application is key to mastering the material.

  • Join Study Groups: Engage with study groups or online forums to discuss challenging topics, share resources, and gain different perspectives. The Azure community is very supportive and can provide valuable insights.

  • Review and Revise: Regularly review the topics you’ve covered to retain the information. Summarizing what you've learned in your own words can be an effective revision strategy.

Exam Day

On the day of the exam, make sure to:

  • Get a good night's sleep before the exam day.

  • Have all necessary identification and materials ready.

  • Stay calm and focused during the exam.

Final Thoughts

Passing the DP-203 Azure Data Engineer Associate certification requires dedication, consistency, and the right resources. By leveraging Microsoft Learning, practice exams, and the Pluralsight course, I was able to build a solid understanding and practical skills that helped me succeed. Remember, it's not just about passing the exam but gaining valuable knowledge that will benefit your career in data engineering.

Good luck to everyone on their certification journey! 🚀