
Prototyping datagraphs using Zod, GQLoom and GenAI

Published: at 11:00 PM

In this article I will show how you can quickly create and iterate on a GraphQL API with any structure, without spending time writing resolvers and integrating backends. This is an experimental approach, mostly interesting for people who want to iterate on the graph structure of the GraphQL API itself while having a realistic response available.

The catch? It might cost you some OpenAI credits, because nothing that saves you time comes for free ;).

Plant schemas

I will use my own motivation as the use case for this article. My goal is to create an open API for botanical information: yes, plants! Given any plant name, the API should return all the structured information we can find about that plant. The more structure the response has, the easier it is to turn into an API. For structure think about fields like:

Being unfamiliar with the “botanical domain”, who can best help me explore what types of fields I can include in my API? That’s right, let’s ask ChatGPT for an example, using one of my favourite plants, the Coffea arabica species ☕:

Can you give me all information about the coffea Arabica plant species? Return as json as if you are a plant information api

🤖 Response
{
  "species": "Coffea arabica",
  "common_name": "Arabica coffee",
  "classification": {
    "kingdom": "Plantae",
    "order": "Gentianales",
    "family": "Rubiaceae",
    "genus": "Coffea",
    "species": "C. arabica"
  },
  "description": {
    "general": "Coffea arabica is a flowering evergreen shrub or small tree in the Rubiaceae family. It is cultivated primarily for its beans, which are widely used to produce coffee.",
    "height": "Typically grows to 2.5–4.5 meters (8–15 feet) in cultivation, but can reach up to 6 meters (20 feet) in the wild.",
    "leaves": "Glossy, dark green, oval-shaped, and typically around 6–12 cm (2.4–4.7 inches) in length.",
    "flowers": "White, star-shaped, fragrant flowers that bloom in clusters, each flower lasting only a few days. They are typically self-pollinating.",
    "fruit": {
      "type": "Drupe",
      "description": "The fruit, known as a coffee cherry, is small and typically red or purple when ripe, containing two seeds (coffee beans).",
      "size": "Approximately 10–15 mm in diameter."
    },
    "root_system": "Shallow root system, which makes it vulnerable to drought but responsive to soil moisture levels."
  },
  "origin_and_distribution": {
    "origin": "Native to the highlands of Ethiopia and South Sudan.",
    "natural_habitat": "Grows in tropical highland forests with consistent, mild temperatures and relatively high humidity.",
    "current_distribution": "Widely cultivated in tropical regions worldwide, especially in Latin America, East Africa, Asia, and the Arabian Peninsula."
  },
  "growth_conditions": {
    "climate": {
      "temperature": "Optimal range of 15–24°C (59–75°F), sensitive to frost.",
      "humidity": "Prefers high humidity, especially in its natural rainforest habitat.",
      "sunlight": "Partial shade to full sun, though it thrives under dappled light in natural settings."
    },
    "soil": {
      "type": "Well-draining, slightly acidic soils rich in organic matter.",
      "pH": "Prefers a pH range of 6.0–6.5."
    },
    "water": "Requires regular moisture, though over-watering can lead to root rot. Moderate drought tolerance due to shallow roots."
  },
  "botanical_characteristics": {
    "growth_rate": "Moderate, with plants maturing for full production around 3–4 years after planting.",
    "longevity": "Typically produces for 20–30 years, though some plants can remain productive for up to 50 years.",
    "pollination": "Primarily self-pollinating, though cross-pollination can occur under certain conditions.",
    "flowering": "Typically blooms annually, with flowers opening after the first significant rains in its growing season."
  },
  "agricultural_information": {
    "economic_importance": "Primary species of coffee cultivated globally, accounting for approximately 60-70% of the world's coffee production.",
    "harvesting": "Coffee cherries are typically hand-picked in selective harvests to ensure ripeness.",
    "yield": "Average yields range from 0.5 to 1.2 kg (1 to 2.6 lbs) of coffee per tree per year, depending on conditions and cultivation practices."
  },
  "biological_information": {
    "genetics": {
      "ploidy": "Tetraploid",
      "genome_size": "Approximately 710 Mbp",
      "genetic_variation": "Coffea arabica is known for its low genetic diversity, making it susceptible to diseases and environmental stress."
    },
    "pests_and_diseases": [
      {
        "name": "Coffee Leaf Rust",
        "type": "Fungal disease",
        "effect": "Causes yellowing and defoliation of leaves, significantly reducing yields."
      },
      {
        "name": "Coffee Berry Disease",
        "type": "Fungal disease",
        "effect": "Causes dark lesions on developing berries, reducing crop yield and quality."
      },
      {
        "name": "Coffee Berry Borer",
        "type": "Insect pest",
        "effect": "Larvae burrow into coffee cherries, damaging beans and impacting quality."
      }
    ]
  },
  "ecological_information": {
    "role_in_ecosystem": "Coffea arabica serves as an understory plant in forest ecosystems, providing habitat and food for insects, birds, and mammals.",
    "environmental_impact": "Commercial coffee cultivation can lead to deforestation, soil degradation, and biodiversity loss if not managed sustainably."
  },
  "conservation_status": {
    "IUCN_status": "Vulnerable",
    "threats": "Habitat loss, climate change, and susceptibility to diseases due to limited genetic diversity.",
    "conservation_efforts": "Efforts include gene banking, breeding programs for disease resistance, and promoting shade-grown coffee to maintain biodiversity."
  }
}

As you can see, ChatGPT suggests a schema since we asked it to respond “like an API”. Let’s take one additional step and ask it to convert this into a Zod schema. I will explain why Zod next.

Enter GQLoom

So we have a rough structure for our plant API, converted into a Zod schema, part of which is shown below:

export const Plant = z
  .object({
    __typename: z.literal("Plant").nullish(),

    species: z.string().describe("The scientific name of the species."),

    common_name: z.string().describe("The common name of the plant species."),

    classification: z
      .object({
        __typename: z.literal("Classification").nullish(),
        kingdom: z.string(),
        order: z.string(),
        family: z.string(),
        genus: z.string(),
        species: z.string(),
      })
      .strict()
      .describe("Taxonomic classification of the species."),

So where to go from here? Meet GQLoom, a meta-framework that ties code-first GraphQL generation to a variety of other tools, allowing developers to create GraphQL schemas out of schemas from other libraries (like Zod, Yup or Valibot). For this purpose we’ll use Zod, since we already have a working definition of our data schema in Zod, thanks to ChatGPT (more on that later).

To create a GraphQL server with a resolver based on Zod, you can follow this tutorial. What you’ll end up with is something like this:

import { weave, resolver, query } from "@gqloom/zod";
import { z } from "zod";
import { createServer } from "node:http";
import { createYoga } from "graphql-yoga";
import { Plant } from "./models/plant";
import { getPlantInformation } from "../scripts/openai";

const PlantResolver = resolver({
  plant: query(Plant.nullish(), {
    input: {
      name: z.string().min(3),
    },
    resolve: async ({ name }) => {
      return getPlantInformation(name);
    },
  }),
});

const TestResolver = resolver({
  hello: query(z.string(), () => "Hello, World"),
});

export const schema = weave(TestResolver, PlantResolver);

const yoga = createYoga({ schema });
createServer(yoga).listen(4000, () => {
  console.info("Server is running on http://localhost:4000/graphql");
});

This will spin up a new GraphQL schema, completely based on the Zod schema that sits in the models directory. Great, right? Now we could stop here and return mock data from the getPlantInformation function, but why stop here?
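For reference, stopping at mock data could look roughly like the sketch below. The names PlantMock, MOCK_PLANTS and getPlantInformationMock are hypothetical and not part of the original project, and the type only mirrors the part of the Plant schema shown earlier; the fixture values are copied from the ChatGPT response above.

```typescript
// Hypothetical, trimmed-down shape mirroring the part of the Plant schema shown earlier.
type PlantMock = {
  species: string;
  common_name: string;
  classification: {
    kingdom: string;
    order: string;
    family: string;
    genus: string;
    species: string;
  };
};

// Static fixtures keyed by lower-cased species name (values from the ChatGPT response above).
const MOCK_PLANTS: Record<string, PlantMock> = {
  "coffea arabica": {
    species: "Coffea arabica",
    common_name: "Arabica coffee",
    classification: {
      kingdom: "Plantae",
      order: "Gentianales",
      family: "Rubiaceae",
      genus: "Coffea",
      species: "C. arabica",
    },
  },
};

// Mock stand-in for getPlantInformation: resolves to a fixture, or to null for unknown names.
// Returning null fits the resolver above, whose output type is Plant.nullish().
const getPlantInformationMock = async (name: string): Promise<PlantMock | null> =>
  MOCK_PLANTS[name.trim().toLowerCase()] ?? null;
```

A mock like this is enough to exercise the graph structure in a GraphQL client, but every new field needs a hand-written fixture, which is exactly the work the next section tries to avoid.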

You know what API also works well with Zod? OpenAI 🌟

Structured genAI responses

Let’s try to implement the getPlantInformation call from the resolver above. For this, we’ll create an API key for the OpenAI chat completions API, and use it in the following snippet:

import OpenAI from "openai";
import { z } from "zod";

import { zodResponseFormat } from "openai/helpers/zod";
import { Plant } from "../src/models/plant";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export const getPlantInformation = async (species: string) => {
  const { choices } = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content: [
          {
            type: "text",
            text: "You are a plant api. Return all information you have on a provided plant species name formatted as json. If the user provided value is not a known plant species, throw an error.",
          },
        ],
      },
      {
        role: "user",
        content: [
          {
            type: "text",
            text: species,
          },
        ],
      },
    ],
    response_format: zodResponseFormat(Plant, "plant"),
    temperature: 1,
    max_tokens: 2048,
    top_p: 1,
    frequency_penalty: 0,
    presence_penalty: 0,
  });

  const content = choices[0].message.content as string;
  return JSON.parse(content) as any as z.infer<typeof Plant>;
};

This snippet uses the cheaper gpt-4o-mini model, which keeps each call to a few cents at most. The magic happens in the following line: response_format: zodResponseFormat(Plant, "plant").

This line forces the OpenAI response to follow exactly the Zod schema that we also use for our GraphQL schema! In other words, if we change the Zod schema for our plants API and reload, both the GraphQL schema and the response shape are updated automatically. A single source of truth, just a few lines of code, and a fully working GraphQL schema on any given topic. For this example I chose the botanical domain, but you can imagine creating GraphQL servers for any domain whose information might be covered by genAI.
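One detail worth noting: the snippet above casts the parsed JSON to the Plant type without checking it. Since the Zod schema is already the single source of truth, the same schema could also validate the response at runtime. The sketch below assumes a trimmed-down stand-in schema, PlantSketch, and a hypothetical helper, parsePlant; neither is part of the original code.

```typescript
import { z } from "zod";

// Trimmed-down stand-in for the Plant schema from ./models/plant (assumption: two fields only).
const PlantSketch = z
  .object({
    species: z.string(),
    common_name: z.string(),
  })
  .strict();

type PlantSketch = z.infer<typeof PlantSketch>;

// Parses the raw model output and validates it against the schema instead of casting.
// Returns null when the content is not valid JSON or does not match the schema.
function parsePlant(content: string): PlantSketch | null {
  let raw: unknown;
  try {
    raw = JSON.parse(content);
  } catch {
    return null;
  }
  const result = PlantSketch.safeParse(raw);
  return result.success ? result.data : null;
}
```

Returning null on a mismatch fits the resolver's Plant.nullish() output type, so a malformed completion would surface as an empty result rather than as a type error further down the graph.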

Next steps

In this article you’ve seen how GQLoom, Zod and OpenAI work together to quickly create a GraphQL server serving realistic data with a single source of truth in Zod.

This is mostly a proof of concept, and I just want to reiterate that this is useful in a design phase, but not intended for production!

There’s much more to do here, which might be worth writing about in a follow-up. For example, each request costs money with this setup, so you might want to cache or save these responses if they are meant to be static.
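As a minimal sketch of the caching idea, an in-memory wrapper could look like the following. The helper name withCache is hypothetical, and a real setup would more likely persist responses to disk or a database so they survive a restart.

```typescript
// Wraps an async lookup so repeated calls with the same (normalized) key reuse the first result.
function withCache<T>(fetcher: (key: string) => Promise<T>): (key: string) => Promise<T> {
  const cache = new Map<string, Promise<T>>();
  return (key: string) => {
    const normalized = key.trim().toLowerCase();
    const cached = cache.get(normalized);
    if (cached !== undefined) return cached;
    // Store the promise itself so concurrent requests share a single upstream call.
    const pending = fetcher(key).catch((error) => {
      cache.delete(normalized); // do not keep failed lookups around
      throw error;
    });
    cache.set(normalized, pending);
    return pending;
  };
}

// Possible usage in the resolver (assuming getPlantInformation from the snippet above):
// const getPlantInformationCached = withCache(getPlantInformation);
```

Caching the promise rather than the resolved value means two simultaneous queries for the same species trigger only one paid request.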

Additionally, the Zod model can be broken into individual pieces, and these can be served from different root resolvers. This is exciting since it builds on the benefits of a graph-first approach in a highly structured and connected domain, such as botanical information. To give you an example, each plant can have many different diseases, but each disease is not limited to an individual species. Therefore querying from Disease -> Plant is an interesting resolver, while reusing the data structure for both Disease and Plant.
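To make the Disease -> Plant direction concrete, here is a small sketch of the lookup such a resolver could be built on. The types and the function name indexPlantsByDisease are hypothetical; in the real project the shapes would be derived from the Zod schemas and the lookup wired into a GQLoom resolver, which is not shown here.

```typescript
// Hypothetical minimal shapes; the real models would be derived from the Zod schemas.
type Disease = { name: string; type: string };
type PlantWithDiseases = { species: string; diseases: Disease[] };

// Builds a disease-name -> species lookup, the data a Disease -> Plant resolver would need.
function indexPlantsByDisease(plants: PlantWithDiseases[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const plant of plants) {
    for (const disease of plant.diseases) {
      const species = index.get(disease.name) ?? [];
      species.push(plant.species);
      index.set(disease.name, species);
    }
  }
  return index;
}
```

The same Disease shape is used both as a nested field on a plant and as the key of the reverse lookup, which is the reuse of the data structure described above.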

