Contents
  1. How the generation worked
  2. VGG-16 Convolutional Neural Network
  3. Support Vector Machine
  4. Viterbi Algorithm
  5. Backpropagation
  6. Hidden Markov Model structure
  7. ML training pipeline
  8. Transformer architecture
  9. Microservices architecture
  10. ML training loop sequence
  11. Gradient descent flowchart
  12. Kalman filter
  13. Decision tree
  14. What held up and what did not

What Claude-Generated Diagrams Actually Look Like Across Four Tools

I used Claude to generate diagram source code across Graphviz, D2, Pikchr, and Mermaid for the same ML subjects. This is what came out, what held up, and what broke.

The previous post covers the toolchain. This one is about using Claude to generate the actual diagram source, what that process looked like, and what the rendered outputs are.

The short version: Claude reliably generates syntactically correct diagram code. The quality of the result (whether the layout is readable, whether the annotations are accurate, whether the diagram actually communicates the concept) varies a lot by tool and by subject.


How the generation worked

For each diagram I gave Claude the subject and the tool. For ML algorithm diagrams (CNN, SVM, Viterbi, backpropagation) I also gave the mathematical context. For infrastructure diagrams I described the architecture.

Claude produced complete source files. I fed them to the renderer, looked at the output, and iterated. Most diagrams took two or three rounds. A few took more. The Pikchr SVM diagram, which required placing scatter points at specific coordinates with a diagonal hyperplane at the right angle, took the most back-and-forth, because Pikchr requires exact geometry and there is no layout engine to fall back on.


VGG-16 Convolutional Neural Network

The CNN architecture is a natural fit for Graphviz. It is a pure DAG: one direction, layered blocks, no geometry needed. Claude generated a clean DOT file on the first try. The subgraph clusters map directly onto conv blocks, parameter counts go in the labels, and the dot engine handles all placement.

digraph CNN {
  rankdir=TB
  fontname="monospace"
  bgcolor="#fafafa"
  nodesep=0.35
  ranksep=0.7
  label="VGG-16 Architecture — Convolutional Neural Network"
  labelloc=t

  node [fontname="monospace" fontsize=9 style=filled shape=rect]
  edge [fontname="monospace" fontsize=8 color="#555555"]

  node [fillcolor="#dbeafe"]
  input [label="INPUT  |  224 × 224 × 3  |  RGB image"]

  subgraph cluster_b1 {
    label="Block 1 — 38,720 params"
    style=filled fillcolor="#f0fdf4" color="#86efac"
    node [fillcolor="#bbf7d0"]
    b1c1 [label="Conv2D  |  64 filters  |  3×3  |  ReLU  |  224×224×64"]
    b1c2 [label="Conv2D  |  64 filters  |  3×3  |  ReLU  |  RF: 5×5"]
    b1p  [label="MaxPool  |  2×2, stride=2  |  112×112×64"]
    b1c1 -> b1c2 -> b1p
  }
  subgraph cluster_b2 {
    label="Block 2 — 221,440 params"
    style=filled fillcolor="#eff6ff" color="#93c5fd"
    node [fillcolor="#bfdbfe"]
    b2c1 [label="Conv2D  |  128 filters  |  3×3  |  ReLU  |  112×112×128"]
    b2c2 [label="Conv2D  |  128 filters  |  3×3  |  ReLU  |  RF: 14×14"]
    b2p  [label="MaxPool  |  2×2, stride=2  |  56×56×128"]
    b2c1 -> b2c2 -> b2p
  }
  subgraph cluster_b3 {
    label="Block 3 — 1,475,328 params"
    style=filled fillcolor="#fff7ed" color="#fdba74"
    node [fillcolor="#fed7aa"]
    b3c1 [label="Conv2D  |  256 filters  |  3×3  |  ReLU  |  56×56×256"]
    b3c2 [label="Conv2D  |  256 filters  |  3×3  |  ReLU"]
    b3c3 [label="Conv2D  |  256 filters  |  3×3  |  ReLU  |  RF: 40×40"]
    b3p  [label="MaxPool  |  2×2, stride=2  |  28×28×256"]
    b3c1 -> b3c2 -> b3c3 -> b3p
  }
  subgraph cluster_b4 {
    label="Block 4 — 5,899,776 params"
    style=filled fillcolor="#fdf4ff" color="#d8b4fe"
    node [fillcolor="#e9d5ff"]
    b4c1 [label="Conv2D  |  512 filters  |  3×3  |  ReLU  |  28×28×512"]
    b4c2 [label="Conv2D  |  512 filters  |  3×3  |  ReLU"]
    b4c3 [label="Conv2D  |  512 filters  |  3×3  |  ReLU  |  RF: 92×92"]
    b4p  [label="MaxPool  |  2×2, stride=2  |  14×14×512"]
    b4c1 -> b4c2 -> b4c3 -> b4p
  }
  subgraph cluster_b5 {
    label="Block 5 — 7,079,424 params"
    style=filled fillcolor="#fff1f2" color="#fda4af"
    node [fillcolor="#fecdd3"]
    b5c1 [label="Conv2D  |  512 filters  |  3×3  |  ReLU  |  14×14×512"]
    b5c2 [label="Conv2D  |  512 filters  |  3×3  |  ReLU"]
    b5c3 [label="Conv2D  |  512 filters  |  3×3  |  ReLU  |  RF: 196×196"]
    b5p  [label="MaxPool  |  2×2, stride=2  |  7×7×512"]
    b5c1 -> b5c2 -> b5c3 -> b5p
  }
  subgraph cluster_head {
    label="Classifier Head — 123,642,856 params"
    style=filled fillcolor="#fffbeb" color="#fbbf24"
    node [fillcolor="#fef3c7"]
    flat [label="Flatten  |  25,088 units"]
    fc1  [label="FC-4096  |  ReLU  |  Dropout 0.5  |  102.8M params"]
    fc2  [label="FC-4096  |  ReLU  |  Dropout 0.5  |  16.8M params"]
    fc3  [label="FC-1000  |  ImageNet classes  |  4.1M params"]
    soft [label="Softmax  |  1000-dim  |  Σ pᵢ = 1.0"]
    flat -> fc1 -> fc2 -> fc3 -> soft
  }

  edge [penwidth=1.5]
  input -> b1c1
  b1p -> b2c1
  b2p -> b3c1
  b3p -> b4c1
  b4p -> b5c1
  b5p -> flat
}
Figure: VGG-16 architecture, Graphviz dot. Five conv blocks plus classifier head.

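The labels carry per-block parameter counts, with biases included, and receptive-field annotations at each block boundary. The totals are worth re-deriving rather than trusting: a 3×3 convolution has 9 · C_in · C_out weights plus C_out biases, so Block 5 is 3 · (9 · 512 · 512 + 512) = 7,079,424. This is the kind of detail that a human drawing the same diagram by hand would likely skip or round.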

The D2 version of the same diagram is more verbose but renders well. The Mermaid and Pikchr versions exist too. None of them add information the Graphviz version does not have. For a pure DAG like this, Graphviz is the right tool and produces the cleanest result.
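
Annotations like the parameter counts and receptive fields are cheap to re-derive, so they are worth checking before publishing. The sketch below is one way to do that, assuming the standard VGG-16 configuration (3×3 stride-1 convolutions with biases, 2×2 stride-2 max-pooling, a 25,088 → 4096 → 4096 → 1000 head). The function and constant names are mine, not part of the generated DOT.

```python
# Each conv block: (in_channels, out_channels, number of 3x3 conv layers).
VGG16_BLOCKS = [(3, 64, 2), (64, 128, 2), (128, 256, 3), (256, 512, 3), (512, 512, 3)]
# Fully connected head: (in_features, out_features).
VGG16_HEAD = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]


def conv_params(c_in, c_out, k=3):
    """Weights plus one bias per output channel for a k x k convolution."""
    return k * k * c_in * c_out + c_out


def block_params(c_in, c_out, n_convs):
    """First conv maps c_in -> c_out, the remaining ones map c_out -> c_out."""
    return conv_params(c_in, c_out) + (n_convs - 1) * conv_params(c_out, c_out)


def receptive_fields(blocks, k=3, pool=2):
    """Receptive field, in input pixels, after the last conv of each block."""
    rf, jump, out = 1, 1, []
    for _, _, n_convs in blocks:
        for _ in range(n_convs):
            rf += (k - 1) * jump   # a stride-1 conv widens the field by (k-1)*jump
        out.append(rf)
        rf += (pool - 1) * jump    # 2x2 max-pool with stride 2
        jump *= pool               # pooling doubles the spacing between outputs
    return out


conv_counts = [block_params(*b) for b in VGG16_BLOCKS]
head_count = sum(i * o + o for i, o in VGG16_HEAD)
total = sum(conv_counts) + head_count

for idx, (n, rf) in enumerate(zip(conv_counts, receptive_fields(VGG16_BLOCKS)), 1):
    print(f"block {idx}: {n:>11,} params, RF {rf}x{rf}")
print(f"head:    {head_count:>11,} params")
print(f"total:   {total:>11,} params")
```

Comparing the printed lines against the cluster labels takes a few seconds and catches a dropped bias term or an off-by-one-layer receptive field.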


Support Vector Machine

The SVM diagram is where the tool choice matters most. The diagram needs a diagonal hyperplane at a specific angle, scatter points in two classes positioned relative to the hyperplane, support vectors annotated, margin width indicated with a two-headed arrow, and reference boxes for the optimisation objective and kernels. That is a geometric drawing problem.

# SVM: Hard and Soft Margin Classification

scale = 1.0

text "SVM — Support Vector Machine: Geometry, Margins, Kernels" bold at 3.8,9.6

arrow from 0.3,0.4 to 5.8,0.4
text "x₁  (feature 1)" small at 6.15,0.4
arrow from 0.3,0.4 to 0.3,5.2
text "x₂" small with .s at 0.3,5.35

line from 1.3,0.35 to 1.3,0.45
line from 2.3,0.35 to 2.3,0.45
line from 3.3,0.35 to 3.3,0.45
line from 4.3,0.35 to 4.3,0.45
line from 0.25,1.4 to 0.35,1.4
line from 0.25,2.4 to 0.35,2.4
line from 0.25,3.4 to 0.35,3.4
line from 0.25,4.4 to 0.35,4.4

line from 0.9,0.5 to 4.9,4.9 color 0x1e40af thick
line from 0.2,0.5 to 3.8,4.7 dashed color 0x3b82f6
line from 1.8,0.5 to 5.8,4.7 dashed color 0x3b82f6

text "w·x+b=0" small bold with .w at 5.0,5.1
text "w·x+b=+1" small italic with .e at 3.55,4.85
text "w·x+b=−1" small italic with .w at 6.0,4.85

arrow from 2.8,2.5 to 3.5,2.5
arrow from 3.4,2.5 to 2.7,2.5
text "2/||w||" small bold with .s at 3.1,2.7

circle radius 0.13 fill 0x86efac at 4.3,4.5
circle radius 0.13 fill 0x86efac at 4.8,4.0
circle radius 0.13 fill 0x86efac at 5.1,3.5
circle radius 0.13 fill 0x86efac at 4.6,3.2
circle radius 0.13 fill 0x86efac at 5.3,4.3
circle radius 0.13 fill 0x86efac at 4.9,2.8
circle radius 0.13 fill 0x86efac at 5.5,3.8

box width 0.22 height 0.22 fill 0xfca5a5 at 1.0,1.0
box width 0.22 height 0.22 fill 0xfca5a5 at 1.6,1.6
box width 0.22 height 0.22 fill 0xfca5a5 at 0.7,1.9
box width 0.22 height 0.22 fill 0xfca5a5 at 1.3,2.4
box width 0.22 height 0.22 fill 0xfca5a5 at 0.9,0.7
box width 0.22 height 0.22 fill 0xfca5a5 at 1.8,2.0
box width 0.22 height 0.22 fill 0xfca5a5 at 0.6,2.8

circle radius 0.18 fill 0x4ade80 color 0x166534 thickness 0.06 at 3.8,4.2
circle radius 0.18 fill 0x4ade80 color 0x166534 thickness 0.06 at 4.3,3.5
box width 0.28 height 0.28 fill 0xf87171 color 0x7f1d1d thickness 0.06 at 1.5,0.7
box width 0.28 height 0.28 fill 0xf87171 color 0x7f1d1d thickness 0.06 at 2.0,1.8

arrow from 4.8,4.6 to 4.0,4.28 color 0x166534
text "support vector" small italic with .w at 4.85,4.65
arrow from 0.6,0.5 to 1.38,0.68 color 0x7f1d1d
text "support vector" small italic with .e at 0.55,0.5

circle radius 0.14 fill 0xfbbf24 color 0x92400e thickness 0.05 at 2.5,1.2
arrow from 2.5,1.34 to 2.5,1.65 color 0x92400e
text "ξ > 0" small bold with .w at 2.65,1.5
text "(slack: inside margin)" small with .w at 2.65,1.28

box "min  ½||w||²  +  C Σξᵢ" small bold "s.t.  yᵢ(w·xᵢ+b) ≥ 1−ξᵢ,  ξᵢ≥0" small width 2.6 height 0.6 fill 0xfef9c3 color 0xca8a04 at 3.8,7.8
box "Kernels:  K(x,z) = φ(x)·φ(z)" small bold "Linear: x·z" small "Poly: (γx·z+r)^d" small "RBF: exp(−γ||x−z||²)" small width 2.6 height 0.8 fill 0xf3e8ff color 0xa855f7 at 3.8,6.8
box "Legend" bold "● Circle = positive (y=+1)" small "■ Square = negative (y=−1)" small "★ Larger border = support vector" small "✕ Yellow = slack ξᵢ > 0" small width 2.6 height 0.9 fill 0xf1f5f9 color 0x64748b at 3.8,5.7
Figure: SVM geometry, Pikchr. Axes, scatter plot, hyperplane at exact angle, margin annotation.

Claude handled the Pikchr geometry correctly after a couple of iterations. The initial version had the hyperplane at the wrong slope and the scatter points too tightly clustered. After adjusting the endpoint coordinates for the hyperplane line and spreading the class distributions, the diagram reads correctly.

The Graphviz SVM version uses neato with pinned pos= coordinates to approximate the same layout. It works, but you can see it fighting the tool. Pikchr is simply the right choice for this kind of figure.
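
The quantities annotated in the figure (the margin width 2/||w|| and the slack ξ) are easy to compute for a concrete separator. The following is a minimal sketch; the weight vector and test points are hypothetical values chosen for round numbers, not the coordinates used in the Pikchr source.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; zero on the separating hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin_width(w):
    """Distance between the planes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def slack(w, b, x, y):
    """Soft-margin slack: xi = max(0, 1 - y * (w.x + b))."""
    return max(0.0, 1.0 - y * decision_value(w, b, x))


# Hypothetical separator (assumption, not taken from the figure).
w, b = (3.0, 4.0), -2.0

print(margin_width(w))                 # width of the margin band
print(slack(w, b, (2.0, 0.0), +1))     # outside the margin: no slack
print(slack(w, b, (0.5, 0.25), +1))    # inside the margin: positive slack
```

A point has zero slack when it sits on or beyond its own margin line, and positive slack when it falls inside the band, which is the case the yellow marker in the figure illustrates.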


Viterbi Algorithm

The Viterbi trellis is a grid: states as rows, time steps as columns, transition edges between every state pair at each step, with the optimal path highlighted. Graphviz handles this well with rankdir=LR and explicit rank=same groupings.

digraph Viterbi {
  rankdir=LR
  fontname="monospace"
  fontsize=12
  bgcolor="#fafafa"
  splines=line
  nodesep=0.55
  ranksep=1.5
  label="Viterbi Algorithm — Hidden Markov Model  (Ice-Cream Weather)\nStates: Hot / Warm / Cold    Observations: ice-cream count each day"
  labelloc=t

  node [fontname="monospace" fontsize=10 shape=circle style=filled width=0.85]
  edge [fontname="monospace" fontsize=7.5]

  node [shape=none style=invis width=0 height=0 fontsize=11]
  lbl [label="State"]
  t1 [label="t = 1"] t2 [label="t = 2"] t3 [label="t = 3"]
  t4 [label="t = 4"] t5 [label="t = 5"] t6 [label="t = 6"]
  lbl -> t1 -> t2 -> t3 -> t4 -> t5 -> t6 [style=invis]

  node [shape=rect style="filled,rounded" fillcolor="#fef9c3" fontsize=9 width=0.9 height=0.55]
  obs1 [label="O₁=3\n🍦🍦🍦"] obs2 [label="O₂=1\n🍦"]   obs3 [label="O₃=2\n🍦🍦"]
  obs4 [label="O₄=1\n🍦"]   obs5 [label="O₅=2\n🍦🍦"] obs6 [label="O₆=1\n🍦"]

  node [shape=circle style=filled width=0.85 fontsize=9]
  node [fillcolor="#fca5a5"] H1 [label="Hot\n δ=0.240\n b(3)=0.4"]
  node [fillcolor="#fde68a"] W1 [label="Warm\n δ=0.080\n b(3)=0.4"]
  node [fillcolor="#93c5fd"] C1 [label="Cold\n δ=0.020\n b(3)=0.1"]
  node [fillcolor="#fca5a5"] H2 [label="Hot\n δ=0.168\n b(1)=0.2"]
  node [fillcolor="#fde68a"] W2 [label="Warm\n δ=0.064\n b(1)=0.2"]
  node [fillcolor="#93c5fd"] C2 [label="Cold\n δ=0.028\n b(1)=0.5"]
  node [fillcolor="#fca5a5"] H3 [label="Hot\n δ=0.034\n b(2)=0.3"]
  node [fillcolor="#fde68a"] W3 [label="Warm\n δ=0.067\n b(2)=0.4"]
  node [fillcolor="#93c5fd"] C3 [label="Cold\n δ=0.020\n b(2)=0.2"]
  node [fillcolor="#fca5a5"] H4 [label="Hot\n δ=0.024\n b(1)=0.2"]
  node [fillcolor="#fde68a"] W4 [label="Warm\n δ=0.027\n b(1)=0.2"]
  node [fillcolor="#93c5fd"] C4 [label="Cold\n δ=0.033\n b(1)=0.5"]
  node [fillcolor="#fca5a5"] H5 [label="Hot\n δ=0.007\n b(2)=0.3"]
  node [fillcolor="#fde68a"] W5 [label="Warm\n δ=0.013\n b(2)=0.4"]
  node [fillcolor="#93c5fd"] C5 [label="Cold\n δ=0.016\n b(2)=0.2"]
  node [fillcolor="#fca5a5"] H6 [label="Hot\n δ=0.005\n b(1)=0.2"]
  node [fillcolor="#fde68a"] W6 [label="Warm\n δ=0.006\n b(1)=0.2"]
  node [fillcolor="#93c5fd" penwidth=3 color="#1d4ed8"] C6 [label="Cold\n δ=0.008\n b(1)=0.5"]

  { rank=same; lbl; obs1; H1; W1; C1 }
  { rank=same; t1;  obs2; H2; W2; C2 }
  { rank=same; t2;  obs3; H3; W3; C3 }
  { rank=same; t3;  obs4; H4; W4; C4 }
  { rank=same; t4;  obs5; H5; W5; C5 }
  { rank=same; t5;  obs6; H6; W6; C6 }

  edge [style=dotted color="#94a3b8" arrowhead=none penwidth=1.0]
  obs1 -> H1  obs1 -> W1  obs1 -> C1
  obs2 -> H2  obs2 -> W2  obs2 -> C2
  obs3 -> H3  obs3 -> W3  obs3 -> C3
  obs4 -> H4  obs4 -> W4  obs4 -> C4
  obs5 -> H5  obs5 -> W5  obs5 -> C5
  obs6 -> H6  obs6 -> W6  obs6 -> C6

  edge [color="#d1d5db" fontcolor="#6b7280" style=solid penwidth=0.8 arrowhead=vee]
  H1->H2 [label="0.70"] H1->W2 [label="0.20"] H1->C2 [label="0.10"]
  W1->H2 [label="0.30"] W1->W2 [label="0.40"] W1->C2 [label="0.30"]
  C1->H2 [label="0.10"] C1->W2 [label="0.30"] C1->C2 [label="0.60"]
  H2->H3 [label="0.70"] H2->W3 [label="0.20"] H2->C3 [label="0.10"]
  W2->H3 [label="0.30"] W2->W3 [label="0.40"] W2->C3 [label="0.30"]
  C2->H3 [label="0.10"] C2->W3 [label="0.30"] C2->C3 [label="0.60"]
  H3->H4 [label="0.70"] H3->W4 [label="0.20"] H3->C4 [label="0.10"]
  W3->H4 [label="0.30"] W3->W4 [label="0.40"] W3->C4 [label="0.30"]
  C3->H4 [label="0.10"] C3->W4 [label="0.30"] C3->C4 [label="0.60"]
  H4->H5 [label="0.70"] H4->W5 [label="0.20"] H4->C5 [label="0.10"]
  W4->H5 [label="0.30"] W4->W5 [label="0.40"] W4->C5 [label="0.30"]
  C4->H5 [label="0.10"] C4->W5 [label="0.30"] C4->C5 [label="0.60"]
  H5->H6 [label="0.70"] H5->W6 [label="0.20"] H5->C6 [label="0.10"]
  W5->H6 [label="0.30"] W5->W6 [label="0.40"] W5->C6 [label="0.30"]
  C5->H6 [label="0.10"] C5->W6 [label="0.30"] C5->C6 [label="0.60"]

  edge [color="#dc2626" penwidth=3.5 style=bold fontcolor="#dc2626" fontsize=9 arrowhead=vee]
  H1->H2 [label="BEST"] H2->W3 [label="BEST"]
  W3->C4 [label="BEST"] C4->C5 [label="BEST"] C5->C6 [label="BEST"]

  node [shape=rect style="filled,rounded" fillcolor="#f1f5f9" fontsize=8 width=2.2 height=0.9]
  init [label="Initial π:\nπ(Hot)=0.6  π(Warm)=0.2\nπ(Cold)=0.2"]
  node [shape=rect style="filled,rounded" fillcolor="#f1f5f9" fontsize=8 width=2.2 height=1.0]
  emis [label="Emission B (count→state):\nB(1|H)=0.2 B(2|H)=0.3 B(3|H)=0.4\nB(1|W)=0.2 B(2|W)=0.4 B(3|W)=0.4\nB(1|C)=0.5 B(2|C)=0.2 B(3|C)=0.1"]

  init -> H1 [style=dotted arrowhead=none color="#94a3b8"]
  emis -> C1 [style=invis]
  { rank=same; init; emis; H1; W1; C1 }
}
Figure: Viterbi algorithm over 6 time steps, Graphviz dot. Optimal path in red. Delta probabilities and emission values annotated per cell.

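The delta values are annotated in every trellis cell, next to the emission probability used at that step, and the highlighted path is H H W C C C. Check the numbers before reusing them, though. The first column follows from the legend (0.6 · 0.4 = 0.240 for Hot), but the later columns do not follow from the recurrence δₜ(j) = maxᵢ δₜ₋₁(i) · aᵢⱼ · bⱼ(oₜ): at t = 2 the Hot cell should be 0.240 · 0.70 · 0.2 = 0.0336, while the diagram shows 0.168, the value before the emission factor is applied. With the legend's parameters the recurrence keeps Hot on top through all six steps, so the highlighted path is not the maximum-probability one either.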

One thing Claude did well here: the observation nodes are tied to each column using rank=same, which keeps the column structure readable even with all the crossing transition edges. That is a non-obvious Graphviz trick.
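
One way to check a trellis like this is to recompute it. The sketch below runs the standard Viterbi recurrence on the parameters printed in the diagram's legend boxes, taken at face value (the Hot and Cold emission rows there list only counts 1 to 3 and do not sum to 1). The function and variable names are mine, not part of the generated DOT.

```python
STATES = ["Hot", "Warm", "Cold"]
PI = {"Hot": 0.6, "Warm": 0.2, "Cold": 0.2}
A = {  # A[i][j] = P(next state j | current state i), from the edge labels
    "Hot": {"Hot": 0.7, "Warm": 0.2, "Cold": 0.1},
    "Warm": {"Hot": 0.3, "Warm": 0.4, "Cold": 0.3},
    "Cold": {"Hot": 0.1, "Warm": 0.3, "Cold": 0.6},
}
B = {  # B[s][o] = P(observation o | state s), from the emission legend
    "Hot": {1: 0.2, 2: 0.3, 3: 0.4},
    "Warm": {1: 0.2, 2: 0.4, 3: 0.4},
    "Cold": {1: 0.5, 2: 0.2, 3: 0.1},
}
OBS = [3, 1, 2, 1, 2, 1]


def viterbi(obs, states, pi, a, b):
    """Return (delta columns, most probable state path)."""
    delta = [{s: pi[s] * b[s][obs[0]] for s in states}]
    back = [{}]  # back[t][j] = best predecessor of state j at time t
    for o in obs[1:]:
        prev, col, ptr = delta[-1], {}, {}
        for j in states:
            best = max(states, key=lambda i: prev[i] * a[i][j])
            col[j] = prev[best] * a[best][j] * b[j][o]
            ptr[j] = best
        delta.append(col)
        back.append(ptr)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: delta[-1][s])]
    for ptr in reversed(back[1:]):
        path.append(ptr[path[-1]])
    return delta, path[::-1]


delta, path = viterbi(OBS, STATES, PI, A, B)
for t, col in enumerate(delta, 1):
    print(t, {s: round(v, 6) for s, v in col.items()})
print(path)
```

Printing the columns next to the cell labels makes any disagreement visible at a glance, and the returned path can be compared directly with the red edges.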


Backpropagation

The backprop diagram is dense. Forward pass edges go left to right in blue, backward pass edges go right to left in red dashed. Every layer is fully connected to the next, so the edge count is high.

digraph Backprop {
  rankdir=LR
  fontname="monospace"
  fontsize=12
  bgcolor="#fafafa"
  nodesep=0.45
  ranksep=1.8
  label="Backpropagation — MLP 3→4→3→2  |  Forward: blue  |  Backward: red dashed\nChain rule:  ∂L/∂W¹ = (∂L/∂h²)(∂h²/∂a²)(∂a²/∂h¹)(∂h¹/∂a¹)(∂a¹/∂W¹)"
  labelloc=t

  node [fontname="monospace" fontsize=9 style=filled shape=circle]
  edge [fontname="monospace" fontsize=8]

  subgraph cluster_in {
    label="Input\nLayer" fontname="monospace" fontsize=9
    style=filled fillcolor="#eff6ff" color="#93c5fd"
    node [fillcolor="#dbeafe" width=0.75]
    x1 [label="x₁\nfeat 1"] x2 [label="x₂\nfeat 2"] x3 [label="x₃\nfeat 3"]
  }
  subgraph cluster_h1 {
    label="Hidden Layer 1\na¹ = W¹x+b¹\nh¹ = ReLU(a¹)" fontname="monospace" fontsize=9
    style=filled fillcolor="#f0fdf4" color="#86efac"
    node [fillcolor="#bbf7d0" width=0.82]
    h11 [label="h₁¹\nReLU\na¹₁"] h12 [label="h₂¹\nReLU\na¹₂"]
    h13 [label="h₃¹\nReLU\na¹₃"] h14 [label="h₄¹\nReLU\na¹₄"]
  }
  subgraph cluster_h2 {
    label="Hidden Layer 2\na² = W²h¹+b²\nh² = ReLU(a²)" fontname="monospace" fontsize=9
    style=filled fillcolor="#fff7ed" color="#fdba74"
    node [fillcolor="#fed7aa" width=0.82]
    h21 [label="h₁²\nReLU\na²₁"] h22 [label="h₂²\nReLU\na²₂"] h23 [label="h₃²\nReLU\na²₃"]
  }
  subgraph cluster_out {
    label="Output Layer\na³ = W³h²+b³\nŷ = σ(a³)" fontname="monospace" fontsize=9
    style=filled fillcolor="#fdf4ff" color="#d8b4fe"
    node [fillcolor="#e9d5ff" width=0.85]
    y1 [label="ŷ₁\nσ(a³₁)\np(c₁)"] y2 [label="ŷ₂\nσ(a³₂)\np(c₂)"]
  }

  node [shape=rect style="filled,rounded" fillcolor="#fee2e2" width=1.7 height=1.0]
  L [label="Loss  L\nCross-Entropy:\nL = −Σ yᵢ log ŷᵢ\n∂L/∂ŷ = ŷ − y"]

  edge [color="#1d4ed8" penwidth=1.3 style=solid arrowhead=vee]
  x1->h11 [label="W¹₁₁"] x1->h12 [label="W¹₁₂"] x1->h13 x1->h14
  x2->h11 [label="W¹₂₁"] x2->h12 x2->h13 x2->h14
  x3->h11 x3->h12 x3->h13 [label="W¹₃₃"] x3->h14
  h11->h21 [label="W²₁₁"] h11->h22 h11->h23
  h12->h21 h12->h22 [label="W²₂₂"] h12->h23
  h13->h21 h13->h22 h13->h23 [label="W²₃₃"]
  h14->h21 h14->h22 h14->h23
  h21->y1 [label="W³₁₁"] h21->y2
  h22->y1 h22->y2 [label="W³₂₂"]
  h23->y1 h23->y2
  y1->L y2->L

  edge [color="#dc2626" penwidth=1.8 style=dashed arrowhead=vee constraint=false fontcolor="#dc2626"]
  L->y1 [label="∂L/∂ŷ₁\n=ŷ₁−y₁"] L->y2 [label="∂L/∂ŷ₂\n=ŷ₂−y₂"]
  y1->h21 [label="∂L/∂a³₁"] y1->h22 y1->h23
  y2->h21 y2->h22 [label="∂L/∂a³₂"] y2->h23
  h21->h11 [label="∂L/∂h¹₁"] h21->h12 h21->h13 h21->h14
  h22->h11 h22->h12 [label="∂L/∂h¹₂"] h22->h13 h22->h14
  h23->h11 h23->h12 h23->h13 [label="∂L/∂h¹₃"] h23->h14
  h11->x1 [label="∂L/∂W¹₁₁"] h12->x2 [label="∂L/∂W¹₂₂"]
  h13->x3 [label="∂L/∂W¹₃₃"] h14->x1 [label="∂L/∂W¹·₄"]

  node [shape=rect style="filled,rounded" fillcolor="#fef9c3" fontsize=8 width=1.8 height=0.9 color="#ca8a04"]
  gd [label="Gradient Descent:\nW ← W − η ∂L/∂W\nη = learning rate"]
  node [shape=rect style="filled,rounded" fillcolor="#f0f9ff" fontsize=8 width=1.9 height=0.9 color="#0284c7"]
  act [label="ReLU: f(a)=max(0,a)\nf'(a)=1 if a>0 else 0\nσ: f(a)=1/(1+e^-a)"]

  gd -> L [style=dotted color="#ca8a04" arrowhead=none]
  act -> h11 [style=dotted color="#0284c7" arrowhead=none]
  { rank=same; gd; L }
  { rank=same; act; h11 }
}
Figure: Backpropagation through a 3→4→3→2 MLP, Graphviz dot. Forward blue, backward red dashed.

This one required the most iteration on layout. Setting constraint=false on the backward edges keeps them out of rank assignment, so they do not disturb the forward-pass layout, but on a dense fully connected graph the result is still visually noisy. The key gradient labels (∂L/∂W¹₁₁ etc.) are annotated on representative edges only, not all of them.
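
Because every layer is fully connected to the next, the edge list is mechanical and can be generated rather than typed. The following is a rough sketch, assuming the node names from the DOT source above; the helper names are hypothetical. It emits the full set of backward edges, whereas the hand-tuned diagram draws only a representative few into the input layer.

```python
# Node names per layer, copied from the DOT source (3 -> 4 -> 3 -> 2 MLP).
LAYERS = [
    ["x1", "x2", "x3"],
    ["h11", "h12", "h13", "h14"],
    ["h21", "h22", "h23"],
    ["y1", "y2"],
]


def dense_edges(layers):
    """Yield (src, dst) for every pair of nodes in adjacent layers."""
    for left, right in zip(layers, layers[1:]):
        for src in left:
            for dst in right:
                yield src, dst


def to_dot(edges, attrs=""):
    """Format edges as DOT statements, with an optional attribute list."""
    return [f"  {s} -> {d}{attrs}" for s, d in edges]


forward = list(dense_edges(LAYERS))
backward = [(d, s) for s, d in forward]  # same pairs, reversed direction

lines = to_dot(forward) + to_dot(backward, " [style=dashed constraint=false]")
print(len(forward), "forward edges")
print("\n".join(lines[:3]))
```

Generating the edges this way also makes it easy to thin out the backward pass, for example by keeping one backward edge per node, when the full set is too noisy to read.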


Hidden Markov Model structure

The HMM diagram shows the model itself, not the algorithm run on it: hidden states with transition probabilities (including self-loops), emission probabilities to observation nodes, and the initial distribution.

digraph HMM {
  rankdir=TB
  fontname="monospace"
  fontsize=13
  nodesep=1.2
  ranksep=1.4
  label="Hidden Markov Model — Ice Cream Weather Example"
  labelloc=t

  node [fontname="monospace" fontsize=12]
  edge [fontname="monospace" fontsize=10]

  node [shape=diamond style=filled fillcolor="#f3e5f5" width=0.7]
  pi [label="π\nStart"]

  node [shape=circle style=filled fillcolor="#bbdefb" width=1.1]
  H [label="HOT\nπ=0.8"] C [label="COLD\nπ=0.2"] W [label="WARM\nπ=0.0"]

  node [shape=rect style="filled,rounded" fillcolor="#fff9c4" width=1.2]
  O1 [label="1 ice cream\nP(1|H)=0.2\nP(1|C)=0.5\nP(1|W)=0.4"]
  O2 [label="2 ice creams\nP(2|H)=0.4\nP(2|C)=0.4\nP(2|W)=0.4"]
  O3 [label="3 ice creams\nP(3|H)=0.4\nP(3|C)=0.1\nP(3|W)=0.2"]

  edge [color="#9c27b0" penwidth=1.5 fontcolor="#9c27b0"]
  pi -> H [label="0.8"] pi -> C [label="0.2"] pi -> W [label="0.0"]

  edge [color="#1565c0" penwidth=1.5 fontcolor="#1565c0"]
  H -> H [label="0.7"] H -> C [label="0.1"] H -> W [label="0.2"]
  C -> C [label="0.5"] C -> H [label="0.1"] C -> W [label="0.4"]
  W -> W [label="0.3"] W -> H [label="0.4"] W -> C [label="0.3"]

  edge [color="#e65100" style=dashed penwidth=1.2 fontcolor="#e65100"]
  H -> O1 [label="0.2"] H -> O2 [label="0.4"] H -> O3 [label="0.4"]
  C -> O1 [label="0.5"] C -> O2 [label="0.4"] C -> O3 [label="0.1"]
  W -> O1 [label="0.4"] W -> O2 [label="0.4"] W -> O3 [label="0.2"]

  { rank=same; H; C; W }
  { rank=same; O1; O2; O3 }
}
Figure: HMM structure, Graphviz dot. Transition edges blue, emission edges orange dashed.

The self-loops (H→H, C→C, W→W) render cleanly in Graphviz. Most other tools handle self-loops poorly or not at all. This is one area where DOT’s maturity shows.
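
A generated HMM diagram is only useful if its probabilities form valid distributions, which is a quick thing to test. The sketch below copies the numbers from the edge labels in the DOT source above and checks that the initial distribution and every transition and emission row sum to 1. The helper name is hypothetical.

```python
import math

# Values copied from the edge labels in the HMM diagram.
PI = {"H": 0.8, "C": 0.2, "W": 0.0}
TRANS = {
    "H": {"H": 0.7, "C": 0.1, "W": 0.2},
    "C": {"C": 0.5, "H": 0.1, "W": 0.4},
    "W": {"W": 0.3, "H": 0.4, "C": 0.3},
}
EMIT = {
    "H": {1: 0.2, 2: 0.4, 3: 0.4},
    "C": {1: 0.5, 2: 0.4, 3: 0.1},
    "W": {1: 0.4, 2: 0.4, 3: 0.2},
}


def non_stochastic_rows(table, tol=1e-9):
    """Return the keys whose row of probabilities does not sum to 1."""
    return [
        key
        for key, row in table.items()
        if not math.isclose(sum(row.values()), 1.0, abs_tol=tol)
    ]


print("initial ok:", math.isclose(sum(PI.values()), 1.0, abs_tol=1e-9))
print("bad transition rows:", non_stochastic_rows(TRANS))
print("bad emission rows:", non_stochastic_rows(EMIT))
```

The same check applies to any legend of probabilities in a diagram; an empty list means every row is a proper distribution.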


ML training pipeline

For an end-to-end pipeline with a feedback loop, D2 with ELK produces a cleaner result than Graphviz. The retrain trigger edge from monitoring back to ingestion crosses several other elements, and ELK routes it without collisions.

title: ML Training Pipeline {
  near: top-center
  shape: text
  style.font-size: 20
}

direction: right

raw_data: Raw Data {
  shape: cylinder
  style.fill: "#e3f2fd"
  label: "Raw Data\nCSV / JSON / Parquet"
}

ingestion: Data Ingestion {
  shape: rectangle
  style.fill: "#bbdefb"
  label: "Data Ingestion\n• schema validation\n• type coercion\n• dedup"
}

split: Train/Val/Test Split {
  shape: rectangle
  style.fill: "#c8e6c9"
  label: "Split\n70% train\n15% val\n15% test"
}

preprocessing: Feature Engineering {
  shape: rectangle
  style.fill: "#fff9c4"
  label: "Feature Engineering\n• normalisation\n• one-hot encoding\n• imputation\n• PCA"
}

training: Model Training {
  shape: rectangle
  style.fill: "#ffe0b2"
  label: "Training Loop\n• forward pass\n• loss compute\n• backprop\n• optimiser step"

  epochs: Epochs {
    shape: rectangle
    style.fill: "#ffcc80"
    label: "N epochs\nbatch_size=32\nlr=1e-3"
  }

  checkpoints: Checkpoints {
    shape: rectangle
    style.fill: "#ffcc80"
    label: "Checkpoint\nbest val loss\nearly stopping"
  }

  epochs -> checkpoints: "save if improved"
}

evaluation: Evaluation {
  shape: rectangle
  style.fill: "#e8eaf6"
  label: "Evaluation\n• accuracy\n• precision/recall\n• F1, AUC-ROC\n• confusion matrix"
}

registry: Model Registry {
  shape: cylinder
  style.fill: "#f3e5f5"
  label: "Model Registry\nversioned artifacts\nmetadata + metrics"
}

serving: Serving {
  shape: rectangle
  style.fill: "#fce4ec"
  label: "Inference API\nREST / gRPC\nbatching\nlatency SLA"
}

monitoring: Monitoring {
  shape: rectangle
  style.fill: "#efebe9"
  label: "Monitoring\n• data drift\n• prediction drift\n• latency\n• error rate"
}

raw_data -> ingestion: "load"
ingestion -> split: "clean data"
split -> preprocessing: "train set\nval set\ntest set"
preprocessing -> training: "feature matrix X\nlabels y"
training -> evaluation: "trained model"
evaluation -> registry: "passes threshold?" {
  style.stroke-dash: 4
}
registry -> serving: "deploy"
serving -> monitoring: "live traffic"
monitoring -> ingestion: "retrain trigger" {
  style.stroke: "#e53935"
  style.stroke-dash: 4
}
Figure: ML training pipeline, D2 with ELK layout. Retrain feedback loop in red.

The nested training container with epochs and checkpoints inside it renders cleanly in D2. Graphviz could do this with subgraphs but the ELK routing handles the backward retrain edge better.
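
The "save if improved" edge and the "best val loss, early stopping" label inside the training container describe a small piece of logic that can be sketched in a few lines. This is an illustration only: the function name, the `patience` parameter, and the loss values are assumptions of mine, and a real loop would write model weights where the comment indicates.

```python
def train_with_checkpoints(val_losses, patience=3):
    """Walk per-epoch validation losses; return (best_epoch, best_loss, last_epoch_run).

    A checkpoint is "saved" whenever the loss improves, and training stops
    once `patience` consecutive epochs fail to improve on the best loss.
    """
    best_loss, best_epoch, stale = float("inf"), None, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Save if improved: a real loop would persist the weights here.
            best_loss, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                return best_epoch, best_loss, epoch  # early stop
    return best_epoch, best_loss, len(val_losses) - 1


# Hypothetical validation curve.
losses = [0.9, 0.7, 0.6, 0.65, 0.61, 0.62, 0.5]
print(train_with_checkpoints(losses, patience=3))
print(train_with_checkpoints(losses, patience=5))
```

With a short patience the loop stops before the late improvement at the end of the curve; a longer patience lets it reach that epoch, which is the trade-off the checkpoint box summarises.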


Transformer architecture

The transformer encoder-decoder has enough nested structure (six-layer encoder, six-layer decoder, cross-attention connecting them) that D2’s container model is a natural fit.

title: Transformer Architecture — Encoder-Decoder {
  near: top-center
  shape: text
  style.font-size: 20
}

direction: right

input_tokens: Input Tokens {
  shape: rectangle
  style.fill: "#e3f2fd"
  label: "Input Tokens\n[BOS, x₁, x₂, ..., xₙ, EOS]"
}

encoder: Encoder {
  style.fill: "#e8f5e9"
  style.stroke: "#388e3c"

  embed: Input Embedding {
    shape: rectangle
    style.fill: "#c8e6c9"
    label: "Token Embedding\nd_model=512"
  }

  pos_enc: Positional Encoding {
    shape: rectangle
    style.fill: "#c8e6c9"
    label: "Positional Encoding\nsin/cos, max_len=512"
  }

  layer1: Encoder Layer × 6 {
    style.fill: "#a5d6a7"

    mha: Multi-Head Attention {
      shape: rectangle
      style.fill: "#81c784"
      label: "Multi-Head Self-Attention\nh=8 heads, d_k=64\nQ, K, V projections"
    }
    add_norm1: Add & Norm {
      shape: rectangle
      style.fill: "#e8f5e9"
      label: "Add & LayerNorm\nresidual connection"
    }
    ffn: Feed-Forward {
      shape: rectangle
      style.fill: "#81c784"
      label: "FFN\nd_ff=2048\nReLU(W₁x+b₁)W₂+b₂"
    }
    add_norm2: Add & Norm {
      shape: rectangle
      style.fill: "#e8f5e9"
      label: "Add & LayerNorm\nresidual connection"
    }

    mha -> add_norm1
    add_norm1 -> ffn
    ffn -> add_norm2
  }

  embed -> pos_enc
  pos_enc -> layer1
}

decoder: Decoder {
  style.fill: "#fff3e0"
  style.stroke: "#e65100"

  embed: Output Embedding {
    shape: rectangle
    style.fill: "#ffe0b2"
    label: "Token Embedding\n(shifted right)"
  }
  pos_enc: Positional Encoding {
    shape: rectangle
    style.fill: "#ffe0b2"
    label: "Positional Encoding"
  }

  layer1: Decoder Layer × 6 {
    style.fill: "#ffcc80"

    masked_mha: Masked Multi-Head Attention {
      shape: rectangle
      style.fill: "#ffa726"
      label: "Masked Self-Attention\ncausal mask\nh=8 heads"
    }
    add_norm1: Add & Norm {
      shape: rectangle
      style.fill: "#fff3e0"
      label: "Add & LayerNorm"
    }
    cross_mha: Cross-Attention {
      shape: rectangle
      style.fill: "#ffa726"
      label: "Cross-Attention\nQ from decoder\nK,V from encoder"
    }
    add_norm2: Add & Norm {
      shape: rectangle
      style.fill: "#fff3e0"
      label: "Add & LayerNorm"
    }
    ffn: Feed-Forward {
      shape: rectangle
      style.fill: "#ffa726"
      label: "FFN\nd_ff=2048"
    }
    add_norm3: Add & Norm {
      shape: rectangle
      style.fill: "#fff3e0"
      label: "Add & LayerNorm"
    }

    masked_mha -> add_norm1
    add_norm1 -> cross_mha
    cross_mha -> add_norm2
    add_norm2 -> ffn
    ffn -> add_norm3
  }

  embed -> pos_enc
  pos_enc -> layer1
}

linear: Linear + Softmax {
  shape: rectangle
  style.fill: "#fce4ec"
  label: "Linear\nVocab projection\nd_model → |V|\n+ Softmax"
}

output_tokens: Output Tokens {
  shape: rectangle
  style.fill: "#f3e5f5"
  label: "Output Probabilities\nP(token | context)"
}

input_tokens -> encoder
encoder -> decoder: "encoder output\nK, V keys/values"
decoder -> linear
linear -> output_tokens
Figure: Transformer encoder-decoder, D2 with ELK. Encoder K,V feed into decoder cross-attention.
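
The hyperparameters in the labels are mutually consistent (d_model = 512 split over h = 8 heads gives d_k = 64), and the "sin/cos" positional encoding box corresponds to a short formula. Below is a minimal sketch of the sinusoidal encoding from the original Transformer formulation, using the d_model and max_len values from the diagram labels. The function name is mine, and the code assumes an even d_model.

```python
import math

D_MODEL, N_HEADS, MAX_LEN = 512, 8, 512  # values from the diagram labels
D_K = D_MODEL // N_HEADS                 # per-head dimension


def positional_encoding(max_len=MAX_LEN, d_model=D_MODEL):
    """Return a max_len x d_model table of sinusoidal position encodings.

    Even columns hold sin(pos / 10000^(i / d_model)) and the following odd
    column holds the matching cosine, where i is the even column index.
    """
    table = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            table[pos][i] = math.sin(angle)
            table[pos][i + 1] = math.cos(angle)
    return table


pe = positional_encoding()
print(D_K, len(pe), len(pe[0]))
print(pe[0][:4])
```

The table is added to the token embeddings before the first layer, which is what the embed → pos_enc → layer1 edges in both containers represent.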

Microservices architecture

The same architecture drawn in both D2 and Mermaid. D2 with ELK routes the dense cross-layer edges more cleanly. Mermaid’s dagre lays it out more compactly but the edge routing gets crowded in the middle.

title: ML Platform — Microservices Architecture {
  near: top-center
  shape: text
  style.font-size: 20
}

direction: down

client: Client Layer {
  style.fill: "#e3f2fd"
  style.stroke: "#1565c0"
  web: Web App { shape: rectangle; style.fill: "#bbdefb"; label: "Web App\nReact / Next.js" }
  mobile: Mobile App { shape: rectangle; style.fill: "#bbdefb"; label: "Mobile App\niOS / Android" }
  sdk: Python SDK { shape: rectangle; style.fill: "#bbdefb"; label: "Python SDK\napi client" }
}

gateway: API Gateway {
  shape: rectangle
  style.fill: "#fff9c4"
  style.stroke: "#f9a825"
  label: "API Gateway\nauth, rate limiting\nrouting, logging\nnginx / Kong"
}

services: Core Services {
  style.fill: "#e8f5e9"
  style.stroke: "#388e3c"
  auth: Auth Service { shape: rectangle; style.fill: "#c8e6c9"; label: "Auth Service\nJWT / OAuth2\nuser management" }
  experiment: Experiment Service { shape: rectangle; style.fill: "#c8e6c9"; label: "Experiment Service\nhyperparameter search\nrun tracking\nMLflow" }
  training: Training Service { shape: rectangle; style.fill: "#c8e6c9"; label: "Training Service\njob scheduling\nGPU allocation\nk8s jobs" }
  inference: Inference Service { shape: rectangle; style.fill: "#c8e6c9"; label: "Inference Service\nmodel serving\nbatch + realtime\ntriton" }
  feature: Feature Store { shape: rectangle; style.fill: "#c8e6c9"; label: "Feature Store\nFeast / Tecton\nonline + offline" }
}

data: Data Layer {
  style.fill: "#fff3e0"
  style.stroke: "#e65100"
  postgres: PostgreSQL { shape: cylinder; style.fill: "#ffe0b2"; label: "PostgreSQL\nmetadata\nusers, runs" }
  s3: Object Storage { shape: cylinder; style.fill: "#ffe0b2"; label: "S3 / GCS\nmodels, datasets\nartefacts" }
  redis: Redis { shape: cylinder; style.fill: "#ffe0b2"; label: "Redis\nfeature cache\nsession store" }
  kafka: Kafka { shape: queue; style.fill: "#ffe0b2"; label: "Kafka\nevent streaming\nprediction logs" }
}

monitoring: Observability {
  style.fill: "#fce4ec"
  style.stroke: "#c62828"
  prometheus: Prometheus { shape: rectangle; style.fill: "#f8bbd0"; label: "Prometheus\nmetrics scraping" }
  grafana: Grafana { shape: rectangle; style.fill: "#f8bbd0"; label: "Grafana\ndashboards\nalerting" }
  drift: Drift Monitor { shape: rectangle; style.fill: "#f8bbd0"; label: "Drift Monitor\nEvidentlyAI\ndata + model drift" }
}

client.web -> gateway: "HTTPS"
client.mobile -> gateway: "HTTPS"
client.sdk -> gateway: "HTTPS"
gateway -> services.auth: "authenticate"
gateway -> services.experiment: "REST"
gateway -> services.training: "REST"
gateway -> services.inference: "REST / gRPC"
services.training -> data.s3: "save artefacts"
services.training -> data.postgres: "log runs"
services.training -> data.kafka: "publish events"
services.inference -> data.redis: "feature lookup"
services.inference -> data.kafka: "log predictions"
services.feature -> data.redis: "populate cache"
services.feature -> data.s3: "offline features"
data.kafka -> monitoring.drift: "stream"
services.inference -> monitoring.prometheus: "metrics"
services.training -> monitoring.prometheus: "metrics"
monitoring.prometheus -> monitoring.grafana
Figure: ML platform microservices, D2 with ELK.

---
title: "ML Platform — Microservices Architecture"
---
flowchart TD
    classDef client  fill:#bbdefb,stroke:#1565c0,color:#0d2a6e,font-size:12px
    classDef gateway fill:#fff9c4,stroke:#f9a825,color:#5d4037,font-size:12px
    classDef service fill:#c8e6c9,stroke:#388e3c,color:#1b5e20,font-size:12px
    classDef data    fill:#ffe0b2,stroke:#e65100,color:#4e2a00,font-size:12px
    classDef monitor fill:#f8bbd0,stroke:#c62828,color:#4a0000,font-size:12px
    classDef clusterClient  fill:#e3f2fd,stroke:#1565c0
    classDef clusterSvc     fill:#e8f5e9,stroke:#388e3c
    classDef clusterData    fill:#fff3e0,stroke:#e65100
    classDef clusterMonitor fill:#fce4ec,stroke:#c62828

    subgraph CLIENT["Client Layer"]
        WEB["**Web App**<br/>React / Next.js"]:::client
        MOB["**Mobile App**<br/>iOS / Android"]:::client
        SDK["**Python SDK**<br/>api client"]:::client
    end

    GW["**API Gateway**<br/>auth · rate limiting<br/>routing · logging<br/>nginx / Kong"]:::gateway

    subgraph SERVICES["Core Services"]
        AUTH["**Auth Service**<br/>JWT / OAuth2<br/>user management"]:::service
        EXP["**Experiment Service**<br/>hyperparameter search<br/>run tracking · MLflow"]:::service
        TRAIN["**Training Service**<br/>job scheduling<br/>GPU allocation · k8s jobs"]:::service
        INF["**Inference Service**<br/>model serving<br/>batch + realtime · triton"]:::service
        FEAT["**Feature Store**<br/>Feast / Tecton<br/>online + offline"]:::service
    end

    subgraph DATA["Data Layer"]
        PG[("**PostgreSQL**<br/>metadata<br/>users · runs")]:::data
        S3[("**S3 / GCS**<br/>models · datasets<br/>artefacts")]:::data
        REDIS[("**Redis**<br/>feature cache<br/>session store")]:::data
        KAFKA[/"**Kafka**<br/>event streaming<br/>prediction logs"\]:::data
    end

    subgraph OBS["Observability"]
        PROM["**Prometheus**<br/>metrics scraping"]:::monitor
        GRAF["**Grafana**<br/>dashboards · alerting"]:::monitor
        DRIFT["**Drift Monitor**<br/>EvidentlyAI<br/>data + model drift"]:::monitor
    end

    WEB -->|HTTPS| GW
    MOB -->|HTTPS| GW
    SDK -->|HTTPS| GW
    GW -->|authenticate| AUTH
    GW -->|REST| EXP
    GW -->|REST| TRAIN
    GW -->|REST / gRPC| INF
    TRAIN -->|save artefacts| S3
    TRAIN -->|log runs| PG
    TRAIN -->|publish events| KAFKA
    INF -->|feature lookup| REDIS
    INF -->|log predictions| KAFKA
    FEAT -->|populate cache| REDIS
    FEAT -->|offline features| S3
    KAFKA -->|stream| DRIFT
    INF -->|metrics| PROM
    TRAIN -->|metrics| PROM
    PROM --> GRAF

    class CLIENT clusterClient
    class SERVICES clusterSvc
    class DATA clusterData
    class OBS clusterMonitor
Microservices — Mermaid
Same architecture, Mermaid with dagre. More compact but edge routing is denser.

Mermaid won on this particular diagram for use in a blog post. The output is more compact and fits the page width better. D2’s ELK layout produces a taller diagram that requires more scrolling.


ML training loop sequence

Sequence diagrams with loop and alt blocks are Mermaid’s strongest feature. The D2 version of the same diagram annotates the loop semantics on individual messages instead, because those constructs do not exist in D2’s sequence syntax.

sequenceDiagram
    participant U as User / Researcher
    participant T as Trainer Process
    participant DS as DataLoader
    participant M as Model
    participant O as Optimiser
    participant V as Validator
    participant R as Registry

    U->>T: train(config, dataset)
    T->>DS: build DataLoader(batch_size=32, shuffle=True)
    DS-->>T: batched iterator

    loop Every Epoch
        T->>DS: next batch (X, y)
        DS-->>T: X ∈ ℝ^(32×d), y ∈ ℝ^32
        T->>M: forward(X)
        M-->>T: ŷ = σ(Wx + b)
        T->>T: loss = CrossEntropy(ŷ, y)
        Note over T: L = −Σ yᵢ log(ŷᵢ)
        T->>O: zero_grad()
        T->>T: loss.backward()
        Note over T,M: Compute ∂L/∂W via backprop
        T->>T: clip_grad_norm_(max_norm=1.0)
        T->>O: step()  [W ← W − η∇L]
        O-->>T: updated weights
    end

    T->>V: evaluate(val_loader)
    V->>M: forward(X_val) for all batches
    M-->>V: predictions ŷ_val
    V-->>T: val_loss, accuracy, F1

    alt val_loss improved
        T->>R: save_checkpoint(model, epoch, metrics)
        R-->>T: checkpoint path
    else no improvement for patience=5
        T->>T: early_stop()
        T-->>U: best model at epoch k
    end

    T->>R: register_model(name, version, metrics)
    R-->>U: model URI
ML training loop — Mermaid
Training loop sequence diagram, Mermaid. loop and alt are native constructs.
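
The alt block at the end of the diagram is a patience-based early-stopping rule. A minimal Python sketch of that control flow follows. It assumes validation runs once per epoch, and the function name and return shape are hypothetical, not taken from any real trainer.

```python
def run_early_stopping(val_losses, patience=5):
    """Replay per-epoch validation losses through the alt block.

    Returns (best_epoch, stop_epoch). stop_epoch is None when patience
    never runs out. Both names are illustrative assumptions.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # "val_loss improved" branch: save_checkpoint would run here.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                # "no improvement for patience" branch: early_stop().
                return best_epoch, epoch
    return best_epoch, None


# Best loss is at epoch 1; five epochs without improvement end the run at epoch 6.
print(run_early_stopping([1.0, 0.8, 0.9, 0.85, 0.9, 0.95, 0.99, 0.7]))  # (1, 6)
```

The last loss in the example (0.7) is never seen, which is the usual trade-off with a fixed patience value.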

Gradient descent flowchart

flowchart TD
    A([Start]) --> B[Initialize weights W\nrandomly or zeros]
    B --> C[Compute forward pass\nŷ = f&#40;X, W&#41;]
    C --> D[Compute loss\nL = ½‖ŷ − y‖²]
    D --> E[Compute gradients\n∂L/∂W via backprop]
    E --> F{Gradient\ncheck OK?}
    F -- No --> G[Debug: check\nnumerical gradient]
    G --> C
    F -- Yes --> H[Update weights\nW ← W − η·∂L/∂W]
    H --> I{Scheduler\ntype?}
    I -- StepLR --> J[η ← η · γ\nevery k steps]
    I -- CosineAnnealing --> K[η ← ηₘᵢₙ + ½&#40;ηₘₐₓ−ηₘᵢₙ&#41;\n·&#40;1+cos&#40;πt/T&#41;&#41;]
    I -- ReduceOnPlateau --> L[Monitor val loss\nreduce if stagnant]
    J --> M[Increment step\nt ← t + 1]
    K --> M
    L --> M
    M --> N{Convergence\ncriteria?}
    N -- ‖∇L‖ < ε --> O[Check val loss\nfor overfitting]
    N -- max epochs --> O
    N -- No --> C
    O --> P{Val loss\nstill falling?}
    P -- Yes --> C
    P -- No\nearly stop --> Q([Return best W])

    style A fill:#dbeafe,stroke:#1d4ed8
    style Q fill:#dcfce7,stroke:#15803d
    style G fill:#fef9c3,stroke:#ca8a04
    style D fill:#fce7f3,stroke:#be185d
    style E fill:#fce7f3,stroke:#be185d
    style H fill:#ede9fe,stroke:#7c3aed
Gradient descent — Mermaid
Gradient descent with scheduler branching and convergence check, Mermaid.
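
Two of the scheduler branches in the flowchart are closed-form, so they are easy to check against the node labels. The sketch below writes StepLR and cosine annealing as plain Python functions. The function and parameter names are my own, and this is not PyTorch's scheduler API.

```python
import math


def step_lr(eta0, gamma, k, t):
    """StepLR node: multiply the rate by gamma once every k steps."""
    return eta0 * gamma ** (t // k)


def cosine_annealing(eta_min, eta_max, t, T):
    """Cosine node: eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * t / T))."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t / T))


# The cosine schedule starts at eta_max (t = 0) and ends at eta_min (t = T).
for t in (0, 50, 100):
    print(t, cosine_annealing(0.0, 0.1, t, 100))
```

ReduceOnPlateau is left out because it depends on the validation-loss history, not on the step count alone.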

Kalman filter

The Kalman predict-update cycle is a two-box diagram with input arrows and a feedback loop. Pikchr handles it cleanly with named box anchors and its “then left until even with” arrow-routing syntax.

# Kalman Filter — Predict/Update Cycle

scale = 1.0

B1: box rad 0.12 wid 2.8 ht 1.6 fill 0xdbeafe \
    "PREDICT" bold big \
    "x' = F*x + B*u" small \
    "P' = F*P*F' + Q" small \
    at 1.4,2.5

B2: box rad 0.12 wid 2.8 ht 1.6 fill 0xdcfce7 \
    "UPDATE" bold big \
    "K = P'*H'*inv(H*P'*H'+R)" small \
    "x = x' + K*(z - H*x')" small \
    "P = (I - K*H)*P'" small \
    at 5.0,2.5

arrow from B1.e to B2.w "x', P'" above small

arrow from B2.s down 0.5 then left until even with B1.s then to B1.s \
    "updated x, P" below small

arrow <- from B1.n+(-.5,0) up 0.5 "u (control)" above small
arrow <- from B1.n+(.5,0) up 0.5 "Q (process noise)" above small

arrow <- from B2.n+(-.5,0) up 0.5 "z (measurement)" above small
arrow <- from B2.n+(.5,0) up 0.5 "R (meas. noise)" above small

box wid 5.0 ht 0.6 rad 0.08 fill 0xfef9c3 \
    "F: transition   H: observation   K: Kalman gain   Q,R: noise covariances" small \
    at 3.2,0.35

text "Kalman Filter: Predict / Update Cycle" bold with .s at B1.nw+(1.4,0.5)
Kalman filter — Pikchr
Kalman filter predict/update cycle, Pikchr. Feedback arrow routed below both boxes.
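
The equations inside the two boxes can be sanity-checked in the scalar case, where the transposes drop out and inv(...) becomes a division. This is a minimal sketch under that one-dimensional assumption. The function names are hypothetical and the inputs are illustrative numbers, not values from the diagram.

```python
def predict(x, P, F, B, u, Q):
    """PREDICT box: x' = F*x + B*u and P' = F*P*F' + Q (scalar case)."""
    x_pred = F * x + B * u
    P_pred = F * P * F + Q
    return x_pred, P_pred


def update(x_pred, P_pred, z, H, R):
    """UPDATE box: Kalman gain, corrected state, corrected covariance (scalar case)."""
    K = P_pred * H / (H * P_pred * H + R)
    x = x_pred + K * (z - H * x_pred)
    P = (1 - K * H) * P_pred
    return x, P, K


# One cycle: prior x=0 with variance 1, then a measurement z=1 with variance 1.
x_pred, P_pred = predict(x=0.0, P=1.0, F=1.0, B=0.0, u=0.0, Q=0.0)
x, P, K = update(x_pred, P_pred, z=1.0, H=1.0, R=1.0)
print(x, P, K)  # 0.5 0.5 0.5
```

With equal prior and measurement variance, the gain is 0.5 and the estimate lands halfway between the prediction and the measurement. The returned x and P are what the feedback arrow carries back into the predict box.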

Decision tree

# Decision Tree — Loan Approval (depth=3)

scale = 1.0

R: box rad 0.1 wid 1.9 ht 0.65 fill 0xdbeafe \
   "Age <= 30?" bold "Gini=0.48  n=200" small \
   at 3.0,5.2

L1: box rad 0.1 wid 1.8 ht 0.65 fill 0xe0f2fe \
    "Income <= 50k?" bold "Gini=0.42  n=110" small \
    at 1.3,3.8

R1: box rad 0.1 wid 1.7 ht 0.65 fill 0xe0f2fe \
    "Credit > 700?" bold "Gini=0.30  n=90" small \
    at 4.7,3.8

LL: box rad 0.1 wid 1.4 ht 0.6 fill 0xfce4ec \
    "DENY" bold "n=68  p=0.82" small \
    at 0.4,2.4

LR: box rad 0.1 wid 1.4 ht 0.6 fill 0xf0fdf4 \
    "APPROVE" bold "n=42  p=0.76" small \
    at 2.2,2.4

RL: box rad 0.1 wid 1.5 ht 0.6 fill 0xfef9c3 \
    "Yrs Emp?" bold "Gini=0.22  n=55" small \
    at 3.8,2.4

RR: box rad 0.1 wid 1.4 ht 0.6 fill 0xf0fdf4 \
    "APPROVE" bold "n=35  p=0.91" small \
    at 5.5,2.4

RLL: box rad 0.1 wid 1.35 ht 0.55 fill 0xfce4ec \
     "DENY" bold "n=28  p=0.79" small \
     at 3.1,1.0

RLR: box rad 0.1 wid 1.35 ht 0.55 fill 0xf0fdf4 \
     "APPROVE" bold "n=27  p=0.85" small \
     at 4.7,1.0

arrow from R.sw to L1.n
arrow from R.se to R1.n
arrow from L1.sw to LL.n
arrow from L1.se to LR.n
arrow from R1.sw to RL.n
arrow from R1.se to RR.n
arrow from RL.sw to RLL.n
arrow from RL.se to RLR.n

text "Yes" small with .e at 0.5 way between R.sw and L1.n
text "No"  small with .w at 0.5 way between R.se and R1.n
text "Yes" small with .e at 0.5 way between L1.sw and LL.n
text "No"  small with .w at 0.5 way between L1.se and LR.n
text "No"  small with .e at 0.5 way between R1.sw and RL.n
text "Yes" small with .w at 0.5 way between R1.se and RR.n
text "<2y" small with .e at 0.5 way between RL.sw and RLL.n
text ">=2y" small with .w at 0.5 way between RL.se and RLR.n

text "Decision Tree: Loan Approval (depth=3)" bold with .s at R.n+(0,0.4)
Decision tree — Pikchr
Loan approval decision tree depth 3, Pikchr. Named node anchors used for branch label placement.
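
The Gini values on the internal nodes come from the standard impurity formula, one minus the sum of squared class proportions. The sketch below shows that computation. The 120/80 class split is a hypothetical example, not a figure from the diagram. It is simply one split of the 200 root samples that yields 0.48.

```python
def gini(class_counts):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions at a node."""
    n = sum(class_counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in class_counts)


def weighted_gini(children):
    """Impurity of a split: child impurities weighted by child sample counts."""
    total = sum(sum(counts) for counts in children)
    return sum(sum(counts) / total * gini(counts) for counts in children)


# Hypothetical 120/80 split of 200 samples: 1 - (0.6^2 + 0.4^2), roughly 0.48.
print(round(gini([120, 80]), 2))
```

A split is worth making when weighted_gini of the children is lower than gini of the parent, which is the criterion a CART-style tree minimises at each node.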

What held up and what did not

Claude generated syntactically correct code for all four tools without exception. The mathematical content (parameter counts, probability values, gradient expressions) was consistently accurate and would have taken significant time to write by hand.

The failure modes were layout-related, not content-related. Pikchr requires knowing the exact coordinates you want before you start. Claude’s initial coordinate estimates for the SVM scatter plot put the hyperplane at the wrong angle and the support vectors too close to the margin. Fixing it meant specifying the endpoint coordinates explicitly and iterating.

For graph tools, Claude sometimes generated too many edge labels on dense graphs, making the result unreadable. The solution was to label only representative edges, not every one.

The Pikchr backprop diagram does not exist in this set. Pikchr can draw it (circles at coordinates, arrows between them), but doing so means manually placing every neuron and every edge. For a fully connected network that is around 150 arrows. Claude can generate that, but it is not a good use of Pikchr. The Graphviz version is better for that specific diagram.
