crumb#008: Gradient Descent Was Nature’s Idea
Biology has been optimising shapes, patterns, and behaviour since long before Python got involved.
I am a Machine Learning Engineer- which is a fancy way of saying I use libraries that researchers and developers alike have spent years building and perfecting. The objective is simple: preprocess the dataset, finalise the objective function, train your model for numerous epochs, then run round after round of hyperparameter optimisation, until you have a model that scores high on the required metric with statistical significance- which, again, is just a lot of fancy jargon.
In practice, this often means writing some code, letting the algorithm churn, and ending up with a model that performs just a bit better than random guessing- after spending thousands of dollars on compute- before pitching it to my manager as the biggest breakthrough since the invention of the wheel.
The “train your model for numerous epochs” step is where the real magic happens. That’s when your model gets better and better at making predictions. And for the purposes of this article, let’s try to understand how.
Let’s take a simple example. Imagine you're trying to guess whether a child will buy an ice cream on any given day. You have one clue: the temperature.
So you write a tiny equation, like in school:
y = mx + c
Here,
x is the temperature (say, 30°C),
y is the prediction (will the child buy ice cream? Yes or no),
m and c are just two numbers we don’t know yet- kind of like the model’s “guessing knobs.”
At first, your model picks random values for m and c. It might guess terribly- like saying a kid would buy ice cream in the middle of a snowstorm. That’s where something called loss comes in.
Think of loss as a “how wrong was I?” meter. If the prediction is way off, the loss is big. If it’s close, the loss is small. Your model sees the loss and thinks, Hmm, that was bad. Let me try adjusting my numbers a bit.
So how does it know which way to adjust? That’s gradient descent- the optimisation algorithm.
Gradient descent is like playing a game of “hot or cold” to find the best answer. Your model checks which direction makes the loss smaller- left or right, up or down- and takes a tiny step in that direction. Then it does it again. And again. Step by step, getting closer to the best m and c.
Imagine being blindfolded on a hill, trying to reach the bottom. You feel the slope under your feet, and always take a small step downhill. That’s gradient descent. It doesn’t guess. It gently walks downhill, using feedback to guide each step.
After enough steps- often thousands or millions- the model learns the best way to predict whether a child wants ice cream on a hot day. It’s like learning by making mistakes, then correcting them a little at a time.
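If you like seeing that loop spelled out, here’s a minimal sketch of the idea in plain Python. The temperatures and yes/no labels are made up purely for illustration- a real ice-cream model would use a proper classifier- but the “check the slope, take a tiny step” update is the same.

```python
# A toy version of the ice-cream model: y = m*x + c, trained with plain
# gradient descent on invented data.

temps  = [12, 18, 24, 30, 36]   # x: temperature in °C
bought = [0, 0, 1, 1, 1]        # y: did the child buy ice cream? (1 = yes)

m, c = 0.0, 0.0                 # the model's "guessing knobs"
lr = 0.001                      # learning rate: how big each downhill step is

for epoch in range(20000):
    # Slope of the "how wrong was I?" meter (mean squared error) w.r.t. each knob.
    grad_m = grad_c = 0.0
    for x, y in zip(temps, bought):
        error = (m * x + c) - y
        grad_m += 2 * error * x / len(temps)
        grad_c += 2 * error / len(temps)

    # Take a small step downhill: nudge each knob against its gradient.
    m -= lr * grad_m
    c -= lr * grad_c

print(f"learned m = {m:.3f}, c = {c:.3f}")
print(f"prediction at 30°C: {m * 30 + c:.2f}")   # closer to 1 means "probably buys"
```

Run it and the loss quietly shrinks, step by step- exactly the “hot or cold” game described above.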
And that’s what all machine learning models are doing behind the scenes. Whether it’s recognising a cat, recommending a movie, translating a sentence, or- now that machine-written text is so ubiquitous it’s hard to tell what’s human and what’s not- generating that text in the first place, they’re all just using gradient descent to reduce their mistakes and get smarter.
And don’t worry- that’s as deep as we’ll go into machine learning today. I won’t drag you through layers of neural networks or backpropagation. But why explain all this at all?
Because while gradient descent and machine learning might feel cutting-edge to us humans, nature has been running its own optimisation loops for millions of years. Long before GPUs and Python libraries, evolution was out there, quietly tuning parameters through trial and error- shaping everything from leaf patterns to animal behaviour- not in milliseconds, but over millennia.
Take the humble honeycomb. The weird little structure that triggers trypophobia and stores honey.
Have you ever wondered why that shape is a hexagon? Why not circles, triangles, or some squiggly shape? Because over countless generations, bees- through nothing but instinct and inherited behaviour- arrived at a design that mathematicians would only later prove to be optimal.
The hexagon, it turns out, is the most efficient shape for tiling a flat surface: of the shapes that can cover a plane without gaps, it encloses the most area for the least perimeter- the least building material. For a bee, that means more honey stored, less wax spent.
It’s nature doing cost-optimisation without ever calling it that.
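You don’t have to take the bees’ word for it, either. A quick back-of-the-envelope check: of the three regular shapes that can tile a flat surface on their own- triangles, squares, and hexagons- see which needs the least perimeter (read: wax) to enclose the same area. In a real comb, shared walls change the exact numbers but not the ranking.

```python
import math

def perimeter_for_unit_area(n_sides):
    # Area of a regular n-gon with side s is (n * s**2) / (4 * tan(pi / n)).
    # Solve for the side length that gives area 1, then return the perimeter.
    s = math.sqrt(4 * math.tan(math.pi / n_sides) / n_sides)
    return n_sides * s

for name, n in [("triangle", 3), ("square", 4), ("hexagon", 6)]:
    print(f"{name:8s}: {perimeter_for_unit_area(n):.3f}")

# triangle: 4.559
# square  : 4.000
# hexagon : 3.722   <- least building material for the same storage area
```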
Biology beat geometry to the punch- a result evolution reached not by genius, but by persistence. Millions of tiny adjustments across time. Nature was doing gradient descent long before we gave it a name. (Although yes, imagining bees solving linear programming problems is admittedly more entertaining. But alas, we aren’t that lucky.)
And that, kind of, is the crux of this article- how nature has mastered the art of optimisation in shapes, colours, sizes, and everything else. Biology students, now is your time to be proud- or maybe not.
The thing is, there is math in everything that biology does. Mini optimisation algorithms working non-stop, generation to generation, resulting in the shapes, colours, positions, and sizes we see on display today.
I remember that in my last semester of college I took Non-Linear Optimisation as one of my electives. For the final assignment we had to implement, from scratch, a few optimisation algorithms, one of which was based on something called the golden ratio. In a world without ChatGPT, when you actually had to write the code yourself, that was drudgery.
(Thank you, Stack Overflow.)
The golden ratio- roughly 1.618- is one of those weird numbers that keeps showing up everywhere. But what is it, exactly?
Technically, it’s a proportion: divide a line into a smaller part and a larger part such that the larger part relates to the smaller one the way the whole relates to the larger part- that common ratio is the golden ratio. It’s irrational (in the mathematical sense, at least), which just means you can’t write it as a neat fraction, and its decimal expansion goes on forever without repeating.
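For the curious, the assignment algorithm I mentioned was essentially golden-section search (or something very close to it): it uses that ratio to shrink a search interval around a function’s minimum without ever touching a derivative. A rough from-scratch sketch- not the exact code I handed in:

```python
import math

PHI = (1 + math.sqrt(5)) / 2    # the golden ratio, ~1.618
INV_PHI = 1 / PHI               # ~0.618

def golden_section_minimise(f, a, b, tol=1e-6):
    """Find the minimum of a unimodal function f on the interval [a, b]."""
    while abs(b - a) > tol:
        # Two probe points that split the interval in the golden ratio.
        c = b - (b - a) * INV_PHI
        d = a + (b - a) * INV_PHI
        if f(c) < f(d):
            b = d               # the minimum must lie in [a, d]
        else:
            a = c               # the minimum must lie in [c, b]
        # (A more careful version reuses one probe point per round;
        #  recomputing both keeps the sketch short.)
    return (a + b) / 2

# Example: minimise (x - 2)^2, which should land near x = 2.
print(golden_section_minimise(lambda x: (x - 2) ** 2, 0, 5))
```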
Sounds abstract, but it shows up in real places. A lot. In architecture, photography, music, design- in the layout of your credit card. It pops up in Beethoven’s symphonies, the spiral of a seashell, the structure of pinecones, and even your Instagram feed if you are into photography composition tips (although they never seem to help me). It’s supposedly in the proportions of the Mona Lisa as well.
But here’s the point: this isn’t just some magic math number. The golden ratio often represents an efficient way to arrange or scale things. And nature, being the master of efficiency, seems to love it.
Take the way plants arrange their leaves, seeds, or petals- what scientists call phyllotaxis (as a sad PCMB student like me will tell you). The angles involved aren’t random. They’ve been tuned over millions of years to maximise sunlight exposure and minimise overlap. And yep- those angles often trace back to the golden ratio, or more precisely to its spin-off, the golden angle: the angle you get when you divide a circle so that the smaller arc relates to the larger one the way the larger arc relates to the full circle.
Think of it as nature running a very slow, very patient version of gradient descent. Over millions of years, plants “learned” that if each new leaf grew at an angle of ~137.5° from the last- the golden angle- it would avoid blocking the ones below it. Better sunlight, better photosynthesis, better survival. Those that grew at less optimal angles didn’t do as well: a plant that spaced its leaves at 90°, say, would have its fifth leaf sitting directly above its first after just four turns. And over generations, nature tuned this parameter just like we tweak weights in a model.
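Here’s a tiny way to see that for yourself: place leaves one by one around a stem at a fixed angle, and check how close any two end up to sitting directly above each other. The “smallest gap” below is a crude stand-in for shading- an illustration, not botany.

```python
import math

GOLDEN_ANGLE = 360 * (1 - 1 / ((1 + math.sqrt(5)) / 2))   # ~137.5°

def smallest_gap(angle_deg, n_leaves=20):
    """Smallest angular gap between any two of n_leaves placed angle_deg apart.
    A tiny gap means one leaf sits almost directly above another."""
    positions = [(i * angle_deg) % 360 for i in range(n_leaves)]
    gap = 360.0
    for i in range(n_leaves):
        for j in range(i + 1, n_leaves):
            diff = abs(positions[i] - positions[j]) % 360
            gap = min(gap, diff, 360 - diff)
    return gap

for angle in (90, 120, 137.5, GOLDEN_ANGLE):
    print(f"{angle:7.2f}° -> smallest gap {smallest_gap(angle):6.2f}°")

# 90° and 120° collapse to a gap of 0° (new leaves stack exactly on top of old
# ones), while the golden angle keeps every leaf noticeably clear of the rest.
```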
While we update our models over epochs using gradient descent, nature's been doing the same thing for eons- just with DNA instead of code.
I don’t know about you, but watching nature documentaries is one of my favourite ways to spend time. There’s something enchanting in the way Sir David Attenborough narrates, in his warm, husky voice, how a waddle of penguins needs to stay warm: “In brutal -40°C temperatures and wind speeds over 100 km/h, staying warm isn’t a luxury, it’s survival.”
So what do they do? They huddle.
And while you might not make much of it as you watch the penguins huddle over dinner- finding it rather cute, if anything- their huddling together is one of nature’s most committed group projects.
This isn’t some chaotic pile of feathers. Their huddle is a mathematical marvel. Penguins pack themselves in tightly, minimising surface area exposed to the wind while maximising shared warmth- like a biological algorithm solving a dynamic optimisation problem.
Even more fascinating? The huddle moves. Penguins on the outer edges slowly shift inward, while the inner ones get pushed out, forming a gentle wave-like motion every 30 to 60 seconds. It’s not random- it’s regulated, coordinated, and efficient.
Each penguin takes tiny, incremental steps, guided by local cues, adjusting its position in response to the cold- not too different from how a machine learning model takes small steps during gradient descent to reduce error. They're not consciously calculating, but the result is an emergent solution that benefits the whole system. Evolution taught them to keep warm not by standing still, but by moving smart.
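If you squint, you can even write that down as a toy simulation: each “penguin” repeatedly takes a small step toward the centre of the group, and the flock’s overall spread- a very crude stand-in for heat loss- drops step by step, like a loss curve. This is an analogy sketch, not a model of actual penguin thermodynamics.

```python
import random

random.seed(42)

# Thirty "penguins" scattered on a 2D plane.
penguins = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(30)]

def spread(flock):
    # Average distance from the flock's centre: our stand-in for heat loss.
    cx = sum(x for x, _ in flock) / len(flock)
    cy = sum(y for _, y in flock) / len(flock)
    return sum(((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 for x, y in flock) / len(flock)

step = 0.05   # each penguin only ever takes a tiny step
for t in range(200):
    cx = sum(x for x, _ in penguins) / len(penguins)
    cy = sum(y for _, y in penguins) / len(penguins)
    # Everyone shuffles slightly toward the warm centre (a crude proxy for
    # the local cues real penguins actually follow).
    penguins = [(x + step * (cx - x), y + step * (cy - y)) for x, y in penguins]
    if t % 50 == 0:
        print(f"step {t:3d}: spread = {spread(penguins):.2f}")

print(f"final    : spread = {spread(penguins):.2f}")
```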
You can check out a small clip by BBC Earth here.
So, laugh as much as you want at the funny way penguins walk- that walk is an evolutionary masterpiece.
What do you have, huh?
In machine learning, one of the more advanced topics is online learning. Basically, learning on the go- updating the model with each new data point- instead of training once and only then changing recommendations.
For example, you tell your friend on WhatsApp that you need to wash your sneakers, and the moment you open Instagram there’s an advertisement for a startup that does exactly that. The model doesn’t take a day to learn your preferences; it’s almost instant. These are systems that don’t just learn once, but keep learning as they go. That’s not evolution- that’s adaptation.
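In code, the difference is roughly this: instead of calling fit once on a big batch of history, an online learner applies a small update every time a new example arrives. Here’s a minimal sketch using scikit-learn’s SGDClassifier and its partial_fit method- the features and the little event stream are invented, and real ad systems are vastly more complicated.

```python
from sklearn.linear_model import SGDClassifier

# Invented features: [mentioned_sneakers, mentioned_cleaning]
# Label: did the user click the sneaker-cleaning ad? (1 = yes)
stream = [
    ([1, 0], 0),
    ([1, 1], 1),
    ([0, 1], 0),
    ([1, 1], 1),
]

model = SGDClassifier()   # a linear classifier trained by stochastic gradient descent

for features, clicked in stream:
    # partial_fit nudges the weights using just this one example -
    # no retraining from scratch; the model adapts as the data flows in.
    model.partial_fit([features], [clicked], classes=[0, 1])

print(model.predict([[1, 1]]))   # the model's current best guess
```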
Whatever happened to end-to-end encryption- but you get what I mean.
And just when you think that’s something nature can’t do- because, as we’ve discussed, evolution takes millennia to get things right- you’d be very wrong.
Not every optimisation story in nature plays out over thousands of years. Learning the art takes millennia, sure, but for this part we’re more concerned with how it gets put to use.
Sometimes, nature also learns right now.
Take the cuttlefish- a creature that, at first glance, doesn’t scream “engineering marvel.” Soft-bodied, boneless, kind of like a squid with flair. But under that squishy surface lies a biological system more sophisticated than anything we’ve engineered.
Cuttlefish are masters of disguise. Not in the “give it a few minutes” kind of way- in milliseconds. They can change the colour, contrast, and even texture of their skin to blend perfectly into their environment. A pebbly seafloor, a coral reef, the shadow under a rock- it doesn’t matter. The camouflage is instant, precise, and often impossible to detect with the naked eye.
So, how does it work?
Their skin contains thousands of specialised pigment cells called (and pardon the heavy biology terms) chromatophores, which expand or contract to show different colours. Below those are iridophores and leucophores, which reflect and scatter light.
Together, they form a real-time display system- think of it as millions of tiny biological pixels controlled by the cuttlefish’s nervous system. A living, shape-changing LED display.
But here's the wild part: it’s all based on feedback.
The cuttlefish doesn’t hardcode its response. It takes in visual data from the environment, processes it, and adjusts its skin accordingly- in real time. It’s not evolving a better colour pattern over generations; it’s actively sensing and reacting. This is nature with its own version of online learning- constantly updating based on new data, without needing to retrain from scratch. And to be honest, nature’s version far outdoes anything that we have done, or arguably, will ever do.
So, what do we take away from all this?
That nature, in all its wonder and weirdness, is not just beautiful- it’s brilliant. It’s not writing Python scripts or tuning loss functions, but it’s optimising nonetheless. Whether it’s bees building hexagonal homes, plants aligning their leaves for the perfect sunbathing angle, penguins performing synchronised dances to survive the cold, or cuttlefish flashing instant camouflage- all of it is nature’s quiet, relentless search for better answers.
It’s funny, really. As machine learning engineers, we spend our days tweaking models, tuning parameters, and obsessing over metrics. We call it artificial intelligence, but there’s nothing artificial about the logic underneath. It’s borrowed. Inspired. Stolen, even- from the original problem-solver: evolution.
Gradient descent may be the cornerstone of modern AI, but its core principle- make a small mistake, learn from it, improve just a bit- is as old as life itself.
And maybe that’s the real kicker of working in this field: we like to think we're at the bleeding edge of innovation, but most days we're just reinventing what nature nailed a billion years ago- without using a single GPU.
So the next time your model, like mine, flatlines at 51% accuracy and you’re three cups of tea deep, wondering if you should’ve just opened a cafe in the hills instead- remember: even cuttlefish are doing online learning in real time, while looking fabulous.
That’s it for this crumb, see you in the next!