Teaching a Computer to Read Nepali Handwriting (And What It Cost Me)

The headline number is 99.98 percent. That is the test accuracy my convolutional network reached on handwritten Devanagari character recognition. It is the number I put on my resume. It is the number people ask about. But the number hides almost everything that actually happened, so let me tell you the parts that do not fit in a bullet point.

Why this mattered to me

Devanagari is the script of Nepali and Hindi and a lot of South Asian languages. The world built incredible OCR for Latin scripts decades ago. For Devanagari it always felt like an afterthought. The conjuncts are hard. The matras sit above and below the line. Handwriting varies wildly. So when I picked this for a deep dive, it was personal. This is my script. I grew up writing these characters by hand in school notebooks.

Every dataset that works perfectly was built by someone who suffered through the messy version first.

The GPU problem in Nepal

Here is a thing developers in richer places do not think about. GPU access. I did not have a beefy machine. Renting cloud GPUs is priced in dollars, and dollars are expensive when you earn in rupees. Every training run had a real cost I could feel in my wallet. So I could not just spray and pray with hyperparameters. I had to think before I burned a run.

That constraint, honestly, made me a better engineer. When each experiment costs you real money, you stop being lazy. You read the loss curves carefully. You change one thing at a time. You write down what you tried so you never accidentally pay twice for the same failed idea.

The nights

I trained mostly at night because that is when the power was stable and the internet was not fighting the whole neighborhood. I would kick off a run, watch the first few epochs to make sure it was not diverging, then try to sleep while it cooked. I would wake up and check the validation accuracy like checking a lottery ticket.

The architecture itself was not exotic. Stacked convolution and pooling, batch normalization, dropout to fight overfitting, a dense head. The wins did not come from a clever model. They came from boring discipline:

Augmentation that matched real handwriting: small rotations, shifts, elastic distortions. Not random nonsense.
Cleaning the dataset by hand when the labels were wrong. Yes, by hand. For a long time.
Learning rate scheduling instead of one fixed value and hope.
Early stopping so I did not overtrain into a worse model.

What 99.98 percent actually means

Let me be honest about this number, because people misread it. It is test accuracy on a held-out set of clean, segmented single characters. That is a controlled condition. In the wild, with someone scrawling a full word on a phone screen at an angle in bad light, you do not get 99.98 percent. Segmentation alone will eat you alive.

So I paired the model with an Android draw-and-predict app. You draw a character, it predicts. In that narrow, controlled setting the number mostly holds and it feels like magic. But I would never tell a client that an end-to-end document OCR system inherits that number. It does not. The character classifier is one strong link in a chain that has weaker links.

What it cost, and what it gave

It cost me sleep, some money I did not really have, and a few weeks where I was bad company because half my brain was watching a loss curve. What it gave me was deeper. It taught me that the gap between a research number and a product is enormous, and that closing it is the actual job. The model was the easy part. The truth in that one decimal place was the whole education.

Saroj Prasad Mainali

Full-Stack Engineer · Kathmandu

WHAT I AM DOING NOW →