— suman@paudel:~/manifesto —
LATENCY <900ms · UPTIME 99.9% · CALLS/DAY 2000+ · ACCURACY 95%
suman@paudel:~$ cat README.md

I build systems that listen, understand, and respond — at scale.

Five years of relentless pursuit across ML, AI, and voice engineering. From training models to orchestrating 2000+ daily calls with sub-second latency. From fine-tuning vision models to winning international NLP competitions.

This is my manifesto — the principles that drive every system I build.

suman@paudel:~$ cd manifesto/part_i && ls -la

// PART I — THE FOUNDATION

what I optimize for when building voice AI systems

[01]suman@paudel:~$ cat manifesto/01_latency_is_everything.md
── LATENCY IS EVERYTHING ──

Every millisecond is a conversation lost

> In real-time voice AI, latency isn't a metric — it's the difference
> between a natural conversation and an awkward silence.
>
> I architected self-hosted LiveKit infrastructure on AWS achieving
> <900ms end-to-end latency, orchestrating 2000+ daily calls at
> 99.9% uptime. Because humans don't wait for machines.
p50_latency_ms=871
p99_latency_ms=1200
concurrent_calls=120
[02]suman@paudel:~$ cat manifesto/02_scale_relentlessly.md
── SCALE RELENTLESSLY ──

One call is a demo. Two thousand is a system.

> Built end-to-end voice AI backend via multi-channel SIP trunking
> across Twilio, Plivo, Telecimi, Acefone with automatic failover.
>
> Engineered resilient call routing with intelligent retry mechanisms,
> reducing failed call rate by 60%. Automated 3000+ monthly healthcare
> interactions.
daily_calls=2000+
failed_rate_delta=-60%
sip_providers=4
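
The failover-and-retry idea above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the production routing logic: the provider order, the `dial` callback, and the backoff parameters are all assumptions; only the four provider names come from the text.

```python
import time

# Illustrative failover order; names taken from the text above.
PROVIDERS = ["twilio", "plivo", "telecimi", "acefone"]

def place_call(dial, number, max_retries=2, base_delay=0.5):
    """Try each SIP provider in priority order, retrying transient
    failures with exponential backoff before failing over to the next.

    `dial(provider, number)` is a caller-supplied function that raises
    ConnectionError on a transient failure and returns a call handle
    on success (a stand-in for a real SIP/telephony client).
    """
    for provider in PROVIDERS:
        for attempt in range(max_retries + 1):
            try:
                return dial(provider, number)
            except ConnectionError:
                # Exponential backoff: 0.5s, 1s, 2s, ... per attempt.
                time.sleep(base_delay * (2 ** attempt))
        # All retries exhausted for this provider; fail over to the next.
    raise RuntimeError(f"all providers failed for {number}")
```

The key design point is that retries are scoped per provider, so a trunk-level outage burns a bounded number of attempts before traffic moves to the next carrier.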
[03]suman@paudel:~$ cat manifesto/03_own_the_stack.md
── OWN THE STACK ──

Open source is not a compromise. It's a weapon.

> Integrated open-source Qwen3 TTS with LiveKit via custom Alibaba
> Cloud deployment, reducing TTS costs by 70% while maintaining
> natural speech quality.
>
> When you own every layer, you control every outcome.
tts_cost_delta=-70%
self_hosted=true
vendor_lock=none
[04]suman@paudel:~$ cat manifesto/04_impact_over_output.md
── IMPACT OVER OUTPUT ──

Measure in outcomes, not deployments

> Improved patient medication adherence by 40% through proactive
> automated refill reminders. Reduced manual claim processing time
> by 5+ hours daily.
>
> 92% query accuracy on healthcare document retrieval.
> The code is invisible — the impact is what patients feel.
adherence_delta=+40%
hours_saved_daily=5+
query_accuracy=92%
[05]suman@paudel:~$ cat manifesto/05_inference_is_infra.md
── INTELLIGENCE AT INFERENCE ──

Serve models like you serve users — fast and reliable

> Architected a production-grade open-source LLM inference platform
> using vLLM, serving GPT-OSS-20B, Qwen3-4B embeddings, and
> Qwen3-VL-8B.
>
> 5 concurrent inference jobs on a single NVIDIA H100, fully
> compliant with data-sovereignty requirements.
gpu=H100
concurrent_jobs=5
engine=vLLM
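
Co-hosting several models on one GPU with vLLM typically means one server process per model, each claiming a slice of GPU memory. The launch below is an illustrative config fragment, not the actual deployment: the model ID, memory fraction, context length, and port are assumptions chosen to show the shape of the flags.

```shell
# One of several per-model vLLM servers sharing a single H100.
# --gpu-memory-utilization caps this process's share of GPU memory
# so sibling model servers can coexist on the same card.
vllm serve openai/gpt-oss-20b \
  --gpu-memory-utilization 0.45 \
  --max-model-len 8192 \
  --port 8000
```

Each server exposes an OpenAI-compatible API, so downstream services switch models by changing a base URL rather than an SDK.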
suman@paudel:~$ cd ../part_ii && ls -la

// PART II — THE CRAFT

how I push boundaries when pre-trained models fall short

[06]suman@paudel:~$ cat manifesto/06_retrieval_is_reasoning.md
── RETRIEVAL IS REASONING ──

The best answer is the one grounded in truth

> Designed RAG + SQL agent pipelines solving large-scale manufacturing
> analytics with LangGraph, enabling unified reasoning across multi-
> file documents and structured databases.
>
> 95% retrieval accuracy. 78% on complex multi-step reasoning.
retrieval_acc=95%
multi_step=78%
stack=LangGraph
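
The unified-reasoning idea, routing each question to either a SQL agent (structured data) or a RAG retriever (documents), can be sketched without any framework. This is a deliberately crude, hypothetical router for illustration only; the real pipeline is built on LangGraph, and the hint keywords, agent callables, and function names here are all assumptions.

```python
# Aggregate-style phrasing suggests structured data; everything else
# falls through to document retrieval.
SQL_HINTS = ("sum", "count", "average", "total", "per month", "by plant")

def route(question: str) -> str:
    """Pick a branch: 'sql' for aggregate-style questions, else 'rag'."""
    q = question.lower()
    return "sql" if any(hint in q for hint in SQL_HINTS) else "rag"

def answer(question, sql_agent, rag_agent):
    """Dispatch the question to the chosen branch and return its result.

    `sql_agent` and `rag_agent` are caller-supplied callables standing in
    for the real LangGraph nodes.
    """
    tool = sql_agent if route(question) == "sql" else rag_agent
    return tool(question)
```

In a LangGraph pipeline this decision would live in a conditional edge; the point of the sketch is that the router and the branches stay decoupled, so either side can be improved independently.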
[07]suman@paudel:~$ cat manifesto/07_fine_tune_fearlessly.md
── TEACH THE MACHINE TO SEE ──

When pre-trained models fall short, fine-tune fearlessly

> Fine-tuned state-of-the-art OCR and Vision-Language models —
> DeepSeek OCR, PaddleOCR, OLMO-v2, Qwen3-8B — for banking document
> understanding.
>
> 1000+ labeled samples via Gemini 3.0 as teacher. SFT + DPO pushed
> domain-specific accuracy from 84% → 95%.
acc_before=84%
acc_after=95%
technique=SFT+DPO
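
The DPO half of that recipe optimizes a simple per-pair objective: push the policy's log-probability margin on the chosen answer above its margin on the rejected one, both measured against a frozen reference model. Below is the standard DPO loss (Rafailov et al.) for a single preference pair; the log-probability values and `beta` in any usage are illustrative, not numbers from the actual training run.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin),
    where margin = (policy-vs-reference log-prob gap on the chosen
    answer) minus (the same gap on the rejected answer).

    Args are sequence log-probabilities under the policy (pi_*) and
    the frozen reference model (ref_*).
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy has learned nothing relative to the reference, the margin is zero and the loss sits at log 2 ≈ 0.693; widening the margin on chosen answers drives it toward zero, which is what moves domain accuracy after SFT has done the heavy lifting.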
[08]suman@paudel:~$ cat manifesto/08_compete_to_learn.md
── COMPETE TO LEARN ──

The arena sharpens the blade

> Won CHIPSAL @ COLING 2025 — Shared Task on Hate Speech Target
> Identification in Devanagari script.
>
> Research paper accepted at ACL — the world's premier NLP conference.
> Competition isn't about winning — it's about pushing the boundary
> of what you know into territory you have not yet mapped.
competition=1st place
venue=ACL 2025
script=Devanagari
suman@paudel:~$ echo $MISSION

> let's build what matters.

The future belongs to those who ship outcomes, not outputs.
If you're building something that listens — I want to help.

suman@paudel:~$ exit