Dive into my musings on life and tech in my latest posts, a blend of introspection and innovation. Keep an eye out for fresh insights and updates!
You’re in a final-round interview for a Senior AI Engineer role at NVIDIA Robotics.
You’re in a Research Scientist interview at Google DeepMind, and the lead researcher throws you a curveball: “I have a dataset of reasoning traces, but they’re all flawed.
You’re in a Machine Learning interview at DeepSeek AI, and the lead researcher asks: “We want to train a reasoning model using Direct Prefer...
You’re in a Machine Learning Engineer interview at OpenAI, and the lead researcher asks: “We have a massive dataset of human expert demonstrations for this task.
You’re in a Machine Learning interview at Tesla, and the interviewer asks: “We have an imitation learning agent that is underfitting complex human driving data.
You’re in a Machine Learning Engineer interview at Anthropic, and the interviewer drops this on you: “In Supervised Learning, we assume data is IID (Independent and Identically Distributed).
You’re in a Computer Vision interview at OpenAI.
You’re in a Senior AI interview at Google DeepMind.
You’re in a final-round Computer Vision interview at OpenAI.
You’re in a Senior Robotics interview at NVIDIA.
You’re in a Senior Computer Vision interview at OpenAI.
You’re in a Senior Computer Vision interview at Google DeepMind.
You are in a Senior Computer Vision interview at Google DeepMind.
You’re in a Senior AI interview at OpenAI.
How aggressive batch difficulty pushes CLIP from semantic understanding into pixel-level cheating.
How contrastive pretraining collapses spatial information, and why LLaVA-style models must use penultimate patch embeddings.
Why attention handles communication, but MLPs do the real computation in modern vision transformers.
Why disabling data augmentation during evaluation is the only way to measure real generalization.
Why linear learning-rate scaling silently kills SGD’s implicit regularization and destroys test accuracy.
Why single-text prompts are noisy estimates in high-dimensional space—and how centroid stabilization fixes zero-shot accuracy.
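A minimal sketch of the centroid idea, assuming a CLIP-style text encoder; the `encode_text` stub and the template strings below are illustrative stand-ins, not any particular library's API. Embedding several paraphrased prompts per class and averaging them estimates the class direction from many noisy points instead of one.

```python
import numpy as np

# Hypothetical stand-in for a CLIP-style text encoder that returns an
# L2-normalized embedding; in practice this would be the model's text tower.
def encode_text(prompt: str, dim: int = 512) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

def class_centroid(class_name: str, templates: list[str]) -> np.ndarray:
    """Embed several prompt paraphrases, average them, then renormalize.

    A single prompt is one noisy point in a high-dimensional space; the mean
    of many paraphrases is a lower-variance estimate of the class direction.
    """
    embs = np.stack([encode_text(t.format(class_name)) for t in templates])
    centroid = embs.mean(axis=0)
    return centroid / np.linalg.norm(centroid)

templates = [
    "a photo of a {}.",
    "a blurry photo of a {}.",
    "a close-up photo of a {}.",
]
w_dog = class_centroid("dog", templates)  # use as the zero-shot classifier weight for "dog"
```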
The hidden activation-memory cost of keeping time alive in deep video networks.
Why Faster R-CNN still beats YOLO when defects are smaller than your receptive field.
Why injecting zeros at image borders silently breaks translation equivariance and corrupts edge statistics.
Why replacing a 7×7 convolution with three 3×3 layers isn’t about parameters — it’s about nonlinear expressivity.
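A quick way to see that claim, sketched in PyTorch with an arbitrary channel count: both stacks cover a 7×7 receptive field, the 3×3 stack actually has fewer weights, and the real difference is the two ReLUs wedged between the small convolutions.

```python
import torch
import torch.nn as nn

C = 64  # channels, chosen arbitrarily for illustration

# One 7x7 convolution: 7*7*C*C weights, and a single linear map per position.
single_7x7 = nn.Conv2d(C, C, kernel_size=7, padding=3, bias=False)

# Three 3x3 convolutions: 3*(3*3*C*C) weights, the same 7x7 receptive field,
# but the two ReLUs make the composite mapping nonlinear.
stacked_3x3 = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False), nn.ReLU(),
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False), nn.ReLU(),
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
)

print(sum(p.numel() for p in single_7x7.parameters()))   # 200704 = 49 * 64 * 64
print(sum(p.numel() for p in stacked_3x3.parameters()))  # 110592 = 27 * 64 * 64

x = torch.randn(1, C, 56, 56)
assert single_7x7(x).shape == stacked_3x3(x).shape  # identical output shape
```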
Why shrinking an overfitting network makes optimization harder, and why over-parameterization is the safer bet.
Why lowering the learning rate can't resurrect dead neurons, and how architectural gradient flow actually fixes it.
Why rotating the feature space instantly exposes candidates who don’t understand metric invariance.
Why a Softmax loss of 0.05 at step zero doesn’t mean your model is brilliant — it means your training pipeline is broken.
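The back-of-envelope check behind that claim: with K classes and a freshly initialized head, the model should be close to a uniform guess, so the first-step cross-entropy should sit near ln(K), not near zero.

```python
import math

num_classes = 1000                             # e.g., an ImageNet-scale head
expected_initial_loss = math.log(num_classes)  # cross-entropy of a uniform guess
print(expected_initial_loss)                   # ~6.91

# A loss of 0.05 at step zero implies ~exp(-0.05) ≈ 95% probability on the
# correct class before any learning has happened: a classic sign of label
# leakage or a broken data pipeline rather than a brilliant model.
print(math.exp(-0.05))                         # ≈ 0.951
```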
Why labeling 500k more images from the same distribution won’t fix overfitting—and how active learning actually moves the decision boundary.
Top 25 ML System Design Interview Questions
Why CNNs learn one visual feature once, while dense networks must relearn it at every pixel.
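A back-of-envelope comparison with sizes picked purely for illustration: a convolutional filter is learned once and slid across every position, while a fully connected layer must learn a separate weight for every input-output pixel pair.

```python
# One layer on a 224x224x3 input producing 64 feature maps (illustrative sizes).
h, w, c_in, c_out = 224, 224, 3, 64

conv_params = 3 * 3 * c_in * c_out               # one 3x3 filter bank, reused at every pixel
dense_params = (h * w * c_in) * (h * w * c_out)  # a distinct weight per pixel pair

print(f"conv:  {conv_params:,}")   # 1,728
print(f"dense: {dense_params:,}")  # ~483 billion
```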
Why generating synthetic sources (not targets) is the only way to preserve decoder fluency in production NMT systems.
Why model cascades fail not on routing logic, but on overconfident cheap models that never escalate.
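A minimal sketch of that failure mode; the models and threshold below are hypothetical stand-ins. The routing rule is a one-line confidence check, so the cascade lives or dies by whether the cheap model's confidence ever drops low enough to escalate.

```python
import numpy as np

def cascade_predict(x, cheap_model, expensive_model, threshold=0.9):
    """Route to the expensive model only when the cheap model is unsure."""
    probs = cheap_model(x)                      # assumed to return a softmax distribution
    if probs.max() >= threshold:
        return int(probs.argmax()), "cheap"
    return int(expensive_model(x).argmax()), "expensive"

# An overconfident cheap model never reports low confidence, so hard inputs
# are never escalated, no matter how good the routing logic is.
overconfident_cheap = lambda x: np.array([0.98, 0.01, 0.01])
expensive = lambda x: np.array([0.10, 0.85, 0.05])

pred, route = cascade_predict(None, overconfident_cheap, expensive)
print(pred, route)  # 0 cheap -- the wrong answer, and it never reached the big model
```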
Why shuffling General, Code, and Math data together silently caps reasoning performance and how staged pretraining unlocks true chain-of-thought.
Explore the origins of modern deep learning with a look back at Yann LeCun's groundbreaking LeNet-1 demo from 1989. This article delves into the foundational concepts of convolutional neural networks, their evolution, and what today's AI engineers can learn from the elegant simplicity of early models.
This doc covers everything from the basics of RAG to advanced techniques for addressing hallucination and retrieval challenges. It also includes practical insights and best practices for implementing RAG in real-world applications.
Introducing RepoGraph, a graph-based module that maps out the structure of an entire codebase.
Running and fine-tuning open-source LLMs have become essential practices in the field of natural language processing (NLP). This guide provides a detailed overview of the processes involved, the tools and frameworks used, and best practices for optimizing performance.
AI agents are software programs that perform tasks autonomously, using natural language to interact with users and other systems. They are designed to learn and adapt to new situations, making them useful in a wide range of applications.
A deep dive into DeepSeek's latest models, exploring their architecture, training methodology, and emergent reasoning capabilities.