Let's build something great together.

Whether you have a project idea, want to collaborate on research, or just want to say hello, my inbox is always open.

Blog

Thoughts, tutorials, and insights on full-stack development, AI/ML, and modern web technologies.

Optimizing LLMs for Low-Latency Inference

Reduce latency in large language models for real-time applications with these practical tips and code examples.

Reduce AI model size and improve inference speed for edge devices or mobile apps with quantization techniques.

Improve customer service systems with AI voice features using OpenAI API optimization techniques and best practices.

Reduce the cost of AI model inference for your application with NVIDIA and Google infrastructure.

Improve AI model performance with efficient feature selection techniques.


Leverage AI-driven approaches to design and optimize chips for AI workloads, reducing development time and costs.