Running LLMs locally: Practical LLM Performance on DGX Spark — Mozhgan Kabiri Chimeh, NVIDIA


Description:

Moving LLM workloads from the cloud to local infrastructure requires a shift in engineering strategy. In this talk, I share my journey of serving and benchmarking open-source models (1.5B to 14B) on an NVIDIA DGX Spark workstation. Using a reproducible methodology with vLLM, I analyze real-world trade-offs in throughput, latency, and the benefits of the 128GB Grace Blackwell unified memory architecture. You will leave with a clear framework for local model sizing, an understanding of quantization performance like NVFP4, and a guide for when local compute is the right choice for your AI stack.
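The throughput and latency trade-offs described above can be sketched with a small metrics helper. The function below is illustrative only (it is not the talk's actual harness): it assumes per-request samples of (latency in seconds, output tokens), as one might collect from timed vLLM `generate()` calls run sequentially, and reports aggregate token throughput plus p50/p95 latency.

```python
from statistics import quantiles

def summarize_benchmark(samples):
    """Summarize per-request benchmark samples.

    samples: list of (latency_seconds, output_tokens) tuples.
    Assumes requests ran sequentially, so total wall time is the
    sum of per-request latencies (a simplifying assumption).
    """
    latencies = sorted(s[0] for s in samples)
    total_tokens = sum(s[1] for s in samples)
    total_time = sum(latencies)
    # Aggregate decode throughput across the run (tokens per second).
    throughput = total_tokens / total_time
    # p50/p95 latency via inclusive percentiles over observed latencies.
    qs = quantiles(latencies, n=100, method="inclusive")
    return {
        "throughput_tok_s": throughput,
        "p50_s": qs[49],
        "p95_s": qs[94],
    }

# Example with three hypothetical requests.
print(summarize_benchmark([(1.0, 100), (2.0, 180), (1.5, 150)]))
```

Real sizing decisions would extend this with time-to-first-token, batch-level concurrency, and memory headroom per model size, which the talk covers in more depth.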

Speaker info:
– LinkedIn https://www.linkedin.com/in/mozhgankch/
