Deploying DeepSeek-8b and DeepSeek-32b on MicroK8s

In my latest DevOps and AI experiment, I deployed two powerful open-source language models, DeepSeek-8b and DeepSeek-32b, on a Kubernetes cluster using MicroK8s. This hands-on setup allowed me to explore the performance characteristics, resource consumption, and architectural implications of running large language models locally, without relying on cloud GPUs.

Hardware Environment

The deployment was hosted on a Dell R720 server with the following specifications:

  • 40 CPUs

  • 128 GB RAM

  • 3 TB Storage

For optimized performance, I allocated 36 CPUs specifically for Ollama, the lightweight framework responsible for serving the DeepSeek models.
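
As a rough illustration, the sketch below shows one way to express that reservation through the pod's CPU requests and limits. It is a minimal example rather than my exact setup: the deployment name, image tag, and memory figure are placeholders to adapt to your own cluster.

    # Create the Ollama deployment and pin its CPU allocation
    microk8s kubectl create deployment ollama --image=ollama/ollama:latest --port=11434

    # Reserve 36 CPUs; the 96Gi memory figure is a placeholder, not a measured requirement
    microk8s kubectl set resources deployment ollama \
      --requests=cpu=36,memory=96Gi \
      --limits=cpu=36,memory=96Gi

Setting requests equal to limits gives the pod the Guaranteed QoS class and reserves the full 36 CPUs' worth of schedulable capacity for Ollama.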

Deployment Stack Overview

The following technologies formed the core of my deployment architecture:

  • MicroK8s – A lightweight Kubernetes distribution ideal for local and edge deployments.

  • Ollama – Used for efficiently serving LLMs in a containerized setup.

  • Open WebUI – Offered a clean, user-friendly interface to interact with the models.

  • Persistent Volumes – Ensured model data and application state persisted reliably across pod restarts and rescheduling.

  • LoadBalancer Services – Enabled external access to the cluster’s internal services.
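
To make the wiring between these pieces concrete, here is an illustrative command sequence rather than my exact deployment steps. The add-on names are standard MicroK8s add-ons; the MetalLB address range, image tags, ports, and the ollama-pvc.yaml file name are assumptions to adapt to your own environment.

    # Enable cluster DNS, a default hostpath StorageClass, and MetalLB
    # (give MetalLB a free address range from your LAN)
    microk8s enable dns hostpath-storage
    microk8s enable metallb:192.168.1.240-192.168.1.250

    # Persistent storage for the Ollama model cache; ollama-pvc.yaml is a hypothetical
    # manifest declaring a ReadWriteOnce claim against the microk8s-hostpath StorageClass
    microk8s kubectl apply -f ollama-pvc.yaml

    # Expose Ollama on its default port through a MetalLB-backed LoadBalancer
    microk8s kubectl expose deployment ollama --type=LoadBalancer --port=11434

    # Deploy Open WebUI, point it at the Ollama service, and expose it externally
    microk8s kubectl create deployment open-webui --image=ghcr.io/open-webui/open-webui:main --port=8080
    microk8s kubectl set env deployment/open-webui OLLAMA_BASE_URL=http://ollama:11434
    microk8s kubectl expose deployment open-webui --type=LoadBalancer --port=80 --target-port=8080

Once MetalLB assigns addresses, microk8s kubectl get svc shows the external IPs for both services.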

Observations and Model Comparison

Both DeepSeek-8b and DeepSeek-32b were tested using identical prompts. Here are my key findings:

  • DeepSeek-8b:
    Produced long, descriptive responses — ideal for exploratory tasks and creative writing.

  • DeepSeek-32b:
    Delivered concise, accurate, and well-structured answers, making it better suited for high-stakes or professional use cases.

  • Performance Trade-Off:
    The 32b model took roughly 10 minutes to generate a complete response, demonstrating the computational demands of larger models, especially when run without GPU acceleration.
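
For reference, a comparison like this can be reproduced by pulling both models into the Ollama pod and timing identical, non-streaming requests against Ollama's REST API. The snippet below is a sketch rather than my exact test run: the deepseek-r1 model tags and the external IP placeholder are assumptions to replace with the tags and address used in your own deployment.

    # Pull both sets of model weights inside the running Ollama pod
    microk8s kubectl exec deploy/ollama -- ollama pull deepseek-r1:8b
    microk8s kubectl exec deploy/ollama -- ollama pull deepseek-r1:32b

    # Send the same prompt to each model and time the complete response
    time curl -s http://<ollama-external-ip>:11434/api/generate \
      -d '{"model": "deepseek-r1:8b", "prompt": "Explain Kubernetes persistent volumes.", "stream": false}'
    time curl -s http://<ollama-external-ip>:11434/api/generate \
      -d '{"model": "deepseek-r1:32b", "prompt": "Explain Kubernetes persistent volumes.", "stream": false}'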

This side-by-side comparison offered valuable insights into the trade-offs between model size, accuracy, and compute cost.

Conclusion

This project extended beyond deploying language models. It was a comprehensive exercise in managing AI workloads within a Kubernetes-powered production environment. Leveraging local bare-metal infrastructure, I successfully created a cost-effective, cloud-independent AI setup.

There’s still room for optimization, but this deployment lays a strong foundation for future experiments with local LLMs and edge AI systems.

Are you exploring similar setups? Feel free to connect — I’m always open to sharing experiences and learning from others in this space.

