Deploying DeepSeek-8b and DeepSeek-32b on MicroK8s

In my latest DevOps and AI experiment, I deployed two powerful open-source language models, DeepSeek-8b and DeepSeek-32b, on a Kubernetes cluster using MicroK8s. This hands-on setup allowed me to explore the performance characteristics, resource consumption, and architectural implications of running large language models locally, without relying on cloud GPUs.

Hardware Environment

The deployment was hosted on a Dell R720 server with the following specifications:

  • 40 CPUs

  • 128 GB RAM

  • 3 TB Storage

For optimized performance, I allocated 36 CPUs specifically for Ollama, the lightweight framework responsible for serving the DeepSeek models.
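
As a rough illustration, the sketch below shows one way to express that reservation through the pod's CPU requests and limits. It is a minimal example rather than my exact setup: the deployment name, image tag, and memory figure are placeholders to adapt to your own cluster.

    # Create the Ollama deployment and pin its CPU allocation
    microk8s kubectl create deployment ollama --image=ollama/ollama:latest --port=11434

    # Reserve 36 CPUs; the 96Gi memory figure is a placeholder, not a measured requirement
    microk8s kubectl set resources deployment ollama \
      --requests=cpu=36,memory=96Gi \
      --limits=cpu=36,memory=96Gi

Setting requests equal to limits gives the pod the Guaranteed QoS class and reserves the full 36 CPUs' worth of schedulable capacity for Ollama.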

Deployment Stack Overview

The following technologies formed the core of my deployment architecture:

  • MicroK8s – A lightweight Kubernetes distribution ideal for local and edge deployments.

  • Ollama – Used for efficiently serving LLMs in a containerized setup.

  • Open WebUI – Offered a clean, user-friendly interface to interact with the models.

  • Persistent Volumes – Ensured model data and application state persisted reliably across pod restarts and rescheduling.

  • LoadBalancer Services – Enabled external access to the cluster’s internal services.
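
To make the wiring between these pieces concrete, here is an illustrative command sequence rather than my exact deployment steps. The add-on names are standard MicroK8s add-ons; the MetalLB address range, image tags, ports, and the ollama-pvc.yaml file name are assumptions to adapt to your own environment.

    # Enable cluster DNS, a default hostpath StorageClass, and MetalLB
    # (give MetalLB a free address range from your LAN)
    microk8s enable dns hostpath-storage
    microk8s enable metallb:192.168.1.240-192.168.1.250

    # Persistent storage for the Ollama model cache; ollama-pvc.yaml is a hypothetical
    # manifest declaring a ReadWriteOnce claim against the microk8s-hostpath StorageClass
    microk8s kubectl apply -f ollama-pvc.yaml

    # Expose Ollama on its default port through a MetalLB-backed LoadBalancer
    microk8s kubectl expose deployment ollama --type=LoadBalancer --port=11434

    # Deploy Open WebUI, point it at the Ollama service, and expose it externally
    microk8s kubectl create deployment open-webui --image=ghcr.io/open-webui/open-webui:main --port=8080
    microk8s kubectl set env deployment/open-webui OLLAMA_BASE_URL=http://ollama:11434
    microk8s kubectl expose deployment open-webui --type=LoadBalancer --port=80 --target-port=8080

Once MetalLB assigns addresses, microk8s kubectl get svc shows the external IPs for both services.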

Observations and Model Comparison

Both DeepSeek-8b and DeepSeek-32b were tested using identical prompts. Here are my key findings:

  • DeepSeek-8b:
    Produced long, descriptive responses — ideal for exploratory tasks and creative writing.

  • DeepSeek-32b:
    Delivered concise, accurate, and well-structured answers, making it better suited for high-stakes or professional use cases.

  • Performance Trade-Off:
    The 32b model took roughly 10 minutes to generate a complete response, demonstrating the computational demands of larger models, especially when run without GPU acceleration.
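
For reference, a comparison like this can be reproduced by pulling both models into the Ollama pod and timing identical, non-streaming requests against Ollama's REST API. The snippet below is a sketch rather than my exact test run: the deepseek-r1 model tags and the external IP placeholder are assumptions to replace with the tags and address used in your own deployment.

    # Pull both sets of model weights inside the running Ollama pod
    microk8s kubectl exec deploy/ollama -- ollama pull deepseek-r1:8b
    microk8s kubectl exec deploy/ollama -- ollama pull deepseek-r1:32b

    # Send the same prompt to each model and time the complete response
    time curl -s http://<ollama-external-ip>:11434/api/generate \
      -d '{"model": "deepseek-r1:8b", "prompt": "Explain Kubernetes persistent volumes.", "stream": false}'
    time curl -s http://<ollama-external-ip>:11434/api/generate \
      -d '{"model": "deepseek-r1:32b", "prompt": "Explain Kubernetes persistent volumes.", "stream": false}'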

This side-by-side comparison offered valuable insights into the trade-offs between model size, accuracy, and compute cost.

Conclusion

This project extended beyond deploying language models. It was a comprehensive exercise in managing AI workloads within a Kubernetes-powered production environment. Leveraging local bare-metal infrastructure, I successfully created a cost-effective, cloud-independent AI setup.

There’s still room for optimization, but this deployment lays a strong foundation for future experiments with local LLMs and edge AI systems.

Are you exploring similar setups? Feel free to connect — I’m always open to sharing experiences and learning from others in this space.

