How To Install DeepSeek on Debian 12
Running powerful AI language models locally offers tremendous advantages for developers, researchers, and privacy-conscious users. DeepSeek represents one of the most capable open-source language models available today, and installing it on Debian 12 provides a stable, secure foundation for AI experimentation. This comprehensive guide walks through every aspect of getting DeepSeek up and running on your Debian system, from understanding the hardware requirements to advanced configuration options.
Understanding DeepSeek AI
DeepSeek is an advanced large language model (LLM) designed to provide powerful natural language processing capabilities while running directly on your local hardware. Unlike cloud-based solutions that send your data to remote servers, DeepSeek can operate entirely offline, giving you complete control over your information and interactions.
What is DeepSeek?
DeepSeek represents a family of open-source language models created to provide accessible AI capabilities. These models can understand context, generate human-like text, assist with coding tasks, and answer complex questions based on their training data. The technology leverages transformer architecture, similar to models like LLaMA and Mistral, but with its own unique optimizations and capabilities.
Types of DeepSeek Models
DeepSeek comes in several variants to accommodate different hardware configurations:
- DeepSeek 1.5B – Lightweight model suitable for systems with limited resources
- DeepSeek 7B – Mid-range model balancing performance and resource requirements
- DeepSeek 13B – Larger model with improved reasoning and capabilities
- DeepSeek 70B – Most powerful variant with exceptional capabilities (requires significant hardware)
Each model offers progressively better performance at the cost of increased resource requirements. For most home or small business users, the 7B or 13B models provide an excellent balance of capability and practicality.
DeepSeek vs. Other Local LLMs
Compared to alternatives like LLaMA, Mistral, or Falcon, DeepSeek offers several competitive advantages:
- Strong performance on coding and technical tasks
- Efficient resource utilization for its capability level
- Good balance of reasoning and knowledge capabilities
- Active development and community support
While other models may excel in specific domains, DeepSeek provides a well-rounded experience that performs admirably across various use cases.
Why Run DeepSeek Locally?
The benefits of running DeepSeek on your own Debian 12 system include:
- Complete privacy with no data leaving your system
- No subscription costs or API usage limits
- Full customization of model parameters and behavior
- Consistent availability without internet dependency
- Integration possibilities with local applications and workflows
- Learning opportunities about AI system administration
Local deployment puts you in control of both the technology and your data, making it ideal for sensitive applications or continuous usage scenarios.
Hardware Prerequisites
Before attempting to install DeepSeek, understanding your hardware requirements is essential for a successful deployment.
System Requirements
The hardware needed depends heavily on which DeepSeek model you plan to run:
- CPU: Modern x86-64 processor with AVX2 support (Intel 4th-generation Core "Haswell" or newer, AMD Ryzen or newer)
- RAM: Minimum 8GB for the 1.5B model, 16GB for 7B, 32GB for 13B, and 64GB+ for 70B
- Storage: At least 10GB free space for the base system plus 2-40GB depending on model size
- GPU: Optional but highly recommended for reasonable performance
The system requirements increase substantially with larger models. Without a GPU, expect significantly slower inference times, especially with models larger than 7B parameters.
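A quick way to confirm a machine meets these requirements from a Debian shell, using only standard tools (nothing here is DeepSeek-specific):
# Confirm the CPU advertises AVX2 (no output means it does not)
grep -o -m1 avx2 /proc/cpuinfo
# Installed RAM and current swap
free -h
# Free space on the filesystem that will hold the models
df -h ~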
CPU vs. GPU Performance
While DeepSeek can run on CPU-only systems, the performance difference with a GPU is substantial:
- CPU-only: roughly 0.5-2 tokens per second on larger models (workable for occasional short queries, but slow for interactive use)
- Entry-level GPU (4GB VRAM): 5-15 tokens per second
- Mid-range GPU (8GB+ VRAM): 15-30+ tokens per second
- High-end GPU (16GB+ VRAM): 30-60+ tokens per second
For interactive use, a discrete GPU dramatically improves the experience. NVIDIA GPUs tend to offer the best performance due to CUDA optimization, though AMD GPUs can also work with ROCm support on Debian 12.
Memory Requirements
RAM is the most critical resource for running larger language models. As a general guideline:
- DeepSeek 1.5B: 6-8GB RAM
- DeepSeek 7B: 12-16GB RAM
- DeepSeek 13B: 24-32GB RAM
- DeepSeek 70B: 64GB+ RAM
These requirements can be reduced somewhat through quantization techniques, but this may affect model quality.
Storage Space Needed
Ensure sufficient disk space is available:
- Base Debian 12 installation: 8-10GB
- DeepSeek 1.5B model: 1-2GB
- DeepSeek 7B model: 4-7GB
- DeepSeek 13B model: 8-13GB
- DeepSeek 70B model: 35-40GB
SSD storage is strongly recommended for faster model loading and better overall system responsiveness.
Hardware Compatibility on Debian 12
Debian 12 (Bookworm) has excellent hardware compatibility, particularly for:
- AMD and Intel processors from the last decade
- NVIDIA GPUs with community or proprietary drivers
- AMD GPUs with open-source drivers
- Most standard server and desktop hardware components
For GPU acceleration, ensure your graphics drivers are properly installed before proceeding with DeepSeek installation.
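Before installing anything DeepSeek-related, it is worth confirming the GPU and its driver are actually visible to the system. For example (nvidia-smi only exists once the NVIDIA driver is installed):
# Graphics adapters and the kernel driver currently bound to them
lspci -nnk | grep -iA3 'vga\|3d controller'
# NVIDIA only: driver version, VRAM, and utilization
nvidia-smi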
Software Prerequisites
A properly configured Debian 12 system forms the foundation for running DeepSeek effectively.
Debian 12 Preparation
Start with a clean, updated Debian 12 installation:
sudo apt update
sudo apt upgrade -y
Ensure your system locale is properly configured to avoid text encoding issues:
sudo dpkg-reconfigure locales
Required Packages
Install essential dependencies for running DeepSeek and its supporting software:
sudo apt install -y build-essential python3 python3-pip python3-venv python3-dev git curl wget cmake jq htop
For GPU support on NVIDIA systems, install the proprietary driver and CUDA toolkit (these packages require the contrib and non-free repository components, covered just below):
sudo apt install -y nvidia-driver nvidia-cuda-toolkit
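Note that nvidia-driver and nvidia-cuda-toolkit are not available on a default "main"-only Debian install. A sketch of the /etc/apt/sources.list entries with the extra components enabled (mirror URLs may differ on your system):
deb http://deb.debian.org/debian bookworm main contrib non-free non-free-firmware
deb http://deb.debian.org/debian bookworm-updates main contrib non-free non-free-firmware
deb http://security.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware
After editing the file, run sudo apt update before installing the driver packages.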
For AMD GPUs, ROCm support in the Debian 12 repositories is limited; the OpenCL runtime can be installed as shown below, but if apt cannot find the package or you need full ROCm acceleration, AMD's own ROCm repository is the usual route:
sudo apt install -y rocm-opencl-runtime
Package Manager Updates
Ensure pip and related tools are up-to-date. Debian 12 marks the system Python as externally managed (PEP 668), so run this inside a virtual environment (one is created for Open WebUI later in this guide) rather than against the system-wide Python:
pip3 install --upgrade pip setuptools wheel
Terminal Access
Most operations require terminal access. If using a desktop environment, launch the terminal application. For headless servers, connect via SSH with:
ssh username@debian-server-ip
A properly configured terminal environment with bash or zsh provides the best experience for installing and managing DeepSeek.
Installation Methods Overview
Several installation approaches exist for running DeepSeek on Debian 12, each with distinct advantages.
Ollama Method
Ollama provides the simplest approach for running various LLMs including DeepSeek. This method offers:
- Quick, straightforward installation process
- Easy model management and switching
- Simplified API and command-line interface
- Lower technical knowledge requirements
For most users, particularly beginners, the Ollama approach represents the recommended path.
vLLM Approach
vLLM offers enhanced performance with more advanced configuration options:
- Better throughput and latency optimizations
- More granular control over execution parameters
- Advanced memory management for larger models
- Better support for professional applications
This method requires more technical knowledge but delivers superior performance.
Docker Installation
Docker provides isolated, containerized deployment:
- Clean separation from the host system
- Easier version management and updates
- Consistent environment across different systems
- Simplified dependency management
Docker combines ease of deployment with flexibility, making it an excellent choice for many users.
Method Comparison
- Simplicity: Ollama > Docker > vLLM
- Performance: vLLM > Ollama > Docker
- Flexibility: vLLM > Docker > Ollama
- Ease of maintenance: Docker > Ollama > vLLM
The remainder of this guide focuses primarily on the Ollama method, with notes on Docker alternatives where applicable.
Installing Ollama on Debian 12
Ollama serves as a convenient wrapper around various large language models, making them easier to deploy and manage.
What is Ollama?
Ollama is an open-source tool that simplifies running, managing, and interacting with various language models locally. It handles downloading, loading, and serving models through a straightforward API and command-line interface. Essentially, Ollama does the heavy lifting of model management so you can focus on using the AI capabilities.
Getting Ollama
Download the Ollama installation script with:
curl -fsSL https://ollama.com/install.sh > install-ollama.sh
Before running any downloaded script, it’s good practice to review its contents:
less install-ollama.sh
Installation Process
Once satisfied with the script contents, make it executable and run it:
chmod +x install-ollama.sh
sudo ./install-ollama.sh
The installation process will:
- Download the appropriate Ollama binary for your system
- Install it to /usr/local/bin
- Create necessary configuration directories
- Set up a systemd service for automatic startup
Verification
Verify that Ollama installed correctly:
ollama --version
If installation was successful, you’ll see the version number displayed.
Starting Ollama Service
Enable and start the Ollama service to run automatically at boot:
sudo systemctl enable ollama
sudo systemctl start ollama
Check the service status to ensure it’s running properly:
sudo systemctl status ollama
You should see “active (running)” in the output, indicating Ollama is operational.
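If the status output shows anything other than "active (running)", the service logs usually explain why:
# Last 20 log lines from the Ollama service
journalctl -u ollama -n 20 --no-pager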
Downloading and Running DeepSeek Models
With Ollama installed, the next step is obtaining and running the DeepSeek models themselves.
Available DeepSeek Models
Ollama provides access to various DeepSeek models through its repository. The main variants include:
- deepseek-coder – Specialized for programming tasks
- deepseek-llm – General purpose language model
- Other DeepSeek variants are added to the Ollama library over time – check https://ollama.com/library for the current list
Each model comes in different parameter sizes and quantization levels to accommodate various hardware configurations.
Choosing the Right Model Size
Select a model appropriate for your hardware:
- For systems with 8GB RAM: deepseek-coder:1.5b-base-q4_0
- For systems with 16GB RAM: deepseek-coder:7b-base-q4_K_S
- For systems with 32GB RAM: deepseek-coder:13b-base-q5_K_M
- For systems with 64GB+ RAM: deepseek-coder:33b-instruct-q5_K_M
The suffix indicates the quantization level, with lower numbers (q4) requiring less memory but potentially reducing quality compared to higher numbers (q5, q8).
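As a rough rule of thumb, the weights alone occupy about parameter count × bits per weight ÷ 8 bytes: a 7B model quantized to 4 bits is roughly 7 billion × 0.5 bytes ≈ 3.5 GB, before adding memory for the context window and runtime overhead.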
Pulling Models with Ollama
Download your chosen model using the pull command:
ollama pull deepseek-coder:7b-base-q4_K_S
This process downloads and prepares the model for use. Depending on your internet connection and the model size, this may take several minutes to complete.
Initial Model Testing
Test your newly downloaded model with a simple query:
ollama run deepseek-coder:7b-base-q4_K_S "Explain how to create a simple HTTP server in Python"
The model should generate a response explaining the requested information. This confirms that both Ollama and the DeepSeek model are functioning correctly.
Model Management
List all downloaded models:
ollama list
Remove a model to free up disk space:
ollama rm deepseek-coder:7b-base-q4_K_S
Pulling an updated version:
ollama pull deepseek-coder:7b-base-q4_K_S
Proper model management helps maintain system performance and storage efficiency.
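To see how a downloaded model is configured, Ollama can print the Modelfile it was built from and its default generation parameters:
# Show the Modelfile behind the model
ollama show deepseek-coder:7b-base-q4_K_S --modelfile
# Show its default generation parameters
ollama show deepseek-coder:7b-base-q4_K_S --parameters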
Setting Up Open WebUI Interface
While the command line offers direct access to DeepSeek, a web interface provides a more user-friendly experience similar to commercial AI chatbots.
What is Open WebUI?
Open WebUI is a browser-based graphical interface for interacting with locally hosted language models. It offers features like:
- Persistent chat sessions
- Conversation history storage
- File uploads and attachments
- Model switching between different LLMs
- Visual customization options
- User account management
This interface transforms DeepSeek from a command-line tool into a full-featured chatbot experience.
Installation Methods
Open WebUI can be installed either directly on your system or through Docker:
- Direct installation offers deeper integration with your system
- Docker provides easier maintenance and isolation
Both methods achieve the same end result with different maintenance implications.
Direct Installation Steps
Create a dedicated Python virtual environment for Open WebUI:
mkdir -p ~/open-webui
cd ~/open-webui
python3 -m venv venv
source venv/bin/activate
Installing Open WebUI Packages
Install the Open WebUI application:
pip install open-webui
This command installs the web interface and all its dependencies within the isolated virtual environment.
Running the Web Server
Launch the Open WebUI server:
open-webui serve
The first launch will create configuration files and initialize the database. After a few moments, the server will be running.
Accessing the Interface
Open a web browser and navigate to:
http://localhost:8080
If accessing from another device on your network, replace “localhost” with your Debian server’s IP address.
User Account Creation
On first access, you’ll be prompted to create an administrator account:
- Enter a username and password
- Complete the setup wizard
- Connect to your local Ollama instance (usually at http://localhost:11434)
Once configured, you can begin chatting with your DeepSeek model through the web interface.
Docker Installation Alternative
Docker provides a self-contained environment for running Open WebUI with minimal system modification.
Docker Advantages
Using Docker for Open WebUI offers several benefits:
- Simplified installation and updates
- Reduced dependency conflicts
- Easy migration between systems
- Consistent environment regardless of host configuration
- Isolated resource management
These advantages make Docker particularly appealing for server environments or systems where multiple applications run.
Installing Docker on Debian 12
Install Docker using the official Debian repository:
sudo apt update
sudo apt install -y docker.io docker-compose
sudo systemctl enable docker
sudo systemctl start docker
Add your user to the Docker group to avoid using sudo for every command:
sudo usermod -aG docker $USER
Log out and back in for this change to take effect.
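After logging back in, a quick way to confirm Docker works without sudo is to run the stock hello-world image:
docker run --rm hello-world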
Pulling Open WebUI Image
Pull the latest Open WebUI Docker image:
docker pull ghcr.io/open-webui/open-webui:main
Container Configuration
Create a directory for persistent storage:
mkdir -p ~/open-webui-data
Running the Container
Launch the Open WebUI container with the following command:
docker run -d \
--name open-webui \
-p 8080:8080 \
-v ~/open-webui-data:/app/backend/data \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
--add-host host.docker.internal:host-gateway \
ghcr.io/open-webui/open-webui:main
This command:
- Creates a container named “open-webui”
- Maps port 8080 to your host system
- Creates a persistent volume for data storage
- Configures connection to your Ollama instance
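Day-to-day container management uses the usual Docker commands; for example (the container name matches the --name used above):
# Follow the container logs while it starts up
docker logs -f open-webui
# Stop or start the container
docker stop open-webui
docker start open-webui
# Update: pull the new image, then remove the container and rerun the docker run command above
docker pull ghcr.io/open-webui/open-webui:main
docker rm -f open-webui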
Accessing Docker Web Interface
Open your browser and navigate to:
http://localhost:8080
Follow the same initial setup process described in the direct installation method.
Using DeepSeek via Command Line
The command line offers the most direct way to interact with DeepSeek models.
Basic Interaction
Start an interactive chat session:
ollama run deepseek-coder:7b-base-q4_K_S
This launches a conversation where you can type prompts and receive responses. Press Ctrl+D or type "/bye" to end the session.
One-off Queries
For single queries without starting a persistent session:
ollama run deepseek-coder:7b-base-q4_K_S "Write a bash function that finds all files larger than 100MB"
This pattern works well for scripts or quick questions without the overhead of maintaining a session.
Scripting with DeepSeek
Create a simple shell script to automate interactions:
#!/bin/bash
prompt="$1"
model="${2:-deepseek-coder:7b-base-q4_K_S}"
echo "Asking $model: $prompt"
ollama run "$model" "$prompt"
Save this as `ask-ai.sh`, make it executable with `chmod +x ask-ai.sh`, and use it like:
./ask-ai.sh "Explain how TCP/IP works"
Command Line Parameters
The ollama run command does not accept generation flags directly; parameters are adjusted with the /set command inside an interactive session (or via a Modelfile or the API's options field). For example, after starting ollama run deepseek-coder:7b-base-q4_K_S:
/set parameter temperature 0.7
/set parameter top_p 0.9
/set parameter num_ctx 4096
Higher temperature values (0-1) increase creativity, while lower values produce more deterministic responses.
Using DeepSeek via API
For developers, the API offers programmatic access to DeepSeek’s capabilities.
Starting the API Server
Ollama automatically runs an API server on port 11434. Verify it’s accessible:
curl http://localhost:11434/api/version
This should return a JSON response with the Ollama version.
API Endpoints
The main endpoints include:
- /api/generate – Generate text from a prompt
- /api/chat – Interactive chat completion
- /api/embeddings – Generate vector embeddings
- /api/tags – List locally available models
Each endpoint accepts specific parameters detailed in the Ollama documentation.
cURL Examples
Generate a response from DeepSeek:
curl -X POST http://localhost:11434/api/generate \
-H 'Content-Type: application/json' \
-d '{
"model": "deepseek-coder:7b-base-q4_K_S",
"prompt": "Write a Python function to check if a string is a palindrome"
}'
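By default /api/generate streams the reply as a series of JSON objects, one chunk per line. For scripting it is often easier to disable streaming and extract the text with jq (installed earlier); a sketch:
curl -s -X POST http://localhost:11434/api/generate \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "deepseek-coder:7b-base-q4_K_S",
    "prompt": "Write a Python function to check if a string is a palindrome",
    "stream": false
  }' | jq -r '.response'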
Python API Usage
Create a simple Python script to interact with DeepSeek:
import requests

def ask_deepseek(prompt, model="deepseek-coder:7b-base-q4_K_S"):
    # stream=False makes Ollama return the whole reply as a single JSON object
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False}
    )
    response.raise_for_status()
    return response.json()["response"]

# Example usage
result = ask_deepseek("Explain quantum computing in simple terms")
print(result)
API Parameters
Customize responses with additional parameters. In the Ollama API, generation settings are nested inside an "options" object, and the maximum response length is called num_predict:
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-coder:7b-base-q4_K_S",
        "prompt": "Write a sorting algorithm",
        "stream": False,
        "options": {
            "temperature": 0.5,
            "top_p": 0.9,
            "num_predict": 500
        }
    }
)
These parameters control response generation characteristics like creativity, length, and determinism.
Performance Optimization
Optimizing your DeepSeek installation ensures the best possible experience on your hardware.
System Resource Management
Allocate appropriate resources to Ollama. The network binding is controlled by the OLLAMA_HOST environment variable when launching the server:
OLLAMA_HOST=0.0.0.0 ollama serve
CPU thread count and GPU offloading, by contrast, are per-model parameters rather than environment variables:
- num_thread: CPU threads used for computation
- num_gpu: number of model layers offloaded to the GPU
Both can be set in a Modelfile with PARAMETER lines (for example PARAMETER num_gpu 43, as in the next subsection) or passed in the API's options field. If Ollama runs as a systemd service, set environment variables such as OLLAMA_HOST in the unit file instead (see Advanced Configuration).
Model Parameter Tweaking
Create a custom model configuration for optimized performance:
ollama create deepseek-optimized -f Modelfile
Where Modelfile contains:
FROM deepseek-coder:7b-base-q4_K_S
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER num_ctx 4096
This creates a custom model variant with your preferred default parameters.
Response Generation Settings
Different use cases require different generation settings:
- Creative writing: Higher temperature (0.7-0.9)
- Factual responses: Lower temperature (0.1-0.3)
- Code generation: Medium temperature (0.3-0.6) with higher top_k values
Experiment to find the optimal balance for your specific needs.
Background Process Management
For server deployments, use systemd to manage Ollama and Open WebUI:
sudo systemctl enable --now ollama
Create a systemd service file for Open WebUI at /etc/systemd/system/open-webui.service:
[Unit]
Description=Open WebUI
After=network.target ollama.service
[Service]
User=your_username
WorkingDirectory=/home/your_username/open-webui
ExecStart=/home/your_username/open-webui/venv/bin/open-webui serve
Restart=on-failure
[Install]
WantedBy=multi-user.target
Reload systemd so it picks up the new unit file, then enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable --now open-webui
Using Swap Space
For systems with limited RAM, configure additional swap space:
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
Add the following line to /etc/fstab for persistence:
/swapfile swap swap defaults 0 0
While swap is slower than RAM, it allows running larger models on systems with limited physical memory.
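A quick check that the new swap area is active:
swapon --show
free -h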
Troubleshooting Common Issues
Even with careful installation, issues may arise that require troubleshooting.
Installation Failures
If Ollama installation fails:
- Check system architecture compatibility
- Verify internet connectivity
- Ensure sufficient disk space
- Examine logs with journalctl -u ollama
Common solution:
sudo apt install -y build-essential curl
curl -fsSL https://ollama.com/install.sh | sh
Model Download Problems
For model download failures:
- Check network connectivity
- Verify disk space availability
- Try direct download via API:
curl -X POST http://localhost:11434/api/pull \
-d '{"name": "deepseek-coder:7b-base-q4_K_S"}'
Memory-Related Errors
If experiencing out-of-memory errors:
- Switch to a smaller model variant
- Increase swap space
- Reduce the number of GPU-offloaded layers via the num_gpu model parameter
- Try a more aggressive quantization level (q4 instead of q5)
Slow Performance
For sluggish response generation:
- Verify GPU utilization with nvidia-smi or rocm-smi
- Check for thermal throttling with sensors
- Monitor system resources with htop
- Consider reducing context length for faster responses
Web UI Connection Issues
If Open WebUI cannot connect to Ollama:
- Verify Ollama is running with systemctl status ollama
- Check connection settings in the WebUI configuration
- Ensure the correct URL is specified (usually http://localhost:11434)
- Check for firewall rules blocking communication
Model Loading Errors
For model corruption or loading failures:
- Remove and redownload the model:
ollama rm deepseek-coder:7b-base-q4_K_S
ollama pull deepseek-coder:7b-base-q4_K_S
- Check for disk errors (run against an unmounted filesystem or from a rescue/live environment):
sudo fsck -f /dev/sdX
- Verify RAM stability by installing the memtest86+ package and booting into it from the GRUB menu
Advanced Configuration
Custom Model Parameters
Create a fine-tuned model configuration in a Modelfile:
FROM deepseek-coder:7b-base-q4_K_S
SYSTEM You are a helpful programming assistant specializing in Linux and Debian systems.
PARAMETER temperature 0.4
PARAMETER top_p 0.95
PARAMETER seed 42
TEMPLATE """
{{- if .System }}
SYSTEM: {{ .System }}
{{- end }}
USER: {{ .Prompt }}
ASSISTANT:
"""
Build the custom model:
ollama create deepseek-debian-expert -f ./Modelfile
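Once built, the custom model is used exactly like any other Ollama model:
ollama run deepseek-debian-expert "How do I add a user to the sudo group on Debian 12?"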
System Service Setup
Create an optimized systemd service for Ollama. Server-level tuning is exposed through environment variables; for example, OLLAMA_KEEP_ALIVE keeps a model resident in memory between requests, and OLLAMA_MAX_LOADED_MODELS limits how many models stay loaded at once:
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
Environment="OLLAMA_KEEP_ALIVE=24h"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
ExecStart=/usr/local/bin/ollama serve
Restart=always
RestartSec=3
[Install]
WantedBy=default.target
Save as /etc/systemd/system/ollama.service, then:
sudo systemctl daemon-reload
sudo systemctl restart ollama
Security Considerations
For multi-user environments, implement basic authentication:
- Configure Open WebUI with user accounts
- Set up a reverse proxy with nginx or Apache
- Implement HTTPS with Let’s Encrypt certificates
- Restrict API access to trusted networks
Network Configuration
Allow remote access to your DeepSeek installation. When launching Ollama manually, bind it to all interfaces:
OLLAMA_HOST=0.0.0.0 ollama serve
If Ollama already runs as a systemd service (the default after installation), set OLLAMA_HOST in the unit instead (sudo systemctl edit ollama, add the Environment line, then restart the service).
Configure Open WebUI for remote access by changing the binding address:
open-webui serve --host 0.0.0.0
Remember to configure appropriate firewall rules to protect your installation.
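A minimal sketch using ufw (the 192.168.1.0/24 range is a placeholder for your own LAN; adjust it before applying, and keep SSH open if you manage the server remotely):
sudo apt install -y ufw
# Keep remote administration possible before enabling the firewall
sudo ufw allow 22/tcp
# Allow Open WebUI and the Ollama API only from the local network
sudo ufw allow from 192.168.1.0/24 to any port 8080 proto tcp
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp
sudo ufw enable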
Congratulations! You have successfully installed DeepSeek. Thanks for using this tutorial for installing the DeepSeek AI model on Debian 12 “Bookworm” Linux. For additional help or useful information, we recommend you check the official DeepSeek website.