How To Install DeepSeek on Debian 12
Running powerful AI language models locally offers tremendous advantages for developers, researchers, and privacy-conscious users. DeepSeek represents one of the most capable open-source language models available today, and installing it on Debian 12 provides a stable, secure foundation for AI experimentation. This comprehensive guide walks through every aspect of getting DeepSeek up and running on your Debian system, from understanding the hardware requirements to advanced configuration options.
Understanding DeepSeek AI
DeepSeek is an advanced large language model (LLM) designed to provide powerful natural language processing capabilities while running directly on your local hardware. Unlike cloud-based solutions that send your data to remote servers, DeepSeek can operate entirely offline, giving you complete control over your information and interactions.
What is DeepSeek?
DeepSeek represents a family of open-source language models created to provide accessible AI capabilities. These models can understand context, generate human-like text, assist with coding tasks, and answer complex questions based on their training data. The technology leverages transformer architecture, similar to models like LLaMA and Mistral, but with its own unique optimizations and capabilities.
Types of DeepSeek Models
DeepSeek comes in several variants to accommodate different hardware configurations:
- DeepSeek 1.5B – Lightweight model suitable for systems with limited resources
- DeepSeek 7B – Mid-range model balancing performance and resource requirements
- DeepSeek 13B – Larger model with improved reasoning and capabilities
- DeepSeek 70B – Most powerful variant with exceptional capabilities (requires significant hardware)
Each model offers progressively better performance at the cost of increased resource requirements. For most home or small business users, the 7B or 13B models provide an excellent balance of capability and practicality.
DeepSeek vs. Other Local LLMs
Compared to alternatives like LLaMA, Mistral, or Falcon, DeepSeek offers several competitive advantages:
- Strong performance on coding and technical tasks
- Efficient resource utilization for its capability level
- Good balance of reasoning and knowledge capabilities
- Active development and community support
While other models may excel in specific domains, DeepSeek provides a well-rounded experience that performs admirably across various use cases.
Why Run DeepSeek Locally?
The benefits of running DeepSeek on your own Debian 12 system include:
- Complete privacy with no data leaving your system
- No subscription costs or API usage limits
- Full customization of model parameters and behavior
- Consistent availability without internet dependency
- Integration possibilities with local applications and workflows
- Learning opportunities about AI system administration
Local deployment puts you in control of both the technology and your data, making it ideal for sensitive applications or continuous usage scenarios.
Hardware Prerequisites
Before attempting to install DeepSeek, understanding your hardware requirements is essential for a successful deployment.
System Requirements
The hardware needed depends heavily on which DeepSeek model you plan to run:
- CPU: Modern x86-64 processor with AVX2 support (Intel 4th-generation Core "Haswell" or newer, AMD Ryzen or newer)
- RAM: Minimum 8GB for the 1.5B model, 16GB for 7B, 32GB for 13B, and 64GB+ for 70B
- Storage: At least 10GB free space for the base system plus 2-40GB depending on model size
- GPU: Optional but highly recommended for reasonable performance
The system requirements increase substantially with larger models. Without a GPU, expect significantly slower inference times, especially with models larger than 7B parameters.
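A quick way to confirm a machine meets these requirements from a Debian shell, using only standard tools (nothing here is DeepSeek-specific):
# Confirm the CPU advertises AVX2 (no output means it does not)
grep -o -m1 avx2 /proc/cpuinfo
# Installed RAM and current swap
free -h
# Free space on the filesystem that will hold the models
df -h ~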
CPU vs. GPU Performance
While DeepSeek can run on CPU-only systems, the performance difference with a GPU is substantial:
- CPU-only: roughly 0.5-2 tokens per second on larger models (workable for occasional short queries, but slow for interactive use)
- Entry-level GPU (4GB VRAM): 5-15 tokens per second
- Mid-range GPU (8GB+ VRAM): 15-30+ tokens per second
- High-end GPU (16GB+ VRAM): 30-60+ tokens per second
For interactive use, a discrete GPU dramatically improves the experience. NVIDIA GPUs tend to offer the best performance due to CUDA optimization, though AMD GPUs can also work with ROCm support on Debian 12.
Memory Requirements
RAM is the most critical resource for running larger language models. As a general guideline:
- DeepSeek 1.5B: 6-8GB RAM
- DeepSeek 7B: 12-16GB RAM
- DeepSeek 13B: 24-32GB RAM
- DeepSeek 70B: 64GB+ RAM
These requirements can be reduced somewhat through quantization techniques, but this may affect model quality.
Storage Space Needed
Ensure sufficient disk space is available:
- Base Debian 12 installation: 8-10GB
- DeepSeek 1.5B model: 1-2GB
- DeepSeek 7B model: 4-7GB
- DeepSeek 13B model: 8-13GB
- DeepSeek 70B model: 35-40GB
SSD storage is strongly recommended for faster model loading and better overall system responsiveness.
Hardware Compatibility on Debian 12
Debian 12 (Bookworm) has excellent hardware compatibility, particularly for:
- AMD and Intel processors from the last decade
- NVIDIA GPUs with community or proprietary drivers
- AMD GPUs with open-source drivers
- Most standard server and desktop hardware components
For GPU acceleration, ensure your graphics drivers are properly installed before proceeding with DeepSeek installation.
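Before installing anything DeepSeek-related, it is worth confirming the GPU and its driver are actually visible to the system. For example (nvidia-smi only exists once the NVIDIA driver is installed):
# Graphics adapters and the kernel driver currently bound to them
lspci -nnk | grep -iA3 'vga\|3d controller'
# NVIDIA only: driver version, VRAM, and utilization
nvidia-smi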
Software Prerequisites
A properly configured Debian 12 system forms the foundation for running DeepSeek effectively.
Debian 12 Preparation
Start with a clean, updated Debian 12 installation:
sudo apt update
sudo apt upgrade -y
Ensure your system locale is properly configured to avoid text encoding issues:
sudo dpkg-reconfigure locales
Required Packages
Install essential dependencies for running DeepSeek and its supporting software:
sudo apt install -y build-essential python3 python3-pip python3-venv python3-dev git curl wget cmake jq htop
For GPU support on NVIDIA systems, install the proprietary driver and CUDA toolkit (these packages require the contrib and non-free repository components, covered just below):
sudo apt install -y nvidia-driver nvidia-cuda-toolkit
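Note that nvidia-driver and nvidia-cuda-toolkit are not available on a default "main"-only Debian install. A sketch of the /etc/apt/sources.list entries with the extra components enabled (mirror URLs may differ on your system):
deb http://deb.debian.org/debian bookworm main contrib non-free non-free-firmware
deb http://deb.debian.org/debian bookworm-updates main contrib non-free non-free-firmware
deb http://security.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware
After editing the file, run sudo apt update before installing the driver packages.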
For AMD GPUs, ROCm support in the Debian 12 repositories is limited; the OpenCL runtime can be installed as shown below, but if apt cannot find the package or you need full ROCm acceleration, AMD's own ROCm repository is the usual route:
sudo apt install -y rocm-opencl-runtime
Package Manager Updates
Ensure pip and related tools are up-to-date. Debian 12 marks the system Python as externally managed (PEP 668), so run this inside a virtual environment (one is created for Open WebUI later in this guide) rather than against the system-wide Python:
pip3 install --upgrade pip setuptools wheel
Terminal Access
Most operations require terminal access. If using a desktop environment, launch the terminal application. For headless servers, connect via SSH with:
ssh username@debian-server-ip
A properly configured terminal environment with bash or zsh provides the best experience for installing and managing DeepSeek.
Installation Methods Overview
Several installation approaches exist for running DeepSeek on Debian 12, each with distinct advantages.
Ollama Method
Ollama provides the simplest approach for running various LLMs including DeepSeek. This method offers:
- Quick, straightforward installation process
- Easy model management and switching
- Simplified API and command-line interface
- Lower technical knowledge requirements
For most users, particularly beginners, the Ollama approach represents the recommended path.
vLLM Approach
vLLM offers enhanced performance with more advanced configuration options:
- Better throughput and latency optimizations
- More granular control over execution parameters
- Advanced memory management for larger models
- Better support for professional applications
This method requires more technical knowledge but delivers superior performance.
Docker Installation
Docker provides isolated, containerized deployment:
- Clean separation from the host system
- Easier version management and updates
- Consistent environment across different systems
- Simplified dependency management
Docker combines ease of deployment with flexibility, making it an excellent choice for many users.
Method Comparison
- Simplicity: Ollama > Docker > vLLM
- Performance: vLLM > Ollama > Docker
- Flexibility: vLLM > Docker > Ollama
- Ease of maintenance: Docker > Ollama > vLLM
The remainder of this guide focuses primarily on the Ollama method, with notes on Docker alternatives where applicable.
Installing Ollama on Debian 12
Ollama serves as a convenient wrapper around various large language models, making them easier to deploy and manage.
What is Ollama?
Ollama is an open-source tool that simplifies running, managing, and interacting with various language models locally. It handles downloading, loading, and serving models through a straightforward API and command-line interface. Essentially, Ollama does the heavy lifting of model management so you can focus on using the AI capabilities.
Getting Ollama
Download the Ollama installation script with:
curl -fsSL https://ollama.com/install.sh > install-ollama.sh
Before running any downloaded script, it’s good practice to review its contents:
less install-ollama.sh
Installation Process
Once satisfied with the script contents, make it executable and run it:
chmod +x install-ollama.sh
sudo ./install-ollama.sh
The installation process will:
- Download the appropriate Ollama binary for your system
- Install it to /usr/local/bin
- Create necessary configuration directories
- Set up a systemd service for automatic startup
Verification
Verify that Ollama installed correctly:
ollama --version
If installation was successful, you’ll see the version number displayed.
Starting Ollama Service
Enable and start the Ollama service to run automatically at boot:
sudo systemctl enable ollama
sudo systemctl start ollama
Check the service status to ensure it’s running properly:
sudo systemctl status ollama
You should see “active (running)” in the output, indicating Ollama is operational.
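If the status output shows anything other than "active (running)", the service logs usually explain why:
# Last 20 log lines from the Ollama service
journalctl -u ollama -n 20 --no-pager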
Downloading and Running DeepSeek Models
With Ollama installed, the next step is obtaining and running the DeepSeek models themselves.
Available DeepSeek Models
Ollama provides access to various DeepSeek models through its repository. The main variants include:
- deepseek-coder – Specialized for programming tasks
- deepseek-llm – General purpose language model
- Other DeepSeek variants are added to the Ollama library over time – check https://ollama.com/library for the current list
Each model comes in different parameter sizes and quantization levels to accommodate various hardware configurations.
Choosing the Right Model Size
Select a model appropriate for your hardware:
- For systems with 8GB RAM: deepseek-coder:1.5b-base-q4_0
- For systems with 16GB RAM: deepseek-coder:7b-base-q4_K_S
- For systems with 32GB RAM: deepseek-coder:13b-base-q5_K_M
- For systems with 64GB+ RAM: deepseek-coder:33b-instruct-q5_K_M
The suffix indicates the quantization level, with lower numbers (q4) requiring less memory but potentially reducing quality compared to higher numbers (q5, q8).
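As a rough rule of thumb, the weights alone occupy about parameter count × bits per weight ÷ 8 bytes: a 7B model quantized to 4 bits is roughly 7 billion × 0.5 bytes ≈ 3.5 GB, before adding memory for the context window and runtime overhead.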
Pulling Models with Ollama
Download your chosen model using the pull command:
ollama pull deepseek-coder:7b-base-q4_K_S
This process downloads and prepares the model for use. Depending on your internet connection and the model size, this may take several minutes to complete.
Initial Model Testing
Test your newly downloaded model with a simple query:
ollama run deepseek-coder:7b-base-q4_K_S "Explain how to create a simple HTTP server in Python"
The model should generate a response explaining the requested information. This confirms that both Ollama and the DeepSeek model are functioning correctly.
Model Management
List all downloaded models:
ollama list
Remove a model to free up disk space:
ollama rm deepseek-coder:7b-base-q4_K_S
Pulling an updated version:
ollama pull deepseek-coder:7b-base-q4_K_S
Proper model management helps maintain system performance and storage efficiency.
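To see how a downloaded model is configured, Ollama can print the Modelfile it was built from and its default generation parameters:
# Show the Modelfile behind the model
ollama show deepseek-coder:7b-base-q4_K_S --modelfile
# Show its default generation parameters
ollama show deepseek-coder:7b-base-q4_K_S --parameters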
Setting Up Open WebUI Interface
While the command line offers direct access to DeepSeek, a web interface provides a more user-friendly experience similar to commercial AI chatbots.
What is Open WebUI?
Open WebUI is a browser-based graphical interface for interacting with locally hosted language models. It offers features like:
- Persistent chat sessions
- Conversation history storage
- File uploads and attachments
- Model switching between different LLMs
- Visual customization options
- User account management
This interface transforms DeepSeek from a command-line tool into a full-featured chatbot experience.
Installation Methods
Open WebUI can be installed either directly on your system or through Docker:
- Direct installation offers deeper integration with your system
- Docker provides easier maintenance and isolation
Both methods achieve the same end result with different maintenance implications.
Direct Installation Steps
Create a dedicated Python virtual environment for Open WebUI:
mkdir -p ~/open-webui
cd ~/open-webui
python3 -m venv venv
source venv/bin/activate
Installing Open WebUI Packages
Install the Open WebUI application:
pip install open-webui
This command installs the web interface and all its dependencies within the isolated virtual environment.
Running the Web Server
Launch the Open WebUI server:
open-webui serve
The first launch will create configuration files and initialize the database. After a few moments, the server will be running.
Accessing the Interface
Open a web browser and navigate to:
http://localhost:8080
If accessing from another device on your network, replace “localhost” with your Debian server’s IP address.
User Account Creation
On first access, you’ll be prompted to create an administrator account:
- Enter a username and password
- Complete the setup wizard
- Connect to your local Ollama instance (usually at http://localhost:11434)
Once configured, you can begin chatting with your DeepSeek model through the web interface.
Docker Installation Alternative
Docker provides a self-contained environment for running Open WebUI with minimal system modification.
Docker Advantages
Using Docker for Open WebUI offers several benefits:
- Simplified installation and updates
- Reduced dependency conflicts
- Easy migration between systems
- Consistent environment regardless of host configuration
- Isolated resource management
These advantages make Docker particularly appealing for server environments or systems where multiple applications run.
Installing Docker on Debian 12
Install Docker using the official Debian repository:
sudo apt update
sudo apt install -y docker.io docker-compose
sudo systemctl enable docker
sudo systemctl start docker
Add your user to the Docker group to avoid using sudo for every command:
sudo usermod -aG docker $USER
Log out and back in for this change to take effect.
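After logging back in, a quick way to confirm Docker works without sudo is to run the stock hello-world image:
docker run --rm hello-world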
Pulling Open WebUI Image
Pull the latest Open WebUI Docker image:
docker pull ghcr.io/open-webui/open-webui:main
Container Configuration
Create a directory for persistent storage:
mkdir -p ~/open-webui-data
Running the Container
Launch the Open WebUI container with the following command:
docker run -d \
--name open-webui \
-p 8080:8080 \
-v ~/open-webui-data:/app/backend/data \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
--add-host host.docker.internal:host-gateway \
ghcr.io/open-webui/open-webui:main
This command:
- Creates a container named “open-webui”
- Maps port 8080 to your host system
- Creates a persistent volume for data storage
- Configures connection to your Ollama instance
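Day-to-day container management uses the usual Docker commands; for example (the container name matches the --name used above):
# Follow the container logs while it starts up
docker logs -f open-webui
# Stop or start the container
docker stop open-webui
docker start open-webui
# Update: pull the new image, then remove the container and rerun the docker run command above
docker pull ghcr.io/open-webui/open-webui:main
docker rm -f open-webui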
Accessing Docker Web Interface
Open your browser and navigate to:
http://localhost:8080
Follow the same initial setup process described in the direct installation method.
Using DeepSeek via Command Line
The command line offers the most direct way to interact with DeepSeek models.
Basic Interaction
Start an interactive chat session:
ollama run deepseek-coder:7b-base-q4_K_S
This launches a conversation where you can type prompts and receive responses. Press Ctrl+D or type "/bye" to end the session.
One-off Queries
For single queries without starting a persistent session:
ollama run deepseek-coder:7b-base-q4_K_S "Write a bash function that finds all files larger than 100MB"
This pattern works well for scripts or quick questions without the overhead of maintaining a session.
Scripting with DeepSeek
Create a simple shell script to automate interactions:
#!/bin/bash
prompt="$1"
model="${2:-deepseek-coder:7b-base-q4_K_S}"
echo "Asking $model: $prompt"
ollama run "$model" "$prompt"
Save this as `ask-ai.sh`, make it executable with `chmod +x ask-ai.sh`, and use it like:
./ask-ai.sh "Explain how TCP/IP works"
Command Line Parameters
The ollama run command does not accept generation flags directly; parameters are adjusted with the /set command inside an interactive session (or via a Modelfile or the API's options field). For example, after starting ollama run deepseek-coder:7b-base-q4_K_S:
/set parameter temperature 0.7
/set parameter top_p 0.9
/set parameter num_ctx 4096
Higher temperature values (0-1) increase creativity, while lower values produce more deterministic responses.
Using DeepSeek via API
For developers, the API offers programmatic access to DeepSeek’s capabilities.
Starting the API Server
Ollama automatically runs an API server on port 11434. Verify it’s accessible:
curl http://localhost:11434/api/version
This should return a JSON response with the Ollama version.
API Endpoints
The main endpoints include:
- /api/generate – Generate text from a prompt
- /api/chat – Interactive chat completion
- /api/embeddings – Generate vector embeddings
- /api/tags – List locally available models
Each endpoint accepts specific parameters detailed in the Ollama documentation.
cURL Examples
Generate a response from DeepSeek:
curl -X POST http://localhost:11434/api/generate \
-H 'Content-Type: application/json' \
-d '{
"model": "deepseek-coder:7b-base-q4_K_S",
"prompt": "Write a Python function to check if a string is a palindrome"
}'
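By default /api/generate streams the reply as a series of JSON objects, one chunk per line. For scripting it is often easier to disable streaming and extract the text with jq (installed earlier); a sketch:
curl -s -X POST http://localhost:11434/api/generate \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "deepseek-coder:7b-base-q4_K_S",
    "prompt": "Write a Python function to check if a string is a palindrome",
    "stream": false
  }' | jq -r '.response'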
Python API Usage
Create a simple Python script to interact with DeepSeek:
import requests

def ask_deepseek(prompt, model="deepseek-coder:7b-base-q4_K_S"):
    # stream=False makes Ollama return the whole reply as a single JSON object
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False}
    )
    response.raise_for_status()
    return response.json()["response"]

# Example usage
result = ask_deepseek("Explain quantum computing in simple terms")
print(result)
API Parameters
Customize responses with additional parameters. In the Ollama API, generation settings are nested inside an "options" object, and the maximum response length is called num_predict:
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-coder:7b-base-q4_K_S",
        "prompt": "Write a sorting algorithm",
        "stream": False,
        "options": {
            "temperature": 0.5,
            "top_p": 0.9,
            "num_predict": 500
        }
    }
)
These parameters control response generation characteristics like creativity, length, and determinism.
Performance Optimization
Optimizing your DeepSeek installation ensures the best possible experience on your hardware.
System Resource Management
Allocate appropriate resources to Ollama. The network binding is controlled by the OLLAMA_HOST environment variable when launching the server:
OLLAMA_HOST=0.0.0.0 ollama serve
CPU thread count and GPU offloading, by contrast, are per-model parameters rather than environment variables:
- num_thread: CPU threads used for computation
- num_gpu: number of model layers offloaded to the GPU
Both can be set in a Modelfile with PARAMETER lines (for example PARAMETER num_gpu 43, as in the next subsection) or passed in the API's options field. If Ollama runs as a systemd service, set environment variables such as OLLAMA_HOST in the unit file instead (see Advanced Configuration).
Model Parameter Tweaking
Create a custom model configuration for optimized performance:
ollama create deepseek-optimized -f Modelfile
Where Modelfile contains:
FROM deepseek-coder:7b-base-q4_K_S
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER num_ctx 4096
This creates a custom model variant with your preferred default parameters.
Response Generation Settings
Different use cases require different generation settings:
- Creative writing: Higher temperature (0.7-0.9)
- Factual responses: Lower temperature (0.1-0.3)
- Code generation: Medium temperature (0.3-0.6) with higher top_k values
Experiment to find the optimal balance for your specific needs.
Background Process Management
For server deployments, use systemd to manage Ollama and Open WebUI:
sudo systemctl enable --now ollama
Create a systemd service file for Open WebUI at /etc/systemd/system/open-webui.service:
[Unit]
Description=Open WebUI
After=network.target ollama.service
[Service]
User=your_username
WorkingDirectory=/home/your_username/open-webui
ExecStart=/home/your_username/open-webui/venv/bin/open-webui serve
Restart=on-failure
[Install]
WantedBy=multi-user.target
Reload systemd so it picks up the new unit file, then enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable --now open-webui
Using Swap Space
For systems with limited RAM, configure additional swap space:
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
Add the following line to /etc/fstab for persistence:
/swapfile swap swap defaults 0 0
While swap is slower than RAM, it allows running larger models on systems with limited physical memory.
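A quick check that the new swap area is active:
swapon --show
free -h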
Troubleshooting Common Issues
Even with careful installation, issues may arise that require troubleshooting.
Installation Failures
If Ollama installation fails:
- Check system architecture compatibility
- Verify internet connectivity
- Ensure sufficient disk space
- Examine logs with journalctl -u ollama
Common solution:
sudo apt install -y build-essential curl
curl -fsSL https://ollama.com/install.sh | sh
Model Download Problems
For model download failures:
- Check network connectivity
- Verify disk space availability
- Try direct download via API:
curl -X POST http://localhost:11434/api/pull \
-d '{"name": "deepseek-coder:7b-base-q4_K_S"}'
Memory-Related Errors
If experiencing out-of-memory errors:
- Switch to a smaller model variant
- Increase swap space
- Reduce the number of GPU-offloaded layers via the num_gpu model parameter
- Try a more aggressive quantization level (q4 instead of q5)
Slow Performance
For sluggish response generation:
- Verify GPU utilization with nvidia-smi or rocm-smi
- Check for thermal throttling with sensors
- Monitor system resources with htop
- Consider reducing context length for faster responses
Web UI Connection Issues
If Open WebUI cannot connect to Ollama:
- Verify Ollama is running with systemctl status ollama
- Check connection settings in the WebUI configuration
- Ensure the correct URL is specified (usually http://localhost:11434)
- Check for firewall rules blocking communication
Model Loading Errors
For model corruption or loading failures:
- Remove and redownload the model:
ollama rm deepseek-coder:7b-base-q4_K_S
ollama pull deepseek-coder:7b-base-q4_K_S
- Check for disk errors (run against an unmounted filesystem or from a rescue/live environment):
sudo fsck -f /dev/sdX
- Verify RAM stability by installing the memtest86+ package and booting into it from the GRUB menu
Advanced Configuration
Custom Model Parameters
Create a fine-tuned model configuration in a Modelfile:
FROM deepseek-coder:7b-base-q4_K_S
SYSTEM You are a helpful programming assistant specializing in Linux and Debian systems.
PARAMETER temperature 0.4
PARAMETER top_p 0.95
PARAMETER seed 42
TEMPLATE """
{{- if .System }}
SYSTEM: {{ .System }}
{{- end }}
USER: {{ .Prompt }}
ASSISTANT:
"""
Build the custom model:
ollama create deepseek-debian-expert -f ./Modelfile
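Once built, the custom model is used exactly like any other Ollama model:
ollama run deepseek-debian-expert "How do I add a user to the sudo group on Debian 12?"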
System Service Setup
Create an optimized systemd service for Ollama. Server-level tuning is exposed through environment variables; for example, OLLAMA_KEEP_ALIVE keeps a model resident in memory between requests, and OLLAMA_MAX_LOADED_MODELS limits how many models stay loaded at once:
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
Environment="OLLAMA_KEEP_ALIVE=24h"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
ExecStart=/usr/local/bin/ollama serve
Restart=always
RestartSec=3
[Install]
WantedBy=default.target
Save as /etc/systemd/system/ollama.service, then:
sudo systemctl daemon-reload
sudo systemctl restart ollama
Security Considerations
For multi-user environments, implement basic authentication:
- Configure Open WebUI with user accounts
- Set up a reverse proxy with nginx or Apache
- Implement HTTPS with Let’s Encrypt certificates
- Restrict API access to trusted networks
Network Configuration
Allow remote access to your DeepSeek installation. When launching Ollama manually, bind it to all interfaces:
OLLAMA_HOST=0.0.0.0 ollama serve
If Ollama already runs as a systemd service (the default after installation), set OLLAMA_HOST in the unit instead (sudo systemctl edit ollama, add the Environment line, then restart the service).
Configure Open WebUI for remote access by changing the binding address:
open-webui serve --host 0.0.0.0
Remember to configure appropriate firewall rules to protect your installation.
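A minimal sketch using ufw (the 192.168.1.0/24 range is a placeholder for your own LAN; adjust it before applying, and keep SSH open if you manage the server remotely):
sudo apt install -y ufw
# Keep remote administration possible before enabling the firewall
sudo ufw allow 22/tcp
# Allow Open WebUI and the Ollama API only from the local network
sudo ufw allow from 192.168.1.0/24 to any port 8080 proto tcp
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp
sudo ufw enable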
Congratulations! You have successfully installed DeepSeek. Thanks for using this tutorial for installing the DeepSeek AI model on Debian 12 “Bookworm” Linux. For additional help or useful information, we recommend you check the official DeepSeek website.