
How To Install DeepSeek on Debian 12


Running powerful AI language models locally offers tremendous advantages for developers, researchers, and privacy-conscious users. DeepSeek represents one of the most capable open-source language models available today, and installing it on Debian 12 provides a stable, secure foundation for AI experimentation. This comprehensive guide walks through every aspect of getting DeepSeek up and running on your Debian system, from understanding the hardware requirements to advanced configuration options.


Understanding DeepSeek AI

DeepSeek is an advanced large language model (LLM) designed to provide powerful natural language processing capabilities while running directly on your local hardware. Unlike cloud-based solutions that send your data to remote servers, DeepSeek can operate entirely offline, giving you complete control over your information and interactions.

What is DeepSeek?

DeepSeek represents a family of open-source language models created to provide accessible AI capabilities. These models can understand context, generate human-like text, assist with coding tasks, and answer complex questions based on their training data. The technology leverages transformer architecture, similar to models like LLaMA and Mistral, but with its own unique optimizations and capabilities.

Types of DeepSeek Models

DeepSeek comes in several variants to accommodate different hardware configurations:

  • DeepSeek 1.5B – Lightweight model suitable for systems with limited resources
  • DeepSeek 7B – Mid-range model balancing performance and resource requirements
  • DeepSeek 13B – Larger model with improved reasoning and capabilities
  • DeepSeek 70B – Most powerful variant with exceptional capabilities (requires significant hardware)

Each model offers progressively better performance at the cost of increased resource requirements. For most home or small business users, the 7B or 13B models provide an excellent balance of capability and practicality.

DeepSeek vs. Other Local LLMs

Compared to alternatives like LLaMA, Mistral, or Falcon, DeepSeek offers several competitive advantages:

  • Strong performance on coding and technical tasks
  • Efficient resource utilization for its capability level
  • Good balance of reasoning and knowledge capabilities
  • Active development and community support

While other models may excel in specific domains, DeepSeek provides a well-rounded experience that performs admirably across various use cases.

Why Run DeepSeek Locally?

The benefits of running DeepSeek on your own Debian 12 system include:

  • Complete privacy with no data leaving your system
  • No subscription costs or API usage limits
  • Full customization of model parameters and behavior
  • Consistent availability without internet dependency
  • Integration possibilities with local applications and workflows
  • Learning opportunities about AI system administration

Local deployment puts you in control of both the technology and your data, making it ideal for sensitive applications or continuous usage scenarios.

Hardware Prerequisites

Before attempting to install DeepSeek, understanding your hardware requirements is essential for a successful deployment.

System Requirements

The hardware needed depends heavily on which DeepSeek model you plan to run:

  • CPU: Modern x86-64 processor with AVX2 support (Intel 4th generation Core “Haswell” or newer, AMD Ryzen or newer)
  • RAM: Minimum 8GB for the 1.5B model, 16GB for 7B, 32GB for 13B, and 64GB+ for 70B
  • Storage: At least 10GB free space for the base system plus 2-40GB depending on model size
  • GPU: Optional but highly recommended for reasonable performance

The system requirements increase substantially with larger models. Without a GPU, expect significantly slower inference times, especially with models larger than 7B parameters.
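
Before going further, it is worth confirming what your machine actually offers. These standard Debian commands report AVX2 support, CPU thread count, installed RAM, and free disk space:

grep -o -m1 avx2 /proc/cpuinfo   # prints "avx2" if the CPU supports it
nproc                            # number of CPU threads available
free -h                          # total and available RAM
df -h /                          # free space on the root filesystem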

CPU vs. GPU Performance

While DeepSeek can run on CPU-only systems, the performance difference with a GPU is substantial:

  • CPU-only: 0.5-2 tokens per second for larger models (only practical for simple queries)
  • Entry-level GPU (4GB VRAM): 5-15 tokens per second
  • Mid-range GPU (8GB+ VRAM): 15-30+ tokens per second
  • High-end GPU (16GB+ VRAM): 30-60+ tokens per second

For interactive use, a discrete GPU dramatically improves the experience. NVIDIA GPUs tend to offer the best performance due to CUDA optimization, though AMD GPUs can also work with ROCm support on Debian 12.

Memory Requirements

RAM is the most critical resource for running larger language models. As a general guideline:

  • DeepSeek 1.5B: 6-8GB RAM
  • DeepSeek 7B: 12-16GB RAM
  • DeepSeek 13B: 24-32GB RAM
  • DeepSeek 70B: 64GB+ RAM

These requirements can be reduced somewhat through quantization techniques, but this may affect model quality.

Storage Space Needed

Ensure sufficient disk space is available:

  • Base Debian 12 installation: 8-10GB
  • DeepSeek 1.5B model: 1-2GB
  • DeepSeek 7B model: 4-7GB
  • DeepSeek 13B model: 8-13GB
  • DeepSeek 70B model: 35-40GB

SSD storage is strongly recommended for faster model loading and better overall system responsiveness.

Hardware Compatibility on Debian 12

Debian 12 (Bookworm) has excellent hardware compatibility, particularly for:

  • AMD and Intel processors from the last decade
  • NVIDIA GPUs with community or proprietary drivers
  • AMD GPUs with open-source drivers
  • Most standard server and desktop hardware components

For GPU acceleration, ensure your graphics drivers are properly installed before proceeding with DeepSeek installation.
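
A quick way to confirm which GPU is present and whether its driver is loaded (nvidia-smi only exists once the NVIDIA driver is installed; rocm-smi plays the same role for AMD cards with ROCm):

lspci -nn | grep -Ei 'vga|3d'    # identify the installed GPU
nvidia-smi                       # NVIDIA: driver version, VRAM, and current utilization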

Software Prerequisites

A properly configured Debian 12 system forms the foundation for running DeepSeek effectively.

Debian 12 Preparation

Start with a clean, updated Debian 12 installation:

sudo apt update
sudo apt upgrade -y

Ensure your system locale is properly configured to avoid text encoding issues:

sudo dpkg-reconfigure locales

Required Packages

Install essential dependencies for running DeepSeek and its supporting software:

sudo apt install -y build-essential python3 python3-pip python3-venv python3-dev git curl wget cmake jq htop

For GPU support on NVIDIA systems, install the proprietary driver and CUDA toolkit. These packages come from Debian’s contrib and non-free components, so make sure those are enabled in your APT sources first:

sudo apt install -y nvidia-driver nvidia-cuda-toolkit

For AMD GPUs, install ROCm support packages:

sudo apt install -y rocm-opencl-runtime

Package Manager Updates

Debian 12 marks the system Python as externally managed (PEP 668), so pip refuses to modify system-wide packages. Upgrade pip and its companion tools from inside a Python virtual environment instead (one is created for Open WebUI later in this guide):

pip install --upgrade pip setuptools wheel

Terminal Access

Most operations require terminal access. If using a desktop environment, launch the terminal application. For headless servers, connect via SSH with:

ssh username@debian-server-ip

A properly configured terminal environment with bash or zsh provides the best experience for installing and managing DeepSeek.

Installation Methods Overview

Several installation approaches exist for running DeepSeek on Debian 12, each with distinct advantages.

Ollama Method

Ollama provides the simplest approach for running various LLMs including DeepSeek. This method offers:

  • Quick, straightforward installation process
  • Easy model management and switching
  • Simplified API and command-line interface
  • Lower technical knowledge requirements

For most users, particularly beginners, the Ollama approach represents the recommended path.

vLLM Approach

vLLM offers enhanced performance with more advanced configuration options:

  • Better throughput and latency optimizations
  • More granular control over execution parameters
  • Advanced memory management for larger models
  • Better support for professional applications

This method requires more technical knowledge but delivers superior performance.

Docker Installation

Docker provides isolated, containerized deployment:

  • Clean separation from the host system
  • Easier version management and updates
  • Consistent environment across different systems
  • Simplified dependency management

Docker combines ease of deployment with flexibility, making it an excellent choice for many users.

Method Comparison

  • Simplicity: Ollama > Docker > vLLM
  • Performance: vLLM > Ollama > Docker
  • Flexibility: vLLM > Docker > Ollama
  • Ease of maintenance: Docker > Ollama > vLLM

The remainder of this guide focuses primarily on the Ollama method, with notes on Docker alternatives where applicable.

Installing Ollama on Debian 12

Ollama serves as a convenient wrapper around various large language models, making them easier to deploy and manage.

What is Ollama?

Ollama is an open-source tool that simplifies running, managing, and interacting with various language models locally. It handles downloading, loading, and serving models through a straightforward API and command-line interface. Essentially, Ollama does the heavy lifting of model management so you can focus on using the AI capabilities.

Getting Ollama

Download the Ollama installation script with:

curl -fsSL https://ollama.com/install.sh > install-ollama.sh

Before running any downloaded script, it’s good practice to review its contents:

less install-ollama.sh

Installation Process

Once satisfied with the script contents, make it executable and run it:

chmod +x install-ollama.sh
sudo ./install-ollama.sh

The installation process will:

  1. Download the appropriate Ollama binary for your system
  2. Install it to /usr/local/bin
  3. Create necessary configuration directories
  4. Set up a systemd service for automatic startup

Verification

Verify that Ollama installed correctly:

ollama --version

If installation was successful, you’ll see the version number displayed.

Starting Ollama Service

Enable and start the Ollama service to run automatically at boot:

sudo systemctl enable ollama
sudo systemctl start ollama

Check the service status to ensure it’s running properly:

sudo systemctl status ollama

You should see “active (running)” in the output, indicating Ollama is operational.

Downloading and Running DeepSeek Models

With Ollama installed, the next step is obtaining and running the DeepSeek models themselves.

Available DeepSeek Models

Ollama provides access to various DeepSeek models through its repository. The main variants include:

  • deepseek-coder – Specialized for programming tasks
  • deepseek-llm – General purpose language model
  • deepseek-visual – Multimodal model supporting images and text

Each model comes in different parameter sizes and quantization levels to accommodate various hardware configurations.

Choosing the Right Model Size

Select a model appropriate for your hardware:

  • For systems with 8GB RAM: deepseek-coder:1.5b-base-q4_0
  • For systems with 16GB RAM: deepseek-coder:7b-base-q4_K_S
  • For systems with 32GB RAM: deepseek-coder:13b-base-q5_K_M
  • For systems with 64GB+ RAM: deepseek-coder:33b-instruct-q5_K_M

The suffix indicates the quantization level, with lower numbers (q4) requiring less memory but potentially reducing quality compared to higher numbers (q5, q8).

Pulling Models with Ollama

Download your chosen model using the pull command:

ollama pull deepseek-coder:7b-base-q4_K_S

This process downloads and prepares the model for use. Depending on your internet connection and the model size, this may take several minutes to complete.

Initial Model Testing

Test your newly downloaded model with a simple query:

ollama run deepseek-coder:7b-base-q4_K_S "Explain how to create a simple HTTP server in Python"

The model should generate a response explaining the requested information. This confirms that both Ollama and the DeepSeek model are functioning correctly.

Model Management

List all downloaded models:

ollama list

Remove a model to free up disk space:

ollama rm deepseek-coder:7b-base-q4_K_S

Pulling an updated version:

ollama pull deepseek-coder:7b-base-q4_K_S

Proper model management helps maintain system performance and storage efficiency.

Setting Up Open WebUI Interface

While the command line offers direct access to DeepSeek, a web interface provides a more user-friendly experience similar to commercial AI chatbots.

What is Open WebUI?

Open WebUI is a browser-based graphical interface for interacting with locally hosted language models. It offers features like:

  • Persistent chat sessions
  • Conversation history storage
  • File uploads and attachments
  • Model switching between different LLMs
  • Visual customization options
  • User account management

This interface transforms DeepSeek from a command-line tool into a full-featured chatbot experience.

Installation Methods

Open WebUI can be installed either directly on your system or through Docker:

  • Direct installation offers deeper integration with your system
  • Docker provides easier maintenance and isolation

Both methods achieve the same end result with different maintenance implications.

Direct Installation Steps

Create a dedicated Python virtual environment for Open WebUI:

mkdir -p ~/open-webui
cd ~/open-webui
python3 -m venv venv
source venv/bin/activate

Installing Open WebUI Packages

Install the Open WebUI application:

pip install open-webui

This command installs the web interface and all its dependencies within the isolated virtual environment.

Running the Web Server

Launch the Open WebUI server:

open-webui serve

The first launch will create configuration files and initialize the database. After a few moments, the server will be running.

Accessing the Interface

Open a web browser and navigate to:

http://localhost:8080

If accessing from another device on your network, replace “localhost” with your Debian server’s IP address.

User Account Creation

On first access, you’ll be prompted to create an administrator account:

  1. Enter a username and password
  2. Complete the setup wizard
  3. Connect to your local Ollama instance (usually at http://localhost:11434)

Once configured, you can begin chatting with your DeepSeek model through the web interface.

Docker Installation Alternative

Docker provides a self-contained environment for running Open WebUI with minimal system modification.

Docker Advantages

Using Docker for Open WebUI offers several benefits:

  • Simplified installation and updates
  • Reduced dependency conflicts
  • Easy migration between systems
  • Consistent environment regardless of host configuration
  • Isolated resource management

These advantages make Docker particularly appealing for server environments or systems where multiple applications run.

Installing Docker on Debian 12

Install Docker using the official Debian repository:

sudo apt update
sudo apt install -y docker.io docker-compose
sudo systemctl enable docker
sudo systemctl start docker

Add your user to the Docker group to avoid using sudo for every command:

sudo usermod -aG docker $USER

Log out and back in for this change to take effect.
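
To confirm that Docker works and that your user can reach the daemon without sudo, a quick test run is enough:

docker --version
docker run --rm hello-world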

Pulling Open WebUI Image

Pull the latest Open WebUI Docker image:

docker pull ghcr.io/open-webui/open-webui:main

Container Configuration

Create a directory for persistent storage:

mkdir -p ~/open-webui-data

Running the Container

Launch the Open WebUI container with the following command:

docker run -d \
  --name open-webui \
  -p 8080:8080 \
  -v ~/open-webui-data:/app/backend/data \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  --add-host host.docker.internal:host-gateway \
  ghcr.io/open-webui/open-webui:main

This command:

  1. Creates a container named “open-webui”
  2. Maps port 8080 to your host system
  3. Creates a persistent volume for data storage
  4. Configures connection to your Ollama instance
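
Once the container is running, a few standard Docker commands cover day-to-day management. Updating simply means pulling the newer image and recreating the container; the ~/open-webui-data volume keeps your settings and chat history:

docker logs -f open-webui                      # follow the application logs
docker restart open-webui                      # restart after configuration changes

# Update to the latest image
docker pull ghcr.io/open-webui/open-webui:main
docker stop open-webui && docker rm open-webui
# ...then re-run the docker run command shown above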

Accessing Docker Web Interface

Open your browser and navigate to:

http://localhost:8080

Follow the same initial setup process described in the direct installation method.

Using DeepSeek via Command Line

The command line offers the most direct way to interact with DeepSeek models.

Basic Interaction

Start an interactive chat session:

ollama run deepseek-coder:7b-base-q4_K_S

This launches a conversation where you can type prompts and receive responses. Press Ctrl+D or type “/bye” to end the session.

One-off Queries

For single queries without starting a persistent session:

ollama run deepseek-coder:7b-base-q4_K_S "Write a bash function that finds all files larger than 100MB"

This pattern works well for scripts or quick questions without the overhead of maintaining a session.

Scripting with DeepSeek

Create a simple shell script to automate interactions:

#!/bin/bash
prompt="$1"
model="${2:-deepseek-coder:7b-base-q4_K_S}"
echo "Asking $model: $prompt"
ollama run "$model" "$prompt"

Save this as `ask-ai.sh`, make it executable with `chmod +x ask-ai.sh`, and use it like:

./ask-ai.sh "Explain how TCP/IP works"

Command Line Parameters

Customize model behavior with runtime parameters. Note that ollama run does not accept flags such as --temperature; instead, adjust parameters from inside an interactive session with the /set command:

ollama run deepseek-coder:7b-base-q4_K_S
>>> /set parameter temperature 0.7
>>> /set parameter top_p 0.9
>>> /set parameter num_ctx 4096

Higher temperature values (0-1) increase creativity, while lower values produce more deterministic responses. The same parameters can be made permanent through a custom Modelfile or passed in the API options field, both covered later in this guide.

Using DeepSeek via API

For developers, the API offers programmatic access to DeepSeek’s capabilities.

Starting the API Server

Ollama automatically runs an API server on port 11434. Verify it’s accessible:

curl http://localhost:11434/api/version

This should return a JSON response with the Ollama version.

API Endpoints

The main endpoints include:

  • /api/generate – Generate text from a prompt
  • /api/chat – Interactive chat completion
  • /api/embeddings – Generate vector embeddings
  • /api/tags – List locally installed models

Each endpoint accepts specific parameters detailed in the Ollama documentation.
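
As an illustration, a minimal /api/chat request looks like the following; messages carries the conversation history as role/content pairs, and "stream": false returns a single JSON object rather than a stream of chunks:

curl -X POST http://localhost:11434/api/chat \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "deepseek-coder:7b-base-q4_K_S",
    "messages": [
      {"role": "user", "content": "What does the chmod command do?"}
    ],
    "stream": false
  }'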

cURL Examples

Generate a response from DeepSeek:

curl -X POST http://localhost:11434/api/generate \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "deepseek-coder:7b-base-q4_K_S",
    "prompt": "Write a Python function to check if a string is a palindrome"
  }'
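
By default, /api/generate streams its output as a series of JSON objects. For scripting it is often easier to disable streaming and extract just the text with jq (installed earlier with the other dependencies):

curl -s -X POST http://localhost:11434/api/generate \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "deepseek-coder:7b-base-q4_K_S",
    "prompt": "Write a Python function to check if a string is a palindrome",
    "stream": false
  }' | jq -r '.response'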

Python API Usage

Create a simple Python script to interact with DeepSeek:

import requests

def ask_deepseek(prompt, model="deepseek-coder:7b-base-q4_K_S"):
    # "stream": False returns one JSON object instead of newline-delimited chunks,
    # so response.json() can parse it directly.
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False}
    )
    response.raise_for_status()
    return response.json()["response"]

# Example usage
result = ask_deepseek("Explain quantum computing in simple terms")
print(result)

API Parameters

Customize responses by passing generation settings in the options object; Ollama expects sampling parameters there and uses num_predict to cap the output length:

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-coder:7b-base-q4_K_S",
        "prompt": "Write a sorting algorithm",
        "stream": False,
        "options": {
            "temperature": 0.5,
            "top_p": 0.9,
            "num_predict": 500
        }
    }
)

These parameters control response generation characteristics like creativity, length, and determinism.

Performance Optimization

Optimizing your DeepSeek installation ensures the best possible experience on your hardware.

System Resource Management

Allocate resources for Ollama through environment variables. For example, to listen on all network interfaces and keep only one model resident in memory at a time:

OLLAMA_HOST=0.0.0.0 OLLAMA_MAX_LOADED_MODELS=1 ollama serve

Useful environment variables include:

  • OLLAMA_HOST: Network interface and port binding
  • OLLAMA_NUM_PARALLEL: Number of requests processed in parallel
  • OLLAMA_MAX_LOADED_MODELS: Maximum number of models kept loaded in memory

CPU thread count and the number of layers offloaded to the GPU are per-model settings (num_thread and num_gpu), configured through a Modelfile PARAMETER or the API options field rather than environment variables.
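
To make these settings persistent instead of typing them on every launch, place them in a systemd drop-in for the ollama service. A minimal sketch, assuming the service unit created by the installation script:

sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama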

Model Parameter Tweaking

Create a custom model configuration for optimized performance:

ollama create deepseek-optimized -f Modelfile

Where Modelfile contains:

FROM deepseek-coder:7b-base-q4_K_S
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER num_ctx 4096

This creates a custom model variant with your preferred default parameters.
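
The new variant then behaves like any other local model:

ollama list                      # the custom variant appears alongside the base model
ollama run deepseek-optimized "Summarize the difference between apt update and apt upgrade"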

Response Generation Settings

Different use cases require different generation settings:

  • Creative writing: Higher temperature (0.7-0.9)
  • Factual responses: Lower temperature (0.1-0.3)
  • Code generation: Medium temperature (0.3-0.6) with higher top_k values

Experiment to find the optimal balance for your specific needs.

Background Process Management

For server deployments, use systemd to manage Ollama and Open WebUI:

sudo systemctl enable --now ollama

Create a systemd service file for Open WebUI at /etc/systemd/system/open-webui.service:

[Unit]
Description=Open WebUI
After=network.target ollama.service

[Service]
User=your_username
WorkingDirectory=/home/your_username/open-webui
ExecStart=/home/your_username/open-webui/venv/bin/open-webui serve
Restart=on-failure

[Install]
WantedBy=multi-user.target

Enable and start the service:

sudo systemctl enable --now open-webui

Using Swap Space

For systems with limited RAM, configure additional swap space:

sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

Add to /etc/fstab for persistence:

/swapfile swap swap defaults 0 0

While swap is slower than RAM, it allows running larger models on systems with limited physical memory.
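
After enabling it, verify that the swap space is active, and optionally lower vm.swappiness so the kernel prefers RAM and only falls back to swap under pressure (the value of 10 here is a common server-side choice, not a requirement):

swapon --show
free -h

sudo sysctl vm.swappiness=10
echo 'vm.swappiness=10' | sudo tee /etc/sysctl.d/99-swappiness.conf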

Troubleshooting Common Issues

Even with careful installation, issues may arise that require troubleshooting.

Installation Failures

If Ollama installation fails:

  1. Check system architecture compatibility
  2. Verify internet connectivity
  3. Ensure sufficient disk space
  4. Examine logs with journalctl -u ollama

Common solution:

sudo apt install -y build-essential curl
curl -fsSL https://ollama.com/install.sh | sh

Model Download Problems

For model download failures:

  1. Check network connectivity
  2. Verify disk space availability
  3. Try direct download via API:
curl -X POST http://localhost:11434/api/pull \
  -d '{"name": "deepseek-coder:7b-base-q4_K_S"}'

Memory-Related Errors

If experiencing out-of-memory errors:

  1. Switch to a smaller model variant
  2. Increase swap space
  3. Offload fewer layers to the GPU by lowering the num_gpu model parameter
  4. Try a more aggressive quantization level (q4 instead of q5)

Slow Performance

For sluggish response generation:

  1. Verify GPU utilization with nvidia-smi or rocm-smi
  2. Check thermal throttling with sensors
  3. Monitor system resources with htop
  4. Consider reducing context length for faster responses

Web UI Connection Issues

If Open WebUI cannot connect to Ollama:

  1. Verify Ollama is running: systemctl status ollama
  2. Check connection settings in the WebUI configuration
  3. Ensure the correct URL is specified (usually http://localhost:11434)
  4. Check for firewall rules blocking communication
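
These quick checks, run on the Debian host, confirm that Ollama is listening and answering on the expected port:

ss -tlnp | grep 11434                          # is anything listening on the Ollama port?
curl http://localhost:11434/api/version        # does the API answer?
journalctl -u ollama --since "10 minutes ago"  # recent service logs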

Model Loading Errors

For model corruption or loading failures:

  1. Remove and redownload the model:
ollama rm deepseek-coder:7b-base-q4_K_S
ollama pull deepseek-coder:7b-base-q4_K_S
  2. Check for disk errors: sudo fsck -f /dev/sdX
  3. Verify system stability with a memory test: memtest86+

Advanced Configuration

Custom Model Parameters

Create a fine-tuned model configuration in a Modelfile:

FROM deepseek-coder:7b-base-q4_K_S
SYSTEM You are a helpful programming assistant specializing in Linux and Debian systems.
PARAMETER temperature 0.4
PARAMETER top_p 0.95
PARAMETER seed 42
TEMPLATE """
{{- if .System }}
SYSTEM: {{ .System }}
{{- end }}
USER: {{ .Prompt }}
ASSISTANT: 
"""

Build the custom model:

ollama create deepseek-debian-expert -f ./Modelfile

System Service Setup

Create an optimized systemd service for Ollama:

[Unit]
Description=Ollama Service
After=network-online.target

[Service]
Environment="OLLAMA_NUM_THREADS=4"
Environment="OLLAMA_GPU_LAYERS=43"
ExecStart=/usr/local/bin/ollama serve
Restart=always
RestartSec=3

[Install]
WantedBy=default.target

Save as /etc/systemd/system/ollama.service, then:

sudo systemctl daemon-reload
sudo systemctl restart ollama

Security Considerations

For multi-user environments, implement basic authentication:

  1. Configure Open WebUI with user accounts
  2. Set up a reverse proxy with nginx or Apache
  3. Implement HTTPS with Let’s Encrypt certificates
  4. Restrict API access to trusted networks

Network Configuration

Allow remote access to your DeepSeek installation:

OLLAMA_HOST=0.0.0.0 ollama serve

Configure Open WebUI for remote access by changing the binding address:

open-webui serve --host 0.0.0.0

Remember to configure appropriate firewall rules to protect your installation.
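
As a minimal example using ufw (not installed by default on Debian 12; the 192.168.1.0/24 subnet is a placeholder for your own trusted network):

sudo apt install -y ufw
sudo ufw allow 22/tcp                                              # keep SSH access
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp    # Ollama API
sudo ufw allow from 192.168.1.0/24 to any port 8080 proto tcp     # Open WebUI
sudo ufw enable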

Congratulations! You have successfully installed DeepSeek. Thanks for using this tutorial for installing the DeepSeek AI model on Debian 12 “Bookworm” Linux. For additional help or useful information, we recommend you check the official DeepSeek website.

