Home/Docs/Operations Runbook
Full Docs Article

DeepSeek TUI Operations Runbook Guide

Operations runbook docs describe repeatable operational checks, failure handling, and expected operator responses for day-two maintenance.

Upstream source: OPERATIONS_RUNBOOK.mdReadable on-siteFull article embedded below

What this document actually covers

Operations runbook docs describe repeatable operational checks, failure handling, and expected operator responses for day-two maintenance.

How to use this page

  • Use the section list on the right to jump into the exact upstream section you need.
  • If you came here for one problem only, read the relevant heading first and return to the matching hub afterward.
  • If you need source verification, open the GitHub page or raw Markdown directly.

Full upstream document

DeepSeek TUI Operations Runbook

This runbook covers practical debugging and incident response for the local CLI/TUI runtime.

Quick Triage

  1. Confirm binary + config: - cargo run -- --version - cat ~/.deepseek/config.toml (or inspect configured profile)
  2. Enable verbose logs: - RUST_LOG=deepseek_cli=debug cargo run - For HTTP retries/reconnects: RUST_LOG=deepseek_cli::client=debug cargo run
  3. Capture current state: - ls ~/.deepseek/sessions - ls ~/.deepseek/sessions/checkpoints - ls ~/.deepseek/tasks

Incident: Turn Hangs or Stream Stops

Symptoms:

  • TUI remains in loading state
  • partial assistant output with no completion

Checks:

  1. Inspect retry/health logs (deepseek_cli::client)
  2. Verify endpoint connectivity: - curl -sS https://api.deepseek.com/v1/models -H "Authorization: Bearer $DEEPSEEK_API_KEY"
  3. Confirm no local sandbox/permission deadlock in tool output

Actions:

  1. If a foreground shell command is running, press Ctrl+B and choose whether to background it or cancel the current turn.
  2. If the command was started in the background, ask the assistant to cancel it with exec_shell_cancel and the returned task id.
  3. Use Esc or Ctrl+C to interrupt the current turn when you want to stop the request itself.
  4. Retry prompt; if still failing, restart TUI.
  5. On restart, verify the previous queued/in-flight runtime turn is shown as interrupted rather than left in a running state.

Incident: Network Outage / Offline Behavior

Expected behavior:

  • New prompts are queued while offline mode is active
  • Queue state persists to ~/.deepseek/sessions/checkpoints/offline_queue.json

Checks:

  1. Open queue in TUI: /queue list
  2. Confirm persisted queue file exists and updates timestamp

Actions:

  1. Restore connectivity
  2. Re-send queued entries (from /queue edit <n> + Enter, or normal input flow)
  3. Ensure queue file clears when queue is empty

Incident: Crash Recovery Needed

Expected behavior:

  • Checkpoint stored at ~/.deepseek/sessions/checkpoints/latest.json
  • Startup begins a fresh session unless --resume/--continue is supplied

Actions:

  1. Resume prior work explicitly via deepseek --resume <id> or Ctrl+R in TUI
  2. If checkpoint inspection is needed, inspect latest.json for schema mismatch/details
  3. If schema is newer than binary supports, upgrade binary or remove stale checkpoint

Incident: Persistent State Schema Errors

Symptoms:

  • Errors like schema vX is newer than supported vY

Affected stores:

  • sessions (~/.deepseek/sessions/*.json)
  • runtime thread/turn/item records
  • tasks (~/.deepseek/tasks/tasks/*.json)

Actions:

  1. Confirm binary version and migration expectations
  2. Back up the state directory before editing
  3. Either: - run with a newer compatible binary, or - archive incompatible records and regenerate state

Incident: MCP/Tool Execution Failures

Checks:

  1. Validate ~/.deepseek/mcp.json schema and server command paths
  2. Confirm server process can start manually
  3. Check sandbox denials in TUI history / logs

Actions:

  1. Retry with required approvals (or YOLO only when appropriate)
  2. Temporarily disable failing MCP server and isolate issue
  3. Re-enable after verification with /mcp diagnostics

Post-Incident Checklist

  1. Preserve logs and relevant state files
  2. Record trigger, impact, and mitigation
  3. Add or update regression tests (retry/recovery/schema)
  4. Update this runbook and architecture docs if behavior changed

Back to top