Skip to content

Unsloth Studio — Architecture (C4 + OOP UML)

Repo: unsloth (cloned). Two intertwined products live here:

  1. Unsloth Core — a Python library (unsloth/) that patches transformers / trl / peft at import-time to make LLM fine-tuning ~2× faster with up to 70% less VRAM.
  2. Unsloth Studio — a desktop/web app (studio/) that wraps Unsloth Core behind a FastAPI backend, a React UI, and a Tauri native shell. This is the “Unsloth Web UI” the user is interested in.

This report follows Simon Brown’s C4 model (Context → Containers → Components), then adds a Class-level UML diagram for the most architecturally significant Python classes. All diagrams use Mermaid.


unsloth/ ← repo root
├── cli.py ← thin shim → unsloth_cli.app
├── unsloth_cli/ ← Typer CLI (train/inference/export/studio)
│ └── commands/
│ ├── train.py
│ ├── inference.py
│ ├── export.py
│ └── studio.py ← `unsloth studio …` subcommands
├── unsloth/ ← Unsloth Core (Python library)
│ ├── models/ ← FastLlamaModel, FastQwen3Model, …
│ ├── kernels/ ← Triton kernels (rope, rms_norm, …)
│ ├── trainer.py ← UnslothTrainer (extends SFTTrainer)
│ ├── dataprep/, registry/, optimizers/, utils/
│ ├── save.py, chat_templates.py, tokenizer_utils.py
│ └── _auto_install.py
├── studio/ ← Unsloth Studio (full-stack app)
│ ├── backend/ ← FastAPI server (Python)
│ │ ├── main.py, run.py
│ │ ├── routes/ ← HTTP adapters
│ │ ├── core/ ← Domain orchestration (training/inference/export/data_recipe)
│ │ ├── models/ ← Pydantic DTOs
│ │ ├── auth/ ← JWT + API key + bootstrap admin
│ │ ├── storage/studio_db.py ← SQLite (WAL) for training history
│ │ ├── utils/ ← hardware, datasets, paths, …
│ │ └── plugins/ ← seed plugins for data-designer
│ ├── frontend/ ← React 19 + Vite + TanStack Router
│ │ └── src/
│ │ ├── app/ ← router, provider
│ │ ├── features/{auth,chat,training,data-recipes,export,…}
│ │ ├── stores/ ← Zustand
│ │ ├── components/ ← shadcn/Radix + assistant-ui
│ │ └── hooks/, lib/, shared/
│ └── src-tauri/ ← Rust desktop shell (Tauri 2)
│ └── src/{main.rs, process.rs, install.rs, update.rs, …}
├── tests/ ← pytest suites
└── scripts/ ← housekeeping (formatters, install helpers)

Architectural read: The codebase is a layered, subprocess-isolated, hexagonal-ish system. The Python backend acts as the application core; routes are driving adapters (HTTP), and core/{training,inference,export,data_recipe} orchestrators are driven adapters that talk to long-lived subprocesses where the heavy ML lives. The frontend and the Tauri shell are independent UIs over the same FastAPI surface.


A radiologist or ML engineer sits in front of Unsloth Studio. They want to download a model, fine-tune it on their own dataset, chat with it, and export it — all without leaving the app. The system has to talk to the local GPU, Hugging Face Hub, and (optionally) a llama.cpp inference server.

flowchart TB
    user(["End User<br/>(ML engineer / researcher / radiologist)"])
    apiUser(["External API Client<br/>(Open WebUI, SillyTavern, scripts)"])

    subgraph studio["Unsloth Studio System"]
        direction TB
        sys[("Unsloth Studio<br/>desktop + web app<br/>train · chat · export · data-recipe")]
    end

    hf[("Hugging Face Hub<br/>model + dataset registry")]
    llamacpp[("llama.cpp / llama-server<br/>GGUF inference engine")]
    gpu[("Local Hardware<br/>NVIDIA / AMD / Apple Silicon GPU + CPU + RAM")]
    fs[("Local Filesystem<br/>~/.unsloth/studio/<br/>checkpoints, datasets, SQLite DB")]
    pypi[("PyPI / GitHub<br/>updates · unsloth-zoo · transformers")]

    user -- "uses (GUI: Tauri webview or browser)" --> studio
    apiUser -- "OpenAI-compat HTTP<br/>(/v1/chat/completions)" --> studio

    studio -- "download / upload<br/>models, datasets, adapters" --> hf
    studio -- "spawns + RPCs over stdio" --> llamacpp
    studio -- "CUDA / ROCm / MPS<br/>via PyTorch" --> gpu
    studio -- "reads / writes" --> fs
    studio -- "self-update + dep install" --> pypi

    classDef person fill:#08427b,stroke:#052e56,color:#fff
    classDef system fill:#1168bd,stroke:#0b4884,color:#fff
    classDef external fill:#999,stroke:#666,color:#fff
    class user,apiUser person
    class sys system
    class hf,llamacpp,gpu,fs,pypi external
Actor / SystemRole
End UserInteracts via the Tauri desktop app or the browser-served React UI.
External API ClientAny tool that speaks the OpenAI HTTP schema; Studio mounts its inference router at /v1 so they “just work”. See studio/backend/main.py:215-220.
Hugging Face HubSource of truth for base models, LoRA adapters, GGUFs, and datasets. Pulled via huggingface_hub and exposed in the UI as a search/download flow.
llama.cpp / llama-serverSpawned as a child process for GGUF inference (see core/inference/llama_cpp.py:LlamaCppBackend). Runs in its own subprocess so its lifecycle is decoupled from the Python Transformers backend.
Local hardwareDetected at startup by utils/hardware/ — sets a DEVICE global that flows through the whole system. Determines whether Studio runs in CHAT_ONLY mode (e.g. CPU/macOS without MLX) or full-training mode.
Local filesystemStudio writes everything user-related under ~/.unsloth/studio/ (PID file, bootstrap password, SQLite DB).
PyPI / GitHubSource for self-update and dependency installs (src-tauri/src/install.rs, update.rs, plus unsloth_cli/commands/studio.py).

A “container” here is a separately deployable / runnable process. Studio has four runtime containers plus the on-disk database.

flowchart TB
    user(["End User"])
    apiUser(["OpenAI-compat HTTP client"])

    subgraph desktop["Desktop machine"]
        direction TB

        tauri["<b>Tauri Desktop Shell</b><br/>[Rust binary, src-tauri/]<br/>Spawns + supervises backend,<br/>installs deps, handles updates,<br/>tray + custom titlebar"]

        fe["<b>React Frontend SPA</b><br/>[TypeScript · React 19 · Vite ·<br/>TanStack Router · Zustand · shadcn]<br/>Served from disk by Tauri webview<br/>OR by FastAPI static mount"]

        be["<b>FastAPI Backend</b><br/>[Python · uvicorn · structlog]<br/>Routes: /api/{auth,train,inference,<br/>models,datasets,export,data-recipe} + /v1<br/>Lifespan: hardware detect, admin seed"]

        subgraph workers["Subprocess Workers (mp.spawn)"]
            direction LR
            trainW["Training Worker<br/>core/training/worker.py<br/>UnslothTrainer + SFT/DPO/GRPO"]
            infW["Inference Worker<br/>core/inference/worker.py<br/>HF Transformers · Unsloth patches"]
            expW["Export Worker<br/>core/export/worker.py<br/>save_pretrained · GGUF convert"]
            llamaW["llama-server<br/>(C++ binary, OS subprocess)<br/>GGUF inference"]
        end

        db[("SQLite DB<br/>~/.unsloth/studio/studio.db<br/>WAL · users · runs · metrics")]
        fsstore[("Local FS<br/>~/.unsloth/studio/<br/>checkpoints, datasets, logs")]

        unsloth_lib[/"<b>Unsloth Core lib</b><br/>[Python pkg, unsloth/]<br/>imported INSIDE workers only"/]
    end

    hf[("HuggingFace Hub")]

    user -- "WebView2 / WKWebView" --> tauri
    user -- "or http://localhost:8888 in browser" --> be

    tauri -- "spawns + monitors stdout" --> be
    tauri -- "loads bundled SPA" --> fe
    fe -- "fetch / SSE / WebSocket<br/>Bearer JWT or API key" --> be
    apiUser -- "/v1/chat/completions" --> be

    be -- "mp.Queue commands + events<br/>(spawn ctx)" --> trainW
    be -- "mp.Queue + cancel Event" --> infW
    be -- "mp.Queue" --> expW
    be -- "stdio JSON-RPC / HTTP" --> llamaW

    trainW -. "import" .-> unsloth_lib
    infW -. "import" .-> unsloth_lib
    expW -. "import" .-> unsloth_lib

    be -- "sqlite3 (WAL)" --> db
    trainW -- "writes metrics/checkpoints" --> fsstore
    infW -- "downloads + caches" --> fsstore
    be -- "reads/writes" --> fsstore

    trainW -- "model + dataset I/O" --> hf
    infW -- "model I/O" --> hf
    expW -- "upload (optional)" --> hf

    classDef container fill:#1168bd,stroke:#0b4884,color:#fff
    classDef worker fill:#3b8ed0,stroke:#1168bd,color:#fff
    classDef external fill:#999,stroke:#666,color:#fff
    classDef store fill:#85bb65,stroke:#5a8444,color:#fff
    class tauri,fe,be container
    class trainW,infW,expW,llamaW worker
    class hf external
    class db,fsstore,unsloth_lib store
ContainerTechResponsibilityKey files
Tauri Desktop ShellRust 2024-edition + Tauri 2Boots the desktop window; supervises a Python backend child process; performs first-run install (Python venv, llama-cpp prebuilt, etc.); handles auto-updates; system tray. Sits between the user and the backend. Also implements desktop auto-auth by sharing a generated secret with the backend so the webview can skip the login screen.studio/src-tauri/src/{main.rs, process.rs, install.rs, update.rs, desktop_auth.rs, preflight.rs}
React Frontend SPAReact 19 + Vite + TS strict + TanStack Router + Zustand + shadcn/Radix + Tailwind 4 + assistant-ui (chat) + xyflow (data-recipe nodes)Five top-level features: auth, chat, training, data-recipes, export. Each feature has its own api/ (typed fetch client), stores/ (Zustand), hooks/, components/. State is mostly per-feature local stores; only training has a global store at src/stores/training.ts.studio/frontend/src/{app, features, stores, components}
FastAPI BackendPython 3.10+, uvicorn, FastAPI, structlog, pydantic v2The orchestration core. Boots in main.py via a lifespan context manager that detects hardware, cleans stale compiled cache, seeds the default admin, and pre-caches a helper GGUF in a daemon thread. Routes are mounted under /api/* (and inference_router is also mounted at /v1 for OpenAI compatibility).studio/backend/{main.py, run.py, routes/, core/, models/, auth/, storage/, utils/}
Training / Inference / Export WorkersPython subprocesses spawned with mp.get_context("spawn")Run the heavy ML code (transformers, unsloth, peft, trl). Communicate with the parent via mp.Queue for events and a mp.Event for cancellation. Spawned fresh per training job but persistent across inference requests (with respawn on transformers major-version switch).studio/backend/core/{training,inference,export}/worker.py
llama-server (subprocess)C++ (external llama.cpp)Backs GGUF inference. Spawned and supervised by LlamaCppBackend.studio/backend/core/inference/llama_cpp.py
SQLite DBsqlite3 stdlib, WAL journalTwo domains in one file: auth (users, refresh tokens, API keys, JWT secrets) and studio (training runs, per-step metrics, scan folders). Schemas are created lazily by _ensure_schema() under a process-wide lock.studio/backend/storage/studio_db.py, studio/backend/auth/storage.py
Unsloth Core (unsloth/ Python pkg)Pure Python libraryPatches transformers/trl/peft at import time, exposes FastLanguageModel.from_pretrained(...), ships the Triton kernels, and provides UnslothTrainer. Imported only inside workers, never in the parent backend process.unsloth/{models, kernels, trainer.py, save.py, …}

core/training/training.py:5-15 and core/inference/orchestrator.py:5-15 both spell it out: PyTorch + transformers + unsloth’s monkey-patches are essentially un-unloadable from a Python interpreter. To run a Qwen model that needs transformers==4.57 and then a GLM model that needs transformers==5.x, the only workable answer is kill the worker, spawn a new one — even from the same parent process. The _CTX = mp.get_context("spawn") pattern (vs. the default fork on Linux) ensures the child starts from a clean interpreter and re-imports everything.


3. C4 Level 3 — Components (FastAPI Backend)

Section titled “3. C4 Level 3 — Components (FastAPI Backend)”

This zooms inside the FastAPI Backend container. The backend follows a clear three-layer split:

Routes (HTTP adapters)
│ call into
Core orchestrators (parent-process logic + subprocess RPC)
│ RPC over mp.Queue
Workers (run inside spawned subprocesses, import unsloth/transformers)

Cross-cutting: auth/ (JWT bearer + API key middleware), storage/ (SQLite), models/ (Pydantic DTOs), utils/hardware/ (the device detector that sets DEVICE and CHAT_ONLY globals consumed everywhere).

flowchart TB
    fe(["React SPA / external client"])

    subgraph backend["FastAPI Backend"]
        direction TB

        subgraph mw["Cross-cutting (FastAPI middleware + deps)"]
            direction LR
            cors["CORSMiddleware"]
            logmw["LoggingMiddleware<br/>(structlog request IDs)"]
            authdep["get_current_subject<br/>(HTTPBearer JWT/API-key)"]
        end

        subgraph routesL["Routes layer (HTTP adapters)"]
            direction LR
            r_auth["routes/auth.py<br/>POST /api/auth/login<br/>POST /refresh<br/>POST /change-password"]
            r_train["routes/training.py<br/>+ training_history.py<br/>POST /api/train/start /stop<br/>GET /events (SSE)<br/>GET /history"]
            r_inf["routes/inference.py<br/>POST /generate (SSE)<br/>POST /load /unload<br/>+ mounted as /v1 (OpenAI)"]
            r_models["routes/models.py<br/>GET /api/models"]
            r_data["routes/data_recipe<br/>+ datasets.py"]
            r_exp["routes/export.py"]
        end

        subgraph coreL["Core layer (orchestrators)"]
            direction LR
            o_train["core/training/<br/>TrainingBackend<br/>TrainingProgress"]
            o_inf["core/inference/<br/>InferenceOrchestrator<br/>LlamaCppBackend"]
            o_exp["core/export/<br/>ExportOrchestrator<br/>ExportBackend"]
            o_data["core/data_recipe/<br/>service.py · jobs/manager.py"]
        end

        subgraph supportL["Support modules"]
            direction LR
            authmod["auth/<br/>authentication.py · storage.py · hashing.py"]
            store["storage/<br/>studio_db.py (sqlite WAL)"]
            dtos["models/<br/>training.py · inference.py · export.py · users.py"]
            utils["utils/<br/>hardware · paths · datasets · models config"]
        end
    end

    subgraph workers["Subprocess workers (separate processes)"]
        direction LR
        w_train["core/training/worker.py<br/>UnslothTrainer (in unsloth.trainer)"]
        w_inf["core/inference/worker.py<br/>InferenceBackend"]
        w_exp["core/export/worker.py"]
    end

    llama[("llama-server<br/>OS process")]
    sqlite[("studio.db (WAL)")]
    hf[("HF Hub")]

    fe --> mw
    mw --> r_auth
    mw --> r_train
    mw --> r_inf
    mw --> r_models
    mw --> r_data
    mw --> r_exp

    r_auth --> authmod
    r_train --> o_train
    r_train --> store
    r_inf --> o_inf
    r_models --> o_inf
    r_data --> o_data
    r_exp --> o_exp

    o_train -- "mp.Queue + spawn" --> w_train
    o_inf -- "mp.Queue + cancel Event" --> w_inf
    o_inf -- "stdio / HTTP" --> llama
    o_exp -- "mp.Queue" --> w_exp

    o_train --> store
    authmod --> store
    store --> sqlite

    w_train --> hf
    w_inf --> hf

    routesL -. uses .-> dtos
    coreL -. uses .-> utils

    classDef route fill:#85bb65,stroke:#5a8444,color:#fff
    classDef core fill:#1168bd,stroke:#0b4884,color:#fff
    classDef sup fill:#bbb,stroke:#666,color:#000
    classDef work fill:#3b8ed0,stroke:#1168bd,color:#fff
    classDef ext fill:#999,stroke:#555,color:#fff
    class r_auth,r_train,r_inf,r_models,r_data,r_exp route
    class o_train,o_inf,o_exp,o_data core
    class authmod,store,dtos,utils,cors,logmw,authdep sup
    class w_train,w_inf,w_exp work
    class llama,sqlite,hf ext

Thin. They map HTTP concerns (request validation via Pydantic DTOs from models/, Depends(get_current_subject) for auth) onto a single call into the corresponding orchestrator. Example: routes/training.py resolves dataset paths, calls get_training_backend().start_training(...), and returns a job ID.

The fact that inference_router is included twice in main.py:215-220 — once at /api/inference and once at /v1 — gives the system free OpenAI-API compatibility without duplicating handlers. This is a clean example of FastAPI’s router composition acting as an adapter.

This is the layer that actually owns the domain logic. Each core/<feature>/ folder follows a consistent pattern:

core/<feature>/
├── orchestrator.py # parent-process class: lifecycle + RPC
├── worker.py # child-process entrypoint: heavy ML
├── <feature>.py # shared types, enums, helpers
└── (sometimes) trainer.py / inference.py / export.py — domain code

The *Backend / *Orchestrator classes (TrainingBackend, InferenceOrchestrator, ExportOrchestrator) all share a consistent interface:

  • __init__ sets up _lock, _proc, _event_queue / _cmd_queue / _resp_queue, _pump_thread / _dispatcher_thread, _cancel_event.
  • A start method spawns or reuses a worker.
  • A pump/dispatcher thread routes events back to per-request mailboxes (notably InferenceOrchestrator._mailboxes, which lets the compare-mode UI run multiple in-flight requests against one worker).
  • A force_terminate / _shutdown_subprocess is called by the global _graceful_shutdown handler in run.py:185.
  • auth/authentication.py issues short-lived access JWTs (1 h) and longer refresh tokens (7 d), plus an API-key path. The bootstrap admin is auto-seeded on first launch and its password is written to a file under ~/.unsloth/studio/.bootstrap_password. The HTML index injects window.__UNSLOTH_BOOTSTRAP__ with these credentials only until the user changes the password (see main.py:349-374).
  • storage/studio_db.py owns one SQLite file in WAL mode; tables include training_runs and a metrics table that captures loss, lr, grad_norm, and eval_loss per step. cleanup_orphaned_runs() runs at startup to mark crashed runs as failed.
  • utils/hardware/ sets the device backend (cuda / rocm / mps / cpu) into a module global early. Routes read it via get_device() to decide whether to allow training endpoints at all.

4. C4 Level 3 — Components (React Frontend)

Section titled “4. C4 Level 3 — Components (React Frontend)”

The frontend follows a feature-based architecture (sometimes called “screaming architecture”): the top-level folder names tell you what the app does, not what tech it uses.

flowchart TB
    user(["User"])
    api(["FastAPI Backend"])

    subgraph spa["React SPA (studio/frontend/src)"]
        direction TB

        subgraph appL["app/"]
            router["router.tsx<br/>(TanStack Router)"]
            provider["provider.tsx<br/>(theme · QueryClient · Toaster)"]
            guards["auth-guards.ts"]
        end

        subgraph featL["features/"]
            direction TB
            f_auth["auth/<br/>login · change-password ·<br/>session.ts · tauri-auto-auth.ts"]
            f_chat["chat/<br/>chat-page · runtime-provider ·<br/>thread-sidebar · presets · Dexie db"]
            f_train["training/<br/>api · stores (zustand) ·<br/>hooks · components · lib"]
            f_dr["data-recipes/<br/>pages · learning-recipes · hooks"]
            f_rs["recipe-studio/<br/>(node-graph editor with xyflow)"]
            f_exp["export/"]
            f_set["settings/ · profile/ · onboarding/ · tour/ · studio/"]
        end

        subgraph sharedL["Shared building blocks"]
            stores["stores/ (Zustand global)"]
            comp["components/<br/>app-sidebar · navbar ·<br/>shadcn ui · assistant-ui · markdown"]
            hooks["hooks/ · lib/ · utils/ · shared/"]
            tauriBridge["components/tauri/<br/>(window controls, updater hooks)"]
        end
    end

    user --> router
    router --> guards
    router --> f_auth
    router --> f_chat
    router --> f_train
    router --> f_dr
    router --> f_rs
    router --> f_exp
    router --> f_set

    f_auth --> api
    f_chat --> api
    f_train --> api
    f_dr --> api
    f_exp --> api

    f_chat --> tauriBridge
    f_auth --> tauriBridge
    appL --> sharedL

    classDef app fill:#1168bd,stroke:#0b4884,color:#fff
    classDef feat fill:#85bb65,stroke:#5a8444,color:#fff
    classDef shared fill:#bbb,stroke:#666,color:#000
    classDef ext fill:#999,stroke:#555,color:#fff
    class router,provider,guards app
    class f_auth,f_chat,f_train,f_dr,f_rs,f_exp,f_set feat
    class stores,comp,hooks,tauriBridge shared
    class api,user ext
  • Each feature is self-contained: features/training/ ships its own api/, stores/, hooks/, components/, types/. Cross-feature reuse goes through shared/ or components/ui — there is no “global service registry”.
  • State: Zustand for app state (e.g. stores/training.ts, features/training/stores/training-runtime-store.ts), Dexie/IndexedDB for chat history (features/chat/db.ts), plain useState for purely-local UI state. There is no Redux, no React Query in the deps — fetches go through hand-rolled typed clients.
  • Routing is type-safe via @tanstack/react-router with code-split route files in app/routes/.
  • Tauri integration is additive: any code that needs the desktop bridge guards on window.__TAURI__ and falls back to web behavior, so the same SPA bundle runs in both Tauri and a vanilla browser.

5. Class-level UML — Python OOP backbone

Section titled “5. Class-level UML — Python OOP backbone”

Two related class hierarchies dominate the Python side: the Studio orchestrators in the parent process and the Unsloth Fast* model family that the workers actually use. The diagram below merges both.

classDiagram
    %% ============== Studio backend orchestrators ==============
    class TrainingProgress {
        +epoch: float
        +step: int
        +total_steps: int
        +loss: Optional[float]
        +learning_rate: Optional[float]
        +is_training: bool
        +is_completed: bool
        +error: Optional[str]
        +eta_seconds: Optional[float]
    }

    class TrainingBackend {
        -_proc: mp.Process
        -_event_queue: mp.Queue
        -_stop_queue: mp.Queue
        -_pump_thread: Thread
        -_lock: Lock
        -_progress: TrainingProgress
        -_metric_buffer: list
        +current_job_id: str
        +loss_history: list
        +lr_history: list
        +start_training(config, dataset, ...) str
        +stop(save: bool) None
        +get_progress() TrainingProgress
        +get_metrics() dict
        +force_terminate() None
        -_pump_events() void
        -_flush_metrics() void
    }

    class InferenceOrchestrator {
        -_proc: mp.Process
        -_cmd_queue: mp.Queue
        -_resp_queue: mp.Queue
        -_cancel_event: mp.Event
        -_lock: Lock
        -_gen_lock: Lock
        -_mailboxes: dict
        -_dispatcher_thread: Thread
        -_current_transformers_major: str
        +active_model_name: str
        +models: dict
        +load_model(name, ...) LoadResult
        +unload_model() None
        +generate(prompt, ...) Generator
        +cancel(request_id) None
        +default_models() list
        -_ensure_subprocess(major) None
        -_shutdown_subprocess(timeout) None
    }

    class LlamaCppBackend {
        -_proc: subprocess.Popen
        -_port: int
        -_model_path: Path
        +load(gguf_path, ...) None
        +generate(...) Generator
        +unload() None
        -_kill_process() None
    }

    class ExportOrchestrator {
        -_proc: mp.Process
        +export_merged(...) JobId
        +export_lora_adapter(...) JobId
        +export_gguf(...) JobId
        +get_status(job_id) ExportStatus
        -_shutdown_subprocess(timeout) None
    }

    class ExportBackend {
        +run_export(request) None
    }

    %% ============== Pydantic DTOs (selected) ==============
    class TrainingStartRequest {
        <<Pydantic>>
        +model_name: str
        +dataset_paths: list[str]
        +config: dict
    }
    class TrainingJobResponse {
        <<Pydantic>>
        +job_id: str
        +status: str
    }
    class GenerateRequest {
        <<Pydantic>>
        +prompt: str
        +messages: list
        +max_tokens: int
        +temperature: float
    }

    %% ============== Auth ==============
    class AuthStorage {
        <<module>>
        +ensure_default_admin() bool
        +get_user_and_secret(name) tuple
        +save_refresh_token(...) None
        +verify_refresh_token(...) bool
        +validate_api_key(key) Optional[str]
    }
    class Authentication {
        <<module>>
        +create_access_token(subject) str
        +create_refresh_token(subject) str
        +get_current_subject() str
    }

    %% ============== Unsloth Core: Fast* model family ==============
    class FastBaseModel {
        <<unsloth.models>>
        +from_pretrained(...) tuple[Model, Tokenizer]
        +get_peft_model(...) Model
        +for_inference(model) Model
        +for_training(model) Model
        +patch_peft_model(...) None
    }

    class FastModel {
        +from_pretrained(...) tuple
    }

    class FastLlamaModel {
        +pre_patch() None
        +post_patch(model) None
        +from_pretrained(...) tuple
    }

    class FastLanguageModel
    class FastVisionModel
    class FastTextModel
    class FastMistralModel
    class FastQwen2Model
    class FastQwen3Model
    class FastQwen3MoeModel
    class FastGraniteModel
    class FastCohereModel
    class FastFalconH1Model
    class FastSentenceTransformer

    %% ============== Trainer (HF/TRL extension) ==============
    class TrainingArguments {
        <<transformers>>
    }
    class SFTTrainer {
        <<trl>>
    }
    class UnslothTrainingArguments {
        +qgalore_config: QGaloreConfig
    }
    class UnslothTrainer {
        +train(...) None
        +_inner_training_loop(...) None
    }
    class QGaloreConfig {
        +rank: int
        +update_proj_gap: int
        +scale: float
    }

    %% ============== Relationships ==============
    TrainingBackend ..> TrainingProgress : produces
    TrainingBackend ..> AuthStorage : (via routes)
    InferenceOrchestrator ..> LlamaCppBackend : delegates GGUF to
    ExportOrchestrator ..> ExportBackend : "in worker"

    TrainingBackend o-- "1 spawned" UnslothTrainer : in worker
    InferenceOrchestrator o-- "1 spawned" FastLanguageModel : in worker

    Authentication ..> AuthStorage : reads/writes

    FastBaseModel <|-- FastModel
    FastModel <|-- FastVisionModel
    FastModel <|-- FastTextModel
    FastLlamaModel <|-- FastLanguageModel
    FastLlamaModel <|-- FastMistralModel
    FastLlamaModel <|-- FastQwen2Model
    FastLlamaModel <|-- FastQwen3Model
    FastQwen3Model <|-- FastQwen3MoeModel
    FastLlamaModel <|-- FastGraniteModel
    FastLlamaModel <|-- FastCohereModel
    FastLlamaModel <|-- FastFalconH1Model

    TrainingArguments <|-- UnslothTrainingArguments
    SFTTrainer <|-- UnslothTrainer
    UnslothTrainingArguments *-- QGaloreConfig

    TrainingStartRequest ..> TrainingBackend : parsed by route
    GenerateRequest ..> InferenceOrchestrator : parsed by route

Caveat on inheritance lines: FastLanguageModel is declared as class FastLanguageModel(FastLlamaModel) and FastVisionModel/FastTextModel are declared as class FastVisionModel(FastModel) (see unsloth/models/loader.py:16). The diagram preserves both lineages. FastModel itself extends FastBaseModel (defined in unsloth_zoo), which is shown here as a stereotype.

HTTP request ──► routes/inference.py
InferenceOrchestrator (parent process)
│ mp.Queue command
worker.py main loop (child process)
│ instantiates
FastLanguageModel.from_pretrained(...)
│ returns (model, tokenizer)
model.generate(...) ──► tokens stream back via mp.Queue
▼ pump thread
per-request mailbox ──► SSE response

For training, replace FastLanguageModel with UnslothTrainer(SFTTrainer) driven by UnslothTrainingArguments, and replace the streaming response with a TrainingProgress event stream pumped into both the SSE channel and the SQLite metrics table.


DecisionWhereWhy it matters
Subprocess isolation per-feature (mp.get_context("spawn"))`core/{training,inference,export}/orchestratortraining.py`
Single FastAPI router mounted at two prefixesmain.py:212-220Free OpenAI-API compatibility (/v1/chat/completions) without duplicating any handler code.
Bootstrap admin + one-time HTML credential injectionmain.py:349-374, auth/storage.ensure_default_adminSolves the desktop-first UX: the user gets an instantly-logged-in webview but the credentials self-destruct from the served HTML the moment they change the password.
Feature-folder frontend, no global service containerstudio/frontend/src/features/*Keeps each domain (chat / training / export / data-recipes) independently shippable; the chat feature even ships its own IndexedDB schema via Dexie.
Tauri-as-supervisor + browser-as-fallbacksrc-tauri/src/process.rs::BackendProcess, main.py::setup_frontendThe same FastAPI server can serve the SPA over plain HTTP for browser users or expose a pure JSON API while Tauri loads the SPA from disk — one binary, two distribution modes.
Structured logging with request middlewareloggers/, LoggingMiddleware in main.pyEvery log line carries a request ID; combined with structlog makes cross-process debugging tractable.

  • C4 model — Hierarchical architecture-diagramming notation by Simon Brown: System Context (C1) → Containers (C2) → Components (C3) → Code (C4). UML class diagrams sit at the C4 level.
  • Container (C4 sense) — A separately runnable unit (process, server, single-page app, database). Not a Docker container.
  • Hexagonal / Ports-and-Adapters — Pattern where the domain core is surrounded by interchangeable adapters; here, routes/ are driving adapters and core/*/worker.py are driven adapters around the ML domain.
  • Orchestrator — Class in the parent process that owns the lifecycle of a worker subprocess and exposes a synchronous-ish API to the routes layer.
  • Fast* model family — Unsloth’s set of monkey-patched HF model classes that swap in faster Triton kernels and optimized LoRA paths.
  • Bootstrap admin — The auto-created unsloth user whose password is generated on first launch and stored at ~/.unsloth/studio/.bootstrap_password.