Unsloth Studio — Architecture (C4 + OOP UML)
Repo:
unsloth(cloned). Two intertwined products live here:
- Unsloth Core — a Python library (
unsloth/) that patchestransformers/trl/peftat import-time to make LLM fine-tuning ~2× faster with up to 70% less VRAM.- Unsloth Studio — a desktop/web app (
studio/) that wraps Unsloth Core behind a FastAPI backend, a React UI, and a Tauri native shell. This is the “Unsloth Web UI” the user is interested in.This report follows Simon Brown’s C4 model (Context → Containers → Components), then adds a Class-level UML diagram for the most architecturally significant Python classes. All diagrams use Mermaid.
0. Bird’s-eye repository map
Section titled “0. Bird’s-eye repository map”unsloth/ ← repo root├── cli.py ← thin shim → unsloth_cli.app├── unsloth_cli/ ← Typer CLI (train/inference/export/studio)│ └── commands/│ ├── train.py│ ├── inference.py│ ├── export.py│ └── studio.py ← `unsloth studio …` subcommands│├── unsloth/ ← Unsloth Core (Python library)│ ├── models/ ← FastLlamaModel, FastQwen3Model, …│ ├── kernels/ ← Triton kernels (rope, rms_norm, …)│ ├── trainer.py ← UnslothTrainer (extends SFTTrainer)│ ├── dataprep/, registry/, optimizers/, utils/│ ├── save.py, chat_templates.py, tokenizer_utils.py│ └── _auto_install.py│├── studio/ ← Unsloth Studio (full-stack app)│ ├── backend/ ← FastAPI server (Python)│ │ ├── main.py, run.py│ │ ├── routes/ ← HTTP adapters│ │ ├── core/ ← Domain orchestration (training/inference/export/data_recipe)│ │ ├── models/ ← Pydantic DTOs│ │ ├── auth/ ← JWT + API key + bootstrap admin│ │ ├── storage/studio_db.py ← SQLite (WAL) for training history│ │ ├── utils/ ← hardware, datasets, paths, …│ │ └── plugins/ ← seed plugins for data-designer│ ├── frontend/ ← React 19 + Vite + TanStack Router│ │ └── src/│ │ ├── app/ ← router, provider│ │ ├── features/{auth,chat,training,data-recipes,export,…}│ │ ├── stores/ ← Zustand│ │ ├── components/ ← shadcn/Radix + assistant-ui│ │ └── hooks/, lib/, shared/│ └── src-tauri/ ← Rust desktop shell (Tauri 2)│ └── src/{main.rs, process.rs, install.rs, update.rs, …}│├── tests/ ← pytest suites└── scripts/ ← housekeeping (formatters, install helpers)Architectural read: The codebase is a layered, subprocess-isolated, hexagonal-ish system. The Python backend acts as the application core; routes are driving adapters (HTTP), and
core/{training,inference,export,data_recipe}orchestrators are driven adapters that talk to long-lived subprocesses where the heavy ML lives. The frontend and the Tauri shell are independent UIs over the same FastAPI surface.
1. C4 Level 1 — System Context
Section titled “1. C4 Level 1 — System Context”What’s at stake
Section titled “What’s at stake”A radiologist or ML engineer sits in front of Unsloth Studio. They want to download a model, fine-tune it on their own dataset, chat with it, and export it — all without leaving the app. The system has to talk to the local GPU, Hugging Face Hub, and (optionally) a llama.cpp inference server.
Diagram (C1)
Section titled “Diagram (C1)”flowchart TB
user(["End User<br/>(ML engineer / researcher / radiologist)"])
apiUser(["External API Client<br/>(Open WebUI, SillyTavern, scripts)"])
subgraph studio["Unsloth Studio System"]
direction TB
sys[("Unsloth Studio<br/>desktop + web app<br/>train · chat · export · data-recipe")]
end
hf[("Hugging Face Hub<br/>model + dataset registry")]
llamacpp[("llama.cpp / llama-server<br/>GGUF inference engine")]
gpu[("Local Hardware<br/>NVIDIA / AMD / Apple Silicon GPU + CPU + RAM")]
fs[("Local Filesystem<br/>~/.unsloth/studio/<br/>checkpoints, datasets, SQLite DB")]
pypi[("PyPI / GitHub<br/>updates · unsloth-zoo · transformers")]
user -- "uses (GUI: Tauri webview or browser)" --> studio
apiUser -- "OpenAI-compat HTTP<br/>(/v1/chat/completions)" --> studio
studio -- "download / upload<br/>models, datasets, adapters" --> hf
studio -- "spawns + RPCs over stdio" --> llamacpp
studio -- "CUDA / ROCm / MPS<br/>via PyTorch" --> gpu
studio -- "reads / writes" --> fs
studio -- "self-update + dep install" --> pypi
classDef person fill:#08427b,stroke:#052e56,color:#fff
classDef system fill:#1168bd,stroke:#0b4884,color:#fff
classDef external fill:#999,stroke:#666,color:#fff
class user,apiUser person
class sys system
class hf,llamacpp,gpu,fs,pypi external
Actors and external systems
Section titled “Actors and external systems”| Actor / System | Role |
|---|---|
| End User | Interacts via the Tauri desktop app or the browser-served React UI. |
| External API Client | Any tool that speaks the OpenAI HTTP schema; Studio mounts its inference router at /v1 so they “just work”. See studio/backend/main.py:215-220. |
| Hugging Face Hub | Source of truth for base models, LoRA adapters, GGUFs, and datasets. Pulled via huggingface_hub and exposed in the UI as a search/download flow. |
| llama.cpp / llama-server | Spawned as a child process for GGUF inference (see core/inference/llama_cpp.py:LlamaCppBackend). Runs in its own subprocess so its lifecycle is decoupled from the Python Transformers backend. |
| Local hardware | Detected at startup by utils/hardware/ — sets a DEVICE global that flows through the whole system. Determines whether Studio runs in CHAT_ONLY mode (e.g. CPU/macOS without MLX) or full-training mode. |
| Local filesystem | Studio writes everything user-related under ~/.unsloth/studio/ (PID file, bootstrap password, SQLite DB). |
| PyPI / GitHub | Source for self-update and dependency installs (src-tauri/src/install.rs, update.rs, plus unsloth_cli/commands/studio.py). |
2. C4 Level 2 — Containers
Section titled “2. C4 Level 2 — Containers”A “container” here is a separately deployable / runnable process. Studio has four runtime containers plus the on-disk database.
Diagram (C2)
Section titled “Diagram (C2)”flowchart TB
user(["End User"])
apiUser(["OpenAI-compat HTTP client"])
subgraph desktop["Desktop machine"]
direction TB
tauri["<b>Tauri Desktop Shell</b><br/>[Rust binary, src-tauri/]<br/>Spawns + supervises backend,<br/>installs deps, handles updates,<br/>tray + custom titlebar"]
fe["<b>React Frontend SPA</b><br/>[TypeScript · React 19 · Vite ·<br/>TanStack Router · Zustand · shadcn]<br/>Served from disk by Tauri webview<br/>OR by FastAPI static mount"]
be["<b>FastAPI Backend</b><br/>[Python · uvicorn · structlog]<br/>Routes: /api/{auth,train,inference,<br/>models,datasets,export,data-recipe} + /v1<br/>Lifespan: hardware detect, admin seed"]
subgraph workers["Subprocess Workers (mp.spawn)"]
direction LR
trainW["Training Worker<br/>core/training/worker.py<br/>UnslothTrainer + SFT/DPO/GRPO"]
infW["Inference Worker<br/>core/inference/worker.py<br/>HF Transformers · Unsloth patches"]
expW["Export Worker<br/>core/export/worker.py<br/>save_pretrained · GGUF convert"]
llamaW["llama-server<br/>(C++ binary, OS subprocess)<br/>GGUF inference"]
end
db[("SQLite DB<br/>~/.unsloth/studio/studio.db<br/>WAL · users · runs · metrics")]
fsstore[("Local FS<br/>~/.unsloth/studio/<br/>checkpoints, datasets, logs")]
unsloth_lib[/"<b>Unsloth Core lib</b><br/>[Python pkg, unsloth/]<br/>imported INSIDE workers only"/]
end
hf[("HuggingFace Hub")]
user -- "WebView2 / WKWebView" --> tauri
user -- "or http://localhost:8888 in browser" --> be
tauri -- "spawns + monitors stdout" --> be
tauri -- "loads bundled SPA" --> fe
fe -- "fetch / SSE / WebSocket<br/>Bearer JWT or API key" --> be
apiUser -- "/v1/chat/completions" --> be
be -- "mp.Queue commands + events<br/>(spawn ctx)" --> trainW
be -- "mp.Queue + cancel Event" --> infW
be -- "mp.Queue" --> expW
be -- "stdio JSON-RPC / HTTP" --> llamaW
trainW -. "import" .-> unsloth_lib
infW -. "import" .-> unsloth_lib
expW -. "import" .-> unsloth_lib
be -- "sqlite3 (WAL)" --> db
trainW -- "writes metrics/checkpoints" --> fsstore
infW -- "downloads + caches" --> fsstore
be -- "reads/writes" --> fsstore
trainW -- "model + dataset I/O" --> hf
infW -- "model I/O" --> hf
expW -- "upload (optional)" --> hf
classDef container fill:#1168bd,stroke:#0b4884,color:#fff
classDef worker fill:#3b8ed0,stroke:#1168bd,color:#fff
classDef external fill:#999,stroke:#666,color:#fff
classDef store fill:#85bb65,stroke:#5a8444,color:#fff
class tauri,fe,be container
class trainW,infW,expW,llamaW worker
class hf external
class db,fsstore,unsloth_lib store
Containers explained
Section titled “Containers explained”| Container | Tech | Responsibility | Key files |
|---|---|---|---|
| Tauri Desktop Shell | Rust 2024-edition + Tauri 2 | Boots the desktop window; supervises a Python backend child process; performs first-run install (Python venv, llama-cpp prebuilt, etc.); handles auto-updates; system tray. Sits between the user and the backend. Also implements desktop auto-auth by sharing a generated secret with the backend so the webview can skip the login screen. | studio/src-tauri/src/{main.rs, process.rs, install.rs, update.rs, desktop_auth.rs, preflight.rs} |
| React Frontend SPA | React 19 + Vite + TS strict + TanStack Router + Zustand + shadcn/Radix + Tailwind 4 + assistant-ui (chat) + xyflow (data-recipe nodes) | Five top-level features: auth, chat, training, data-recipes, export. Each feature has its own api/ (typed fetch client), stores/ (Zustand), hooks/, components/. State is mostly per-feature local stores; only training has a global store at src/stores/training.ts. | studio/frontend/src/{app, features, stores, components} |
| FastAPI Backend | Python 3.10+, uvicorn, FastAPI, structlog, pydantic v2 | The orchestration core. Boots in main.py via a lifespan context manager that detects hardware, cleans stale compiled cache, seeds the default admin, and pre-caches a helper GGUF in a daemon thread. Routes are mounted under /api/* (and inference_router is also mounted at /v1 for OpenAI compatibility). | studio/backend/{main.py, run.py, routes/, core/, models/, auth/, storage/, utils/} |
| Training / Inference / Export Workers | Python subprocesses spawned with mp.get_context("spawn") | Run the heavy ML code (transformers, unsloth, peft, trl). Communicate with the parent via mp.Queue for events and a mp.Event for cancellation. Spawned fresh per training job but persistent across inference requests (with respawn on transformers major-version switch). | studio/backend/core/{training,inference,export}/worker.py |
| llama-server (subprocess) | C++ (external llama.cpp) | Backs GGUF inference. Spawned and supervised by LlamaCppBackend. | studio/backend/core/inference/llama_cpp.py |
| SQLite DB | sqlite3 stdlib, WAL journal | Two domains in one file: auth (users, refresh tokens, API keys, JWT secrets) and studio (training runs, per-step metrics, scan folders). Schemas are created lazily by _ensure_schema() under a process-wide lock. | studio/backend/storage/studio_db.py, studio/backend/auth/storage.py |
Unsloth Core (unsloth/ Python pkg) | Pure Python library | Patches transformers/trl/peft at import time, exposes FastLanguageModel.from_pretrained(...), ships the Triton kernels, and provides UnslothTrainer. Imported only inside workers, never in the parent backend process. | unsloth/{models, kernels, trainer.py, save.py, …} |
Why subprocess isolation?
Section titled “Why subprocess isolation?”core/training/training.py:5-15 and core/inference/orchestrator.py:5-15 both spell it out: PyTorch + transformers + unsloth’s monkey-patches are essentially un-unloadable from a Python interpreter. To run a Qwen model that needs transformers==4.57 and then a GLM model that needs transformers==5.x, the only workable answer is kill the worker, spawn a new one — even from the same parent process. The _CTX = mp.get_context("spawn") pattern (vs. the default fork on Linux) ensures the child starts from a clean interpreter and re-imports everything.
3. C4 Level 3 — Components (FastAPI Backend)
Section titled “3. C4 Level 3 — Components (FastAPI Backend)”This zooms inside the FastAPI Backend container. The backend follows a clear three-layer split:
Routes (HTTP adapters) │ call into ▼Core orchestrators (parent-process logic + subprocess RPC) │ RPC over mp.Queue ▼Workers (run inside spawned subprocesses, import unsloth/transformers)Cross-cutting: auth/ (JWT bearer + API key middleware), storage/ (SQLite), models/ (Pydantic DTOs), utils/hardware/ (the device detector that sets DEVICE and CHAT_ONLY globals consumed everywhere).
Diagram (C3)
Section titled “Diagram (C3)”flowchart TB
fe(["React SPA / external client"])
subgraph backend["FastAPI Backend"]
direction TB
subgraph mw["Cross-cutting (FastAPI middleware + deps)"]
direction LR
cors["CORSMiddleware"]
logmw["LoggingMiddleware<br/>(structlog request IDs)"]
authdep["get_current_subject<br/>(HTTPBearer JWT/API-key)"]
end
subgraph routesL["Routes layer (HTTP adapters)"]
direction LR
r_auth["routes/auth.py<br/>POST /api/auth/login<br/>POST /refresh<br/>POST /change-password"]
r_train["routes/training.py<br/>+ training_history.py<br/>POST /api/train/start /stop<br/>GET /events (SSE)<br/>GET /history"]
r_inf["routes/inference.py<br/>POST /generate (SSE)<br/>POST /load /unload<br/>+ mounted as /v1 (OpenAI)"]
r_models["routes/models.py<br/>GET /api/models"]
r_data["routes/data_recipe<br/>+ datasets.py"]
r_exp["routes/export.py"]
end
subgraph coreL["Core layer (orchestrators)"]
direction LR
o_train["core/training/<br/>TrainingBackend<br/>TrainingProgress"]
o_inf["core/inference/<br/>InferenceOrchestrator<br/>LlamaCppBackend"]
o_exp["core/export/<br/>ExportOrchestrator<br/>ExportBackend"]
o_data["core/data_recipe/<br/>service.py · jobs/manager.py"]
end
subgraph supportL["Support modules"]
direction LR
authmod["auth/<br/>authentication.py · storage.py · hashing.py"]
store["storage/<br/>studio_db.py (sqlite WAL)"]
dtos["models/<br/>training.py · inference.py · export.py · users.py"]
utils["utils/<br/>hardware · paths · datasets · models config"]
end
end
subgraph workers["Subprocess workers (separate processes)"]
direction LR
w_train["core/training/worker.py<br/>UnslothTrainer (in unsloth.trainer)"]
w_inf["core/inference/worker.py<br/>InferenceBackend"]
w_exp["core/export/worker.py"]
end
llama[("llama-server<br/>OS process")]
sqlite[("studio.db (WAL)")]
hf[("HF Hub")]
fe --> mw
mw --> r_auth
mw --> r_train
mw --> r_inf
mw --> r_models
mw --> r_data
mw --> r_exp
r_auth --> authmod
r_train --> o_train
r_train --> store
r_inf --> o_inf
r_models --> o_inf
r_data --> o_data
r_exp --> o_exp
o_train -- "mp.Queue + spawn" --> w_train
o_inf -- "mp.Queue + cancel Event" --> w_inf
o_inf -- "stdio / HTTP" --> llama
o_exp -- "mp.Queue" --> w_exp
o_train --> store
authmod --> store
store --> sqlite
w_train --> hf
w_inf --> hf
routesL -. uses .-> dtos
coreL -. uses .-> utils
classDef route fill:#85bb65,stroke:#5a8444,color:#fff
classDef core fill:#1168bd,stroke:#0b4884,color:#fff
classDef sup fill:#bbb,stroke:#666,color:#000
classDef work fill:#3b8ed0,stroke:#1168bd,color:#fff
classDef ext fill:#999,stroke:#555,color:#fff
class r_auth,r_train,r_inf,r_models,r_data,r_exp route
class o_train,o_inf,o_exp,o_data core
class authmod,store,dtos,utils,cors,logmw,authdep sup
class w_train,w_inf,w_exp work
class llama,sqlite,hf ext
Layer-by-layer narrative
Section titled “Layer-by-layer narrative”Routes (HTTP adapters)
Section titled “Routes (HTTP adapters)”Thin. They map HTTP concerns (request validation via Pydantic DTOs from models/, Depends(get_current_subject) for auth) onto a single call into the corresponding orchestrator. Example: routes/training.py resolves dataset paths, calls get_training_backend().start_training(...), and returns a job ID.
The fact that inference_router is included twice in main.py:215-220 — once at /api/inference and once at /v1 — gives the system free OpenAI-API compatibility without duplicating handlers. This is a clean example of FastAPI’s router composition acting as an adapter.
Core (orchestrators)
Section titled “Core (orchestrators)”This is the layer that actually owns the domain logic. Each core/<feature>/ folder follows a consistent pattern:
core/<feature>/├── orchestrator.py # parent-process class: lifecycle + RPC├── worker.py # child-process entrypoint: heavy ML├── <feature>.py # shared types, enums, helpers└── (sometimes) trainer.py / inference.py / export.py — domain codeThe *Backend / *Orchestrator classes (TrainingBackend, InferenceOrchestrator, ExportOrchestrator) all share a consistent interface:
__init__sets up_lock,_proc,_event_queue/_cmd_queue/_resp_queue,_pump_thread/_dispatcher_thread,_cancel_event.- A start method spawns or reuses a worker.
- A pump/dispatcher thread routes events back to per-request mailboxes (notably
InferenceOrchestrator._mailboxes, which lets the compare-mode UI run multiple in-flight requests against one worker). - A
force_terminate/_shutdown_subprocessis called by the global_graceful_shutdownhandler inrun.py:185.
Support modules
Section titled “Support modules”auth/authentication.pyissues short-lived access JWTs (1 h) and longer refresh tokens (7 d), plus an API-key path. The bootstrap admin is auto-seeded on first launch and its password is written to a file under~/.unsloth/studio/.bootstrap_password. The HTML index injectswindow.__UNSLOTH_BOOTSTRAP__with these credentials only until the user changes the password (seemain.py:349-374).storage/studio_db.pyowns one SQLite file in WAL mode; tables includetraining_runsand a metrics table that capturesloss,lr,grad_norm, andeval_lossper step.cleanup_orphaned_runs()runs at startup to mark crashed runs as failed.utils/hardware/sets the device backend (cuda/rocm/mps/cpu) into a module global early. Routes read it viaget_device()to decide whether to allow training endpoints at all.
4. C4 Level 3 — Components (React Frontend)
Section titled “4. C4 Level 3 — Components (React Frontend)”The frontend follows a feature-based architecture (sometimes called “screaming architecture”): the top-level folder names tell you what the app does, not what tech it uses.
flowchart TB
user(["User"])
api(["FastAPI Backend"])
subgraph spa["React SPA (studio/frontend/src)"]
direction TB
subgraph appL["app/"]
router["router.tsx<br/>(TanStack Router)"]
provider["provider.tsx<br/>(theme · QueryClient · Toaster)"]
guards["auth-guards.ts"]
end
subgraph featL["features/"]
direction TB
f_auth["auth/<br/>login · change-password ·<br/>session.ts · tauri-auto-auth.ts"]
f_chat["chat/<br/>chat-page · runtime-provider ·<br/>thread-sidebar · presets · Dexie db"]
f_train["training/<br/>api · stores (zustand) ·<br/>hooks · components · lib"]
f_dr["data-recipes/<br/>pages · learning-recipes · hooks"]
f_rs["recipe-studio/<br/>(node-graph editor with xyflow)"]
f_exp["export/"]
f_set["settings/ · profile/ · onboarding/ · tour/ · studio/"]
end
subgraph sharedL["Shared building blocks"]
stores["stores/ (Zustand global)"]
comp["components/<br/>app-sidebar · navbar ·<br/>shadcn ui · assistant-ui · markdown"]
hooks["hooks/ · lib/ · utils/ · shared/"]
tauriBridge["components/tauri/<br/>(window controls, updater hooks)"]
end
end
user --> router
router --> guards
router --> f_auth
router --> f_chat
router --> f_train
router --> f_dr
router --> f_rs
router --> f_exp
router --> f_set
f_auth --> api
f_chat --> api
f_train --> api
f_dr --> api
f_exp --> api
f_chat --> tauriBridge
f_auth --> tauriBridge
appL --> sharedL
classDef app fill:#1168bd,stroke:#0b4884,color:#fff
classDef feat fill:#85bb65,stroke:#5a8444,color:#fff
classDef shared fill:#bbb,stroke:#666,color:#000
classDef ext fill:#999,stroke:#555,color:#fff
class router,provider,guards app
class f_auth,f_chat,f_train,f_dr,f_rs,f_exp,f_set feat
class stores,comp,hooks,tauriBridge shared
class api,user ext
Notes on the frontend pattern
Section titled “Notes on the frontend pattern”- Each feature is self-contained:
features/training/ships its ownapi/,stores/,hooks/,components/,types/. Cross-feature reuse goes throughshared/orcomponents/ui— there is no “global service registry”. - State: Zustand for app state (e.g.
stores/training.ts,features/training/stores/training-runtime-store.ts), Dexie/IndexedDB for chat history (features/chat/db.ts), plainuseStatefor purely-local UI state. There is no Redux, no React Query in the deps — fetches go through hand-rolled typed clients. - Routing is type-safe via
@tanstack/react-routerwith code-split route files inapp/routes/. - Tauri integration is additive: any code that needs the desktop bridge guards on
window.__TAURI__and falls back to web behavior, so the same SPA bundle runs in both Tauri and a vanilla browser.
5. Class-level UML — Python OOP backbone
Section titled “5. Class-level UML — Python OOP backbone”Two related class hierarchies dominate the Python side: the Studio orchestrators in the parent process and the Unsloth Fast* model family that the workers actually use. The diagram below merges both.
classDiagram
%% ============== Studio backend orchestrators ==============
class TrainingProgress {
+epoch: float
+step: int
+total_steps: int
+loss: Optional[float]
+learning_rate: Optional[float]
+is_training: bool
+is_completed: bool
+error: Optional[str]
+eta_seconds: Optional[float]
}
class TrainingBackend {
-_proc: mp.Process
-_event_queue: mp.Queue
-_stop_queue: mp.Queue
-_pump_thread: Thread
-_lock: Lock
-_progress: TrainingProgress
-_metric_buffer: list
+current_job_id: str
+loss_history: list
+lr_history: list
+start_training(config, dataset, ...) str
+stop(save: bool) None
+get_progress() TrainingProgress
+get_metrics() dict
+force_terminate() None
-_pump_events() void
-_flush_metrics() void
}
class InferenceOrchestrator {
-_proc: mp.Process
-_cmd_queue: mp.Queue
-_resp_queue: mp.Queue
-_cancel_event: mp.Event
-_lock: Lock
-_gen_lock: Lock
-_mailboxes: dict
-_dispatcher_thread: Thread
-_current_transformers_major: str
+active_model_name: str
+models: dict
+load_model(name, ...) LoadResult
+unload_model() None
+generate(prompt, ...) Generator
+cancel(request_id) None
+default_models() list
-_ensure_subprocess(major) None
-_shutdown_subprocess(timeout) None
}
class LlamaCppBackend {
-_proc: subprocess.Popen
-_port: int
-_model_path: Path
+load(gguf_path, ...) None
+generate(...) Generator
+unload() None
-_kill_process() None
}
class ExportOrchestrator {
-_proc: mp.Process
+export_merged(...) JobId
+export_lora_adapter(...) JobId
+export_gguf(...) JobId
+get_status(job_id) ExportStatus
-_shutdown_subprocess(timeout) None
}
class ExportBackend {
+run_export(request) None
}
%% ============== Pydantic DTOs (selected) ==============
class TrainingStartRequest {
<<Pydantic>>
+model_name: str
+dataset_paths: list[str]
+config: dict
}
class TrainingJobResponse {
<<Pydantic>>
+job_id: str
+status: str
}
class GenerateRequest {
<<Pydantic>>
+prompt: str
+messages: list
+max_tokens: int
+temperature: float
}
%% ============== Auth ==============
class AuthStorage {
<<module>>
+ensure_default_admin() bool
+get_user_and_secret(name) tuple
+save_refresh_token(...) None
+verify_refresh_token(...) bool
+validate_api_key(key) Optional[str]
}
class Authentication {
<<module>>
+create_access_token(subject) str
+create_refresh_token(subject) str
+get_current_subject() str
}
%% ============== Unsloth Core: Fast* model family ==============
class FastBaseModel {
<<unsloth.models>>
+from_pretrained(...) tuple[Model, Tokenizer]
+get_peft_model(...) Model
+for_inference(model) Model
+for_training(model) Model
+patch_peft_model(...) None
}
class FastModel {
+from_pretrained(...) tuple
}
class FastLlamaModel {
+pre_patch() None
+post_patch(model) None
+from_pretrained(...) tuple
}
class FastLanguageModel
class FastVisionModel
class FastTextModel
class FastMistralModel
class FastQwen2Model
class FastQwen3Model
class FastQwen3MoeModel
class FastGraniteModel
class FastCohereModel
class FastFalconH1Model
class FastSentenceTransformer
%% ============== Trainer (HF/TRL extension) ==============
class TrainingArguments {
<<transformers>>
}
class SFTTrainer {
<<trl>>
}
class UnslothTrainingArguments {
+qgalore_config: QGaloreConfig
}
class UnslothTrainer {
+train(...) None
+_inner_training_loop(...) None
}
class QGaloreConfig {
+rank: int
+update_proj_gap: int
+scale: float
}
%% ============== Relationships ==============
TrainingBackend ..> TrainingProgress : produces
TrainingBackend ..> AuthStorage : (via routes)
InferenceOrchestrator ..> LlamaCppBackend : delegates GGUF to
ExportOrchestrator ..> ExportBackend : "in worker"
TrainingBackend o-- "1 spawned" UnslothTrainer : in worker
InferenceOrchestrator o-- "1 spawned" FastLanguageModel : in worker
Authentication ..> AuthStorage : reads/writes
FastBaseModel <|-- FastModel
FastModel <|-- FastVisionModel
FastModel <|-- FastTextModel
FastLlamaModel <|-- FastLanguageModel
FastLlamaModel <|-- FastMistralModel
FastLlamaModel <|-- FastQwen2Model
FastLlamaModel <|-- FastQwen3Model
FastQwen3Model <|-- FastQwen3MoeModel
FastLlamaModel <|-- FastGraniteModel
FastLlamaModel <|-- FastCohereModel
FastLlamaModel <|-- FastFalconH1Model
TrainingArguments <|-- UnslothTrainingArguments
SFTTrainer <|-- UnslothTrainer
UnslothTrainingArguments *-- QGaloreConfig
TrainingStartRequest ..> TrainingBackend : parsed by route
GenerateRequest ..> InferenceOrchestrator : parsed by route
Caveat on inheritance lines:
FastLanguageModelis declared asclass FastLanguageModel(FastLlamaModel)andFastVisionModel/FastTextModelare declared asclass FastVisionModel(FastModel)(seeunsloth/models/loader.py:16). The diagram preserves both lineages.FastModelitself extendsFastBaseModel(defined inunsloth_zoo), which is shown here as a stereotype.
How the OOP fits together at runtime
Section titled “How the OOP fits together at runtime”HTTP request ──► routes/inference.py │ ▼ InferenceOrchestrator (parent process) │ mp.Queue command ▼ worker.py main loop (child process) │ instantiates ▼ FastLanguageModel.from_pretrained(...) │ returns (model, tokenizer) ▼ model.generate(...) ──► tokens stream back via mp.Queue │ ▼ pump thread per-request mailbox ──► SSE responseFor training, replace FastLanguageModel with UnslothTrainer(SFTTrainer) driven by UnslothTrainingArguments, and replace the streaming response with a TrainingProgress event stream pumped into both the SSE channel and the SQLite metrics table.
6. Key cross-cutting design decisions
Section titled “6. Key cross-cutting design decisions”| Decision | Where | Why it matters |
|---|---|---|
Subprocess isolation per-feature (mp.get_context("spawn")) | `core/{training,inference,export}/orchestrator | training.py` |
| Single FastAPI router mounted at two prefixes | main.py:212-220 | Free OpenAI-API compatibility (/v1/chat/completions) without duplicating any handler code. |
| Bootstrap admin + one-time HTML credential injection | main.py:349-374, auth/storage.ensure_default_admin | Solves the desktop-first UX: the user gets an instantly-logged-in webview but the credentials self-destruct from the served HTML the moment they change the password. |
| Feature-folder frontend, no global service container | studio/frontend/src/features/* | Keeps each domain (chat / training / export / data-recipes) independently shippable; the chat feature even ships its own IndexedDB schema via Dexie. |
| Tauri-as-supervisor + browser-as-fallback | src-tauri/src/process.rs::BackendProcess, main.py::setup_frontend | The same FastAPI server can serve the SPA over plain HTTP for browser users or expose a pure JSON API while Tauri loads the SPA from disk — one binary, two distribution modes. |
| Structured logging with request middleware | loggers/, LoggingMiddleware in main.py | Every log line carries a request ID; combined with structlog makes cross-process debugging tractable. |
7. Glossary
Section titled “7. Glossary”- C4 model — Hierarchical architecture-diagramming notation by Simon Brown: System Context (C1) → Containers (C2) → Components (C3) → Code (C4). UML class diagrams sit at the C4 level.
- Container (C4 sense) — A separately runnable unit (process, server, single-page app, database). Not a Docker container.
- Hexagonal / Ports-and-Adapters — Pattern where the domain core is surrounded by interchangeable adapters; here,
routes/are driving adapters andcore/*/worker.pyare driven adapters around the ML domain. - Orchestrator — Class in the parent process that owns the lifecycle of a worker subprocess and exposes a synchronous-ish API to the routes layer.
Fast*model family — Unsloth’s set of monkey-patched HF model classes that swap in faster Triton kernels and optimized LoRA paths.- Bootstrap admin — The auto-created
unslothuser whose password is generated on first launch and stored at~/.unsloth/studio/.bootstrap_password.