How to Set Up Your Own AI at Home

I wanted to see what my R720 could do with AI workloads — no GPU, just dual 6-core Xeons and 192GB of RAM. I ended up running three tools: Ollama for LLM inference, Stable Diffusion for image generation, and OpenWebUI to tie them together into a single interface. Here’s how I set it all up.

Ollama

I ran this on an Ubuntu 22.04 VM in Proxmox, allocated 20 cores and 64GB of RAM.

Install

curl -fsSL https://ollama.com/install.sh | sh

Pull and Run Llama3

ollama pull llama3
ollama run llama3
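
Ollama also exposes an HTTP API on port 11434, which is what OpenWebUI talks to later. As a quick sketch (assuming the default port and the llama3 model pulled above), you can query it directly with curl:

```shell
# One-shot completion against the local Ollama server.
# Assumes Ollama is running on its default port (11434) and llama3 is pulled.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```

With `"stream": false` the response comes back as a single JSON object containing the generated text, rather than a stream of token-by-token chunks.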

That’s it — you’ll drop into an interactive CLI session. As a sanity check I asked it something from Interstellar:

Question: How does the concept of time dilation in Interstellar relate to Einstein’s theory of relativity?

Answer: The concept of time dilation in Interstellar is directly related to Albert Einstein’s theory of special relativity, specifically the concept of time dilation. Here’s how: In the movie, a wormhole allows Cooper’s spaceship to travel at incredibly high speeds, approaching relativistic velocities. As they approach the speed of light (0.99c), time appears to slow down for them relative to Earth. This is precisely what Einstein predicted in his theory…

Stable Diffusion

Setup follows the installation instructions from the AUTOMATIC1111 stable-diffusion-webui repo.

Install Dependencies

Debian/Ubuntu:

sudo apt install wget git python3 python3-venv libgl1 libglib2.0-0

Red Hat:

sudo dnf install wget git python3 gperftools-libs libglvnd-glx

openSUSE:

sudo zypper install wget git python3 libtcmalloc4 libglvnd

Download and Run

wget -q https://raw.githubusercontent.com/AUTOMATIC1111/stable-diffusion-webui/master/webui.sh
chmod +x webui.sh
./webui.sh

CPU-Only Configuration

Since I have no GPU, I added the following to webui-user.sh:

export COMMANDLINE_ARGS="--lowvram --precision full --no-half --skip-torch-cuda-test"

--lowvram reduces memory pressure, --precision full and --no-half avoid half-precision issues on CPU, and --skip-torch-cuda-test skips CUDA checks that would fail anyway.

Expose on Network

By default it binds to localhost. To make it reachable by OpenWebUI:

./webui.sh --listen

This binds to 0.0.0.0:7860.
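
One extra flag worth knowing about: OpenWebUI talks to AUTOMATIC1111 over its REST API, which is off by default. If you plan to drive image generation from OpenWebUI (as set up below), launch with both flags:

```shell
# --listen binds to 0.0.0.0 so other hosts can reach it;
# --api enables the REST endpoints (under /sdapi/) that OpenWebUI uses.
./webui.sh --listen --api
```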

Stable Diffusion Web Interface

OpenWebUI

OpenWebUI gives you a ChatGPT-style interface on top of Ollama and Stable Diffusion. Easiest part of the whole setup.

Install with Docker

Ollama on the same host:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data --name open-webui --restart always \
ghcr.io/open-webui/open-webui:main

Ollama on a different server:

docker run -d -p 3000:8080 -e OLLAMA_BASE_URL=https://example.com \
-v open-webui:/app/backend/data --name open-webui --restart always \
ghcr.io/open-webui/open-webui:main

With Nvidia GPU:

docker run -d -p 3000:8080 --gpus all --add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data --name open-webui --restart always \
ghcr.io/open-webui/open-webui:cuda

Connect Ollama

Go to http://ip_server_openwebui:3000/admin/settings/, then Models → Manage Ollama Models and enter your Ollama server URL.

OpenWebUI Admin Settings

Connect Stable Diffusion

Go to Images, find the AUTOMATIC1111 Base URL field, enter your Stable Diffusion server URL, and hit the refresh button to verify. Save if it checks out. You can then set a default model under Set Default Model.

Stable Diffusion Integration

OpenWebUI Interface

Performance on CPU-Only

Running LLMs without a GPU is slow. Here’s what the numbers look like with Llama3 on dual Xeon E5-2620 v2s:

Metric                  Value
Response Token/s        0.46
Prompt Token/s          1.99
Total Duration          1072376.46 ms (~17 min 52 sec)
Load Duration           61347.1 ms
Prompt Eval Count       33
Prompt Eval Duration    16571.72 ms
Eval Count              457
Eval Duration           994411.07 ms
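
The headline rates follow directly from the raw counters: tokens divided by evaluation time in seconds (the durations above are in milliseconds). A quick sanity check of the table's arithmetic:

```shell
# Response rate: 457 generated tokens over 994411.07 ms of eval time
awk 'BEGIN { printf "%.2f tok/s\n", 457 / (994411.07 / 1000) }'   # prints "0.46 tok/s"

# Prompt rate: 33 prompt tokens over 16571.72 ms of prompt eval time
awk 'BEGIN { printf "%.2f tok/s\n", 33 / (16571.72 / 1000) }'     # prints "1.99 tok/s"
```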

Functional, but not something you’d want to use interactively. I’m planning to add an NVIDIA Tesla P40 to the R720 — once I do, I’ll post an updated guide covering GPU configuration.
