Ransomnews
AI

How to host Llama 3 70B locally with Ollama and Open WebUI: a 2026 tutorial

By Martynas Vareikis | May 7, 2026 (updated May 7, 2026)

Running a capable LLM locally in 2026 is no longer a research project. The 70B-class open models (Llama 3 70B, Mistral Large, Qwen 2.5) match GPT-3.5 quality on most tasks, sometimes reach GPT-4 territory on specific ones, and run on a single consumer-grade workstation. For privacy-sensitive work (legal review, medical-record summarisation, malware analysis, security research), local AI is the right tool. This tutorial walks through the build end to end.

Step 1: Hardware

Llama 3 70B at 4-bit quantisation needs roughly 40 GB of memory. Three workable hardware paths:

GPU with enough VRAM: an RTX 4090 (24 GB) won’t fit a 70B model, and an RTX 5090 (32 GB) doesn’t either. You need an A6000 (48 GB, ~$4500 used) or two RTX 4090s with the model split across both cards.

Apple Silicon: an M2 Ultra Mac Studio with 128 GB unified memory (~$5000) runs 70B at usable speeds (10-15 tokens/sec). An M3 Max MacBook Pro with 128 GB also works for development. The unified memory architecture makes Apple hardware unusually well suited to large models, because the GPU addresses the same memory pool as the CPU.

CPU + system RAM: slow, but free if you already have 64+ GB of DDR5. Expect 2-4 tokens/sec, which is too slow for chat but fine for batch jobs.

For most readers the M2/M3 Ultra Mac Studio is the cleanest answer. For production-leaning setups, use dual RTX 4090s; note that the RTX 4090 has no NVLink connector, so the two cards communicate over PCIe.
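To see where the roughly 40 GB figure comes from, here is a back-of-the-envelope sketch. The 15% allowance for KV cache and runtime buffers is an assumption chosen for illustration, not a measured value; real usage depends on context length and the exact quantisation format.

```python
def estimate_memory_gb(params_billion: float, bits_per_weight: float,
                       overhead_fraction: float = 0.15) -> float:
    """Rough memory estimate in GB for a quantised model.

    overhead_fraction is an illustrative assumption covering KV cache
    and runtime buffers; it is not a figure from Ollama or Meta.
    """
    # 1e9 parameters * (bits / 8) bytes each = that many GB of weights.
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb * (1 + overhead_fraction)


def fits(required_gb: float, available_gb: float) -> bool:
    """True when the available memory covers the estimate."""
    return available_gb >= required_gb


# Memory capacities quoted in the hardware options above (GB).
HARDWARE_GB = {
    "RTX 4090": 24,
    "RTX 5090": 32,
    "A6000": 48,
    "2x RTX 4090": 48,
    "Mac with 128 GB unified memory": 128,
}

need = estimate_memory_gb(params_billion=70, bits_per_weight=4)
print(f"Estimated requirement: {need:.1f} GB")
for name, capacity in HARDWARE_GB.items():
    verdict = "fits" if fits(need, capacity) else "too small"
    print(f"{name:32s} {capacity:4d} GB  {verdict}")
```

The weights alone come to 35 GB at 4 bits per parameter, which is why a 32 GB card falls short before any overhead is counted. On a Mac the unified memory is shared with the operating system, so treat the 128 GB figure as an upper bound.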

Step 2: Install Ollama

Ollama is the cleanest local-LLM runtime in 2026. It handles model downloads, quantised model variants, and GPU acceleration, and it serves a local API.

Mac/Linux: curl -fsSL https://ollama.com/install.sh | sh

Windows: download the installer from ollama.com.

Pull the model (the Ollama tag used here is the Llama 3.3 release of the 70B model):

ollama pull llama3.3:70b

That downloads ~40 GB. Test it:

ollama run llama3.3:70b "Explain prompt injection in two sentences."
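The same prompt can be sent to Ollama's local HTTP API instead of the CLI. This is a minimal sketch using only the Python standard library; it assumes the default address of localhost:11434, and the 600-second timeout is an arbitrary choice to allow for slow first-token latency on a 70B model.

```python
import json
import urllib.request

# Default local address; an assumption if you changed OLLAMA_HOST.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3.3:70b") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.3:70b", timeout: float = 600.0) -> str:
    """Send the prompt and return the generated text.

    Requires a running Ollama server with the model already pulled.
    """
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example call (needs the server running):
# print(generate("Explain prompt injection in two sentences."))
```

Setting "stream" to False returns one JSON object rather than a stream of partial tokens, which keeps the example short at the cost of waiting for the full answer.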

Step 3: Install Open WebUI for a chat interface

Ollama on its own is a CLI and an API. For a ChatGPT-like web UI, install Open WebUI. The cleanest way is Docker:

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main

Browse to http://localhost:3000 and create the admin account (the first user becomes admin). You now have a chat UI talking to your local Llama, with multi-user auth, conversation history, and document upload for RAG all included.
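If the model list in the web UI comes up empty, it helps to confirm that Ollama itself is answering and which models it has. The sketch below queries Ollama's /api/tags endpoint; the base URL assumes the default port, and the helper names are illustrative.

```python
import json
import urllib.request


def parse_model_names(tags_response: dict) -> list:
    """Extract model names from the JSON body returned by /api/tags."""
    return [entry["name"] for entry in tags_response.get("models", [])]


def list_models(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> list:
    """Ask a running Ollama server which models are available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_model_names(json.loads(resp.read()))


# Example call (needs the server running):
# print(list_models())
```

If llama3.3:70b appears here but not in Open WebUI, the problem is most likely the connection between the container and the host rather than the model itself.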

Step 4: Bind correctly so it doesn’t leak

By default Ollama binds to 127.0.0.1, which is correct. If you want to access it from another machine on your local network, set the environment variable OLLAMA_HOST=0.0.0.0:11434 before starting the server, but only do that on a network you trust, ideally behind a firewall and on a VLAN that doesn’t reach the internet.

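Do not expose Ollama or Open WebUI to the public internet. Ollama's API has no authentication by default, and drive-by scanners find exposed instances within hours. Open WebUI does have a login, but that is not a reason to put it on the open internet.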
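A quick way to sanity-check a bind setting before starting the server is to classify the address. This is a sketch with hypothetical helper names; it only inspects the string, it does not probe the network, and it reads OLLAMA_HOST from the current shell, which may differ from what a running service was started with.

```python
import ipaddress
import os


def extract_host(value: str) -> str:
    """Pull the host part out of values like '0.0.0.0:11434' or '[::1]:11434'."""
    value = value.strip()
    if "://" in value:                      # drop an optional scheme
        value = value.split("://", 1)[1]
    if value.startswith("["):               # bracketed IPv6 with optional port
        return value[1:].split("]", 1)[0]
    if value.count(":") == 1:               # host:port
        return value.split(":", 1)[0]
    return value                            # bare hostname or bare IPv6 address


def classify_bind(value: str) -> str:
    """Return 'loopback', 'all-interfaces', 'specific-address', 'hostname' or 'unknown'."""
    host = extract_host(value)
    if not host:
        return "unknown"
    if host == "localhost":
        return "loopback"
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return "hostname"
    if addr.is_loopback:
        return "loopback"
    if addr.is_unspecified:                 # 0.0.0.0 or ::
        return "all-interfaces"
    return "specific-address"


current = os.environ.get("OLLAMA_HOST", "127.0.0.1:11434")
print(f"OLLAMA_HOST={current!r} -> {classify_bind(current)}")
```

Anything other than "loopback" means other machines may be able to reach the API, so the firewall and VLAN advice above applies.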

Step 5: Alternative tooling worth knowing

LM Studio: a desktop GUI for browsing and running local models with no command line. Easier for non-technical users.

Jan: an open-source ChatGPT alternative that runs entirely locally, with a polished UI.

Hugging Face: the underlying model hub. If you want to verify downloaded weights against published hashes before loading them, Hugging Face is the source of truth.
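Checking a downloaded weights file against a published SHA-256 hash takes a few lines of standard-library Python. This is a sketch: the file path and the published hash are values you supply yourself, and the function names are illustrative.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so multi-gigabyte weights never load into RAM at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare the file's SHA-256 with a published hex digest, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_of_file(path), published_hex.strip().lower())


# Example call (both arguments are placeholders you replace):
# print(matches_published_hash("/path/to/model-file", "<published sha256 hex>"))
```

A mismatch means the file is corrupt, incomplete, or not the file the publisher hashed, and it should not be loaded.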

Step 6: Use cases that justify the setup

Don’t run local AI for every prompt; cloud models are still better and cheaper for general use. Run local for:

  • Reviewing leaked datasets, malware samples, or anything sensitive that shouldn’t leave your perimeter
  • Privileged legal or medical document review where data residency is contractual
  • Bulk processing where you’d hit cloud rate limits or rack up significant API spend
  • Offline work: flights, conferences with hostile networks, sensitive client sites

For everything else, cloud is fine. The hybrid setup (Anthropic for general chat, local Llama for sensitive work) is what most practitioners actually run in 2026.
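The hybrid split can be written down as a small routing rule. This is a sketch only: the tag names are hypothetical labels invented for illustration, mirroring the use cases listed above, and nothing here calls a real API.

```python
from typing import Iterable

# Hypothetical task labels that map to the "run local" cases above.
LOCAL_ONLY_TAGS = {"leaked-data", "malware", "legal", "medical", "bulk", "offline"}


def choose_backend(tags: Iterable[str]) -> str:
    """Return 'local' if any tag marks the task as sensitive, bulk, or offline; else 'cloud'."""
    return "local" if set(tags) & LOCAL_ONLY_TAGS else "cloud"


for example in (["malware"], ["general-chat"], ["bulk", "summarise"]):
    print(example, "->", choose_backend(example))
```

Untagged work falls through to the cloud here, matching the advice that cloud is fine for everything else. A stricter setup could invert the default so that only explicitly cleared tasks leave the machine.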

Martynas Vareikis

Martynas Vareikis is the AI Editor at Ransomnews. He covers the intersection of artificial intelligence and information security — from machine-learning models in defensive tooling to the adversarial use of LLMs by ransomware operators, deepfake-driven social engineering, and the rise of agentic threats. His reporting focuses on translating fast-moving AI research into practical guidance for defenders, journalists, and the broader security community. Reach Martynas via [email protected].
