Can ZimaBoard 2 Run a Local AI Assistant?

Eva Wong is the Technical Writer and resident tinkerer at ZimaSpace. A lifelong geek with a passion for homelabs and open-source software, she specializes in translating complex technical concepts into accessible, hands-on guides. Eva believes that self-hosting should be fun, not intimidating. Through her tutorials, she empowers the community to demystify hardware setups, from building their first NAS to mastering Docker containers.

Introduction

At ZimaSpace, we continuously explore how compact hardware can redefine personal computing. In this article, we break down a hands-on experiment by the creator behind the Core Works Lab YouTube channel, who tested whether a fanless single-board server can run a fully local AI voice assistant.

We would like to thank Core Works Lab for the detailed walkthrough and real-world testing. This article transforms their video insights into a structured, written format to help more users understand what’s possible with ZimaBoard 2 as a Home Server—from AI workloads to homelab setups.

Testing ZimaBoard 2 as a Local AI Machine

The device tested is the ZimaBoard 2 (Intel N150, 16GB DDR5, 64GB eMMC), a compact and low-power Home Server designed for flexibility. It supports native SATA and PCIe expansion, allowing users to connect SSDs, GPUs, and networking cards without additional adapters.

The creator’s goal was clear:
Can a fanless Home Server run a local AI voice assistant reliably?

Initial Setup and Hardware Configuration

The system was expanded using:

The board boots into a web-based dashboard, from which Docker-based applications such as n8n can be installed.

Key observation:
The setup process is straightforward, making ZimaBoard 2 accessible even for users building their first Home Server.

However, some minor hardware issues were noted:

  • Mounting bracket screws were not threaded
  • Some screws were too long for certain configurations

Running the AI Assistant (CAL)

The assistant (CAL) was deployed via Docker using a CPU-only configuration.

Initial setup included:

  • Speech-to-text: Groq Whisper (cloud)
  • LLM: Groq (cloud inference)
  • Text-to-speech: Piper (local CPU)

Result:
The hybrid setup worked smoothly and responded quickly, establishing a strong baseline.
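The three stages listed above can be pictured as a simple chain where each stage is marked as cloud or local. The following is a minimal sketch only, not CAL's actual code: the `Stage` class, the stub functions, and the file name are hypothetical stand-ins for the Groq and Piper calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str
    location: str              # "cloud" or "local"
    run: Callable[[str], str]  # takes the previous stage's output


# Hypothetical stubs. A real build would call Groq for the first two
# stages and the Piper engine for the third.
def transcribe(audio_ref: str) -> str:
    return f"transcript of {audio_ref}"


def generate_reply(transcript: str) -> str:
    return f"reply to [{transcript}]"


def synthesize(reply: str) -> str:
    return f"audio for [{reply}]"


PIPELINE: List[Stage] = [
    Stage("speech-to-text", "cloud", transcribe),
    Stage("llm", "cloud", generate_reply),
    Stage("text-to-speech", "local", synthesize),
]


def run_turn(stages: List[Stage], user_audio: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Pass one voice turn through every stage and record where each ran."""
    data = user_audio
    trace = []
    for stage in stages:
        data = stage.run(data)
        trace.append((stage.name, stage.location))
    return data, trace


result, trace = run_turn(PIPELINE, "mic_input.wav")
print(trace)
```

With this layout, moving to a fully local setup means swapping the "llm" entry for a stage that talks to a local model, while the rest of the chain stays the same.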

A key feature demonstrated was short-term memory, where the assistant stored and recalled data like tracking numbers or flight details.

Example:

  • Stored: Flight number AF1
  • Retrieved automatically for tool-based queries

This shows how a memory layer lets an AI assistant on a Home Server carry details from one request into later tool calls.
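One way to picture this behavior is a small key-value store that fills in missing tool arguments. The sketch below is an assumption about how such a memory could work, not a description of CAL's implementation. The class name, the one-hour expiry, and the `flight_number` key are all hypothetical.

```python
import time


class ShortTermMemory:
    """In-memory store whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}

    def remember(self, key, value):
        self._items[key] = (value, self._clock())

    def recall(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self._clock() - stored_at > self._ttl:
            del self._items[key]  # entry is too old, forget it
            return None
        return value


def fill_tool_args(tool_args, memory):
    """Replace any argument left as None with a remembered value, if one exists."""
    return {
        key: (value if value is not None else memory.recall(key))
        for key, value in tool_args.items()
    }


memory = ShortTermMemory()
memory.remember("flight_number", "AF1")

# A later request such as "what is my flight status?" leaves the number out.
args = fill_tool_args({"flight_number": None}, memory)
print(args)  # {'flight_number': 'AF1'}
```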

Local LLM Testing with Ollama

The next phase tested fully local models using Ollama.
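For readers who want to reproduce this kind of measurement, the sketch below shows one way to derive tokens-per-second figures from the timing fields that Ollama's `/api/generate` endpoint returns (`prompt_eval_count`, `prompt_eval_duration`, `eval_count`, `eval_duration`, with durations in nanoseconds). The sample response is made up for illustration and is not data from the video. The default host assumes a stock local install.

```python
import json
import urllib.request

NS_PER_SECOND = 1_000_000_000


def _rate(count, duration_ns):
    """Tokens per second, or None when the timing field is missing or zero."""
    if not count or not duration_ns:
        return None
    return count / (duration_ns / NS_PER_SECOND)


def throughput(response):
    """Compute prompt and generation speed from an Ollama response dict."""
    return {
        "prompt_tokens_per_sec": _rate(
            response.get("prompt_eval_count"), response.get("prompt_eval_duration")
        ),
        "generation_tokens_per_sec": _rate(
            response.get("eval_count"), response.get("eval_duration")
        ),
    }


def query_ollama(model, prompt, host="http://localhost:11434"):
    """Send one non-streaming request to a running Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read())


# Made-up sample so the arithmetic can be checked without a server.
sample = {
    "prompt_eval_count": 500,
    "prompt_eval_duration": 2_000_000_000,   # 2 seconds
    "eval_count": 70,
    "eval_duration": 10_000_000_000,         # 10 seconds
}
stats = throughput(sample)
print(stats)
```

Against a live server, `throughput(query_ollama(model_name, "Hello"))` would report the same two figures for whichever model tag is installed.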

Ministral 3B (3 Billion Parameters)

  • Prompt processing: ~268 tokens/sec
  • Generation speed: ~7 tokens/sec

Key finding:
It successfully called tools without fine-tuning, which is impressive.

However:

  • Response time reached up to 6 minutes per interaction

This makes it impractical for real-time voice assistants.
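The two speeds reported above translate into wall-clock time with simple arithmetic. The sketch below uses those measured rates with hypothetical token counts, since the actual prompt and response sizes from the test are not given here.

```python
def estimate_response_seconds(prompt_tokens, output_tokens,
                              prompt_tps=268.0, generation_tps=7.0):
    """Time to read the prompt plus time to generate the reply, in seconds."""
    return prompt_tokens / prompt_tps + output_tokens / generation_tps


# Hypothetical sizes: a 2,000-token prompt (system prompt plus tool
# definitions) and a 200-token reply.
seconds = estimate_response_seconds(prompt_tokens=2000, output_tokens=200)
print(f"{seconds:.1f} s")  # about 36 s, most of it spent generating
```

Because tool use can involve more than one model pass per interaction, and each pass repeats this cost, the total can grow well beyond a single estimate.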

FunctionGemma (270M Parameters)

  • Much faster (~43 tokens/sec)
  • Failed to correctly execute tool calls

Insight:
Smaller models are faster but require fine-tuning to handle structured tasks like tool calling.
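A simple check like the one sketched below shows what executing a tool call correctly means in practice. It assumes the model is asked to reply with a JSON object containing `name` and `arguments`. The `track_flight` tool and its required field are hypothetical examples, not tools from the tested assistant.

```python
import json

# Hypothetical tool registry: tool name -> list of required arguments.
TOOLS = {"track_flight": {"required": ["flight_number"]}}


def validate_tool_call(raw_output, tools):
    """Return (is_valid, reason) for a model's raw tool-call output."""
    try:
        call = json.loads(raw_output)
    except json.JSONDecodeError:
        return False, "output is not valid JSON"
    if not isinstance(call, dict):
        return False, "output is not a JSON object"

    name = call.get("name")
    if not isinstance(name, str) or name not in tools:
        return False, f"unknown tool: {name!r}"

    arguments = call.get("arguments")
    if not isinstance(arguments, dict):
        return False, "arguments must be a JSON object"

    missing = [key for key in tools[name]["required"] if key not in arguments]
    if missing:
        return False, f"missing arguments: {missing}"
    return True, "ok"


good = '{"name": "track_flight", "arguments": {"flight_number": "AF1"}}'
bad = "Sure, I will track flight AF1 for you."

print(validate_tool_call(good, TOOLS))  # (True, 'ok')
print(validate_tool_call(bad, TOOLS))   # (False, 'output is not valid JSON')
```

A model that answers in free text, picks a tool that does not exist, or drops a required argument fails this check even if its reply sounds plausible.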

Adding a GPU: Performance Gains

A GT 1030 (2GB VRAM) was added via PCIe.

Results:

  • Prompt evaluation speed nearly doubled
  • Model split: 34% GPU / 66% CPU
  • Token generation speed remained similar

Important takeaway:
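Memory bandwidth, not compute, is the bottleneck for token generation. With about two-thirds of the model still served from system RAM, the GPU speeds up prompt evaluation but barely changes per-token output.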
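A rough model makes this easier to see. Generating each token requires reading roughly all of the model's weights once, so the time per token is about the bytes read divided by the bandwidth of the memory they sit in. The sketch below applies that idea to a split model. The model size and both bandwidth figures are placeholder assumptions, not measurements from the test, and the model ignores compute time and PCIe transfers.

```python
def tokens_per_second(model_gb, gpu_fraction,
                      cpu_bandwidth_gb_per_s, gpu_bandwidth_gb_per_s):
    """Upper-bound generation speed when weights are split across GPU and CPU."""
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be between 0 and 1")
    gpu_time = model_gb * gpu_fraction / gpu_bandwidth_gb_per_s
    cpu_time = model_gb * (1.0 - gpu_fraction) / cpu_bandwidth_gb_per_s
    return 1.0 / (gpu_time + cpu_time)


# Placeholder assumptions for illustration only.
MODEL_GB = 2.0
CPU_BW = 30.0   # GB/s, system RAM
GPU_BW = 48.0   # GB/s, GPU memory

cpu_only = tokens_per_second(MODEL_GB, 0.0, CPU_BW, GPU_BW)
split = tokens_per_second(MODEL_GB, 0.34, CPU_BW, GPU_BW)
full_gpu = tokens_per_second(MODEL_GB, 1.0, CPU_BW, GPU_BW)

print(f"CPU only: {cpu_only:.1f} tok/s")
print(f"34% GPU:  {split:.1f} tok/s")
print(f"Full GPU: {full_gpu:.1f} tok/s")
```

Under these assumptions, the 34% split gains only a couple of tokens per second over CPU-only, because the slower memory still handles most of the reads for every token.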

When testing a smaller model fully loaded into GPU:

  • Prompt evaluation reached 1100 tokens/sec

This confirms:
Full GPU loading dramatically improves latency for a Home Server AI setup.

Real-World Limitations

Despite promising results, several constraints emerged:

  • CPU-only setups are too slow for large models
  • Small models lack reliability without fine-tuning
  • GPU performance depends heavily on VRAM and power supply

The creator noted that a GPU with 5GB of VRAM (e.g., Quadro P2200) could fully load a 3B model and significantly improve performance.

Key Takeaways

  • ZimaBoard 2 can run AI workloads effectively as a Home Server
  • Hybrid (cloud + local) setups deliver the best balance today
  • Local LLMs are viable but require optimization
  • GPU upgrades unlock significant performance gains
  • Tool-calling capability depends more on model design than size

Why ZimaBoard 2 Stands Out

ZimaBoard 2 combines:

  • Low power consumption (24/7 operation)
  • Silent, fanless design
  • Native SATA & PCIe expansion
  • Dual 2.5G Ethernet

This makes it ideal for:

  • Plex media servers
  • Docker labs
  • AI containers
  • Personal NAS systems

As many users describe it:
“A mini server that looks like a toy but runs like a beast.”

Final Thoughts

This experiment shows that building an AI-capable Home Server is no longer out of reach. While fully local voice assistants still face performance challenges, ZimaBoard 2 provides a flexible and powerful foundation for experimentation.

For developers, tinkerers, and homelab enthusiasts, it opens the door to:

And perhaps most importantly—it makes the process fun, hackable, and accessible.
