Can ZimaBoard 2 Run a Local AI Assistant?

Eva Wong

IceWhale author

Eva Wong is the Technical Writer and resident tinkerer at ZimaSpace. A lifelong geek with a passion for homelabs and open-source software, she specializes in translating complex technical concepts into accessible, hands-on guides. Eva believes that self-hosting should be fun, not intimidating. Through her tutorials, she empowers the community to demystify hardware setups, from building their first NAS to mastering Docker containers.

Can ZimaBoard 2 Run a Local AI Assistant? - Zima Store Online

Introduction

At ZimaSpace, we continuously explore how compact hardware can redefine personal computing. In this article, we break down a hands-on experiment by the creator behind the Core Works Lab YouTube channel, who tested whether a fanless single-board server can run a fully local AI voice assistant.

We would like to thank Core Works Lab for the detailed walkthrough and real-world testing. This article transforms their video insights into a structured, written format to help more users understand what’s possible with ZimaBoard 2 as a Home Server—from AI workloads to homelab setups.

Testing ZimaBoard 2 as a Local AI Machine

The device tested is the ZimaBoard 2 (Intel N150, 16GB DDR5, 64GB eMMC), a compact and low-power Home Server designed for flexibility. It supports native SATA and PCIe expansion, allowing users to connect SSDs, GPUs, and networking cards without additional adapters.

The creator’s goal was clear:
Can a fanless Home Server run a local AI voice assistant reliably?

Initial Setup and Hardware Configuration

The system was expanded using:

NVMe SSD via PCIe adapter
Dual 2.5" drive rack
Optional GPU (GT 1030)
ZimaOS pre-installed

The board boots into a web-based dashboard, where applications like Docker containers and tools such as N8N can be installed.

Key observation:
The setup process is straightforward, making ZimaBoard 2 accessible even for users building their first Home Server.

However, some minor hardware issues were noted:

Mounting bracket screws were not threaded
Some screws were too long for certain configurations

Running the AI Assistant (CAL)

The assistant (CAL) was deployed via Docker using CPU-only configuration.

Initial setup included:

Speech-to-text: Groq Whisper (cloud)
LLM: Groq (cloud inference)
Text-to-speech: Piper (local CPU)

Result:
The hybrid setup worked smoothly and responded quickly, establishing a strong baseline.

A key feature demonstrated was short-term memory, where the assistant stored and recalled data like tracking numbers or flight details.

Example:

Stored: Flight number AF1
Retrieved automatically for tool-based queries

This shows how persistent memory systems can enhance AI assistants on a Home Server.

Local LLM Testing with Ollama

The next phase tested fully local models using Ollama.

Ministral 3B (3 Billion Parameters)

Prompt processing: ~268 tokens/sec
Generation speed: ~7 tokens/sec

Key finding:
It successfully called tools without fine-tuning, which is impressive.

However:

Response time reached up to 6 minutes per interaction

This makes it impractical for real-time voice assistants.

Close-up view of hands lifting a compact white ZIMA personal server out of its cardboard packaging on a wooden table

Function Gemma (270M Parameters)

Much faster (~43 tokens/sec)
Failed to correctly execute tool calls

Insight:
Smaller models are faster but require fine-tuning to handle structured tasks like tool calling.

Adding a GPU: Performance Gains

A GT 1030 (2GB VRAM) was added via PCIe.

Results:

Prompt evaluation speed nearly doubled
Model split: 34% GPU / 66% CPU
Token generation speed remained similar

Important takeaway:
Bandwidth—not compute—is the bottleneck for token generation.

When testing a smaller model fully loaded into GPU:

Prompt evaluation reached 1100 tokens/sec

This confirms:

Full GPU loading dramatically improves latency for a Home Server AI setup

Real-World Limitations

Despite promising results, several constraints emerged:

CPU-only setups are too slow for large models
Small models lack reliability without training
GPU performance depends heavily on VRAM and power supply

The creator noted that a 5GB GPU (e.g., Quadro P2200) could fully load a 3B model and significantly improve performance.

Key Takeaways

ZimaBoard 2 can run AI workloads effectively as a Home Server
Hybrid (cloud + local) setups deliver the best balance today
Local LLMs are viable but require optimization
GPU upgrades unlock significant performance gains
Tool-calling capability depends more on model design than size

Featured

ZimaBoard 2 - Hyper Performance Single Board Home Server

Single board computer zimaboard2

Why ZimaBoard 2 Stands Out

ZimaBoard 2 combines:

Low power consumption (24/7 operation)
Silent, fanless design
Native SATA & PCIe expansion
Dual 2.5G Ethernet

This makes it ideal for:

Plex media servers
Docker labs
AI containers
Personal NAS systems

As many users describe it:
“A mini server that looks like a toy but runs like a beast.”

Final Thoughts

This experiment shows that building an AI-capable Home Server is no longer out of reach. While fully local voice assistants still face performance challenges, ZimaBoard 2 provides a flexible and powerful foundation for experimentation.

For developers, tinkerers, and homelab enthusiasts, it opens the door to: