Tether launches medical AI models that run on phones and outperform larger systems.

Brenda Mary

02:14 08/05/2026

QVAC MedPsy is a new line of medical language models designed to run on smartphones and edge devices, with a privacy-first approach that keeps sensitive health data local. Early benchmark results indicate that the smaller models deliver performance that is competitive with substantially larger competitors, signaling a shift in how medical AI systems can be structured and deployed.

Compact models with strong benchmark performance

QVAC MedPsy is available in two versions: a 1.7-billion-parameter model and a 4-billion-parameter model. Both were evaluated across eight medical benchmark suites covering clinical knowledge, expert reasoning, and real-world scenarios.

On seven closed-ended benchmarks, the 1.7 billion model scored 62.62, beating Google’s MedGemma-4B by more than 11 points despite having less than half as many parameters. On HealthBench Hard, the same model also outperformed MedGemma-27B, which is nearly sixteen times larger.

The 4 billion version scored 70.54 on those same seven benchmarks, exceeding MedGemma-27B-text and other models that are nearly seven times its size. Performance remained strong across HealthBench, HealthBench Hard, and MedXpertQA evaluations.
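The size comparisons quoted above can be reproduced with simple division. This is a minimal sketch that assumes each model's parameter count matches the nominal figure in its name (for example, 27 billion for MedGemma-27B); the helper `size_ratio` is a hypothetical name used only for illustration.

```python
def size_ratio(larger_billion: float, smaller_billion: float) -> float:
    """Return how many times larger one model is than another."""
    return larger_billion / smaller_billion

# Nominal parameter counts in billions (assumed from the model names).
QVAC_SMALL, QVAC_LARGE = 1.7, 4.0
MEDGEMMA_SMALL, MEDGEMMA_LARGE = 4.0, 27.0

# Share of MedGemma-4B's parameters used by the 1.7 billion model.
print(f"1.7B / MedGemma-4B: {QVAC_SMALL / MEDGEMMA_SMALL:.1%}")

# How much larger MedGemma-27B is than each QVAC MedPsy model.
print(f"MedGemma-27B vs 1.7B: {size_ratio(MEDGEMMA_LARGE, QVAC_SMALL):.2f}x")
print(f"MedGemma-27B vs 4B:   {size_ratio(MEDGEMMA_LARGE, QVAC_LARGE):.2f}x")
```

Under these nominal counts, the 1.7 billion model has 42.5% of MedGemma-4B's parameters, and MedGemma-27B is about 15.9 and 6.75 times the size of the two QVAC MedPsy models.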

Efficiency gains highlighted by the company

Tether’s CEO Paolo Ardoino linked the results to efficiency improvements. He said: “Our 4 billion model exceeded results from models nearly seven times its size, while using up to three times fewer tokens per response.”

Token efficiency is presented as a practical outcome of the release. The 4 billion model generates responses in around 909 tokens, while comparable systems use roughly 2,953 tokens per response, a reduction in output length of about 3.2x. The 1.7 billion model averages about 1,110 tokens per response, compared with 1,901 for similar systems.
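The token figures above can be checked with a few lines of arithmetic. The sketch below uses only the per-response token counts reported in this article; the function name `reduction_factor` is an illustrative choice and not part of any QVAC tooling.

```python
def reduction_factor(baseline_tokens: float, model_tokens: float) -> float:
    """Return how many times shorter the model's output is than the baseline's."""
    return baseline_tokens / model_tokens

# Tokens per response, as reported in the article.
reported = {
    "4B": {"model": 909, "comparable": 2953},
    "1.7B": {"model": 1110, "comparable": 1901},
}

for name, tokens in reported.items():
    factor = reduction_factor(tokens["comparable"], tokens["model"])
    saved = 1 - tokens["model"] / tokens["comparable"]
    print(f"{name}: {factor:.1f}x shorter, {saved:.0%} fewer tokens")
```

On these numbers the 4 billion model's responses are about 3.2 times shorter than those of comparable systems, and the 1.7 billion model's are about 1.7 times shorter.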

Shorter outputs are expected to translate into faster response times and lower compute costs, factors that can influence adoption in healthcare settings where speed and cost matter.

On-device deployment and model sizes

Both models are available in quantized GGUF format for local deployment. The Q4_K_M versions are approximately 1.2 GB for the 1.7 billion model and 2.6 GB for the 4 billion model, sizes intended to be practical for mobile devices and on-site hospital systems.
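As a rough sanity check on those file sizes, one can estimate the average number of bits stored per parameter. The sketch assumes 1 GB equals 10^9 bytes and uses the nominal parameter counts; both are assumptions, since the article gives only approximate sizes and a GGUF file also holds metadata and some tensors kept at higher precision. The helper `bits_per_parameter` is a hypothetical name.

```python
def bits_per_parameter(file_size_gb: float, params_billion: float) -> float:
    """Average bits per parameter, assuming 1 GB = 1e9 bytes (an assumption)."""
    total_bits = file_size_gb * 1e9 * 8
    return total_bits / (params_billion * 1e9)

# Approximate Q4_K_M file sizes (GB) and nominal parameter counts (billions).
models = {"1.7B": (1.2, 1.7), "4B": (2.6, 4.0)}

for name, (size_gb, params_b) in models.items():
    estimate = bits_per_parameter(size_gb, params_b)
    print(f"{name}: ~{estimate:.1f} bits per parameter")
```

Both estimates fall between 5 and 6 bits per parameter, compared with 16 bits for weights stored in half precision, which is consistent with files small enough for phone storage.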

Training approach and privacy implications

The performance gains are attributed to a staged post-training process that combines broad medical supervision, clinical reasoning data, and reinforcement learning on harder cases. No additional model scaling was reportedly required to reach these results.

Medical AI has typically relied on cloud infrastructure to process sensitive data remotely. By contrast, QVAC MedPsy is positioned as enabling strong performance entirely on-device, offering healthcare providers additional deployment options in environments with strict privacy rules and limited or restricted cloud access.