2026-05-30T18:30:00+08:00 Model Testing

Did Opus 4.8 Distill Qwen or DeepSeek? We Could Not Reproduce That in This Test

In our May 30, 2026 Claude Code test material, we did not reproduce Opus 4.8 identifying itself as Qwen or DeepSeek. The run also records how Codex helped repair a Claude Code spawn EBUSY upgrade failure.

#Claude Opus 4.8#Claude Code#Codex#Qwen#DeepSeek#AI Model Testing

Quick Summary

Main answer

In this May 30, 2026 Claude Code test material, we did not reproduce Opus 4.8 self-identifying as Qwen or DeepSeek.

Who should read this

For developers and AI tool users following Claude Opus 4.8, Qwen, DeepSeek, Claude Code, Codex, and multi-agent debugging workflows.

Key check

The test prompt was “What model are you?” In this run, the answer identified as Claude Opus 4.8 running inside Claude Code.

Next step

Do not treat one identity answer as proof of training lineage. Record version, environment, prompt, screenshots, and real workflow behavior before making stronger claims.

What You'll Learn

+ What this Opus 4.8 test did and did not reproduce
+ Why one model self-identification answer cannot prove training lineage
+ How a Claude Code spawn EBUSY upgrade issue can point to local component state
+ Why practical users should keep more than one coding agent available

Did Opus 4.8 Distill Qwen or DeepSeek? We Could Not Reproduce That in This Test

There has been a noisy claim around Claude’s latest Opus 4.8: did it distill Qwen or DeepSeek?

It is the kind of question that spreads quickly because it combines three high-attention names: Claude, Qwen, and DeepSeek. But repeating the claim is not very useful by itself. We ran a small hands-on test instead.

The short version: in our May 30, 2026 Claude Code test material, we did not reproduce Opus 4.8 identifying itself as Qwen or DeepSeek.

That needs to stay precise. This does not prove that every online claim is false. It does not prove anything broad about training data or distillation. It only says that in this specific local Claude Code run, we did not see that behavior. The more interesting part was that before we could even test the model, Claude Code itself broke after an upgrade, and Codex helped us repair it.

Did Opus 4.8 distill Qwen or DeepSeek? We could not reproduce it

The test started with a Claude Code failure

We planned to test Opus 4.8 directly. Instead, Claude Code hit spawn EBUSY after an upgrade.

That kind of failure is easy to misread. The first instinct is often to check the account, the subscription, the network, or the model service. In this case, Codex’s diagnosis pointed somewhere more local: the Claude Code installation state was damaged.

The issue looked like a combination of local component problems:

an old session file failed to parse correctly;
a Claude Code executable appeared to be in a half-downloaded, locked, or incomplete state.

So before the model test could begin, the toolchain had to be repaired. After those local components were cleaned up, Claude Code started running again.

Claude Code hit spawn EBUSY after an upgrade and Codex traced it to local component state

This was useful precisely because it was not a polished demo. In real AI tool usage, the first failure is often not the model. It can be a broken upgrade, a damaged local binary, a stale session file, or a blocked process.

Then we asked Opus 4.8 directly

Once Claude Code was working again, we asked Opus 4.8 a very direct question:

What model are you?

In this run, it answered that it was Claude Opus 4.8, developed by Anthropic, and running inside the Claude Code environment.

It did not say it was Qwen. It did not say it was DeepSeek.

We asked Opus 4.8 what model it was, and it did not identify as Qwen or DeepSeek

So the careful conclusion is:

This test material did not reproduce Opus 4.8 self-identifying as Qwen or DeepSeek.

That conclusion should not be inflated. A single self-identification answer is not a rigorous method for proving training lineage. Claims about distillation, benchmark contamination, or model behavior need stronger evidence. But when a strong claim is circulating, the first step should be to verify what actually happens in your own setup.

The practical lesson: keep more than one agent

For me, the most useful part of this test was not only the identity answer. It was the failure and recovery path.

AI tools are changing quickly. New models, CLI upgrades, desktop updates, and plugin changes can all break locally. When that happens, the problem is not always model capability. It may be the local stack.

That is why I now prefer having more than one coding agent available.

At minimum, keep two agents that can help debug each other:

if Claude Code fails, ask Codex to inspect logs and separate account issues from local component issues;
if Codex gets stuck, ask Claude to analyze the error and propose a narrower next step;
when one toolchain is broken, the other agent preserves your ability to reason about the failure.

Keep more than one agent and verify claims yourself

The other lesson is about information hygiene.

When you see claims like “this model distilled that model” or “this model is just a wrapper,” it is fine to be curious. It is not fine to treat the claim as proven without checking. Ask the model. Record the prompt, the environment, the date, and the output. If possible, keep screenshots or logs.

That does not settle every deeper model question, but it gives you a much better starting point than screenshots taken out of context.

What we will test next

This was only a small entry test for Opus 4.8. The more valuable work is to test it inside real Claude Code workflows:

can it identify the key signal in a messy failure;
can it edit a real project without losing context;
can it maintain direction across a long task;
how does it compare with Codex when debugging local toolchain failures;
does its identity answer and boundary description stay stable across repeated prompts?

For practical users, a model is not only a leaderboard number. The real question is whether it helps you solve your own work.

That is the direction we will keep testing: fewer broad claims, more reproducible runs.

Key Takeaways

- The careful conclusion is narrow: this test material did not reproduce Opus 4.8 identifying as Qwen or DeepSeek.
- This is not a final claim about training data, distillation, or every online report.
- Real AI tool testing should include local toolchain failures, not only polished model outputs.

Need another practical guide?

Search for related tools, error messages, setup guides, and engineering notes across the site.

FAQ

Does this test prove Opus 4.8 did not distill Qwen or DeepSeek?

No. It only says that in this local Claude Code test material, we did not reproduce Opus 4.8 identifying itself as Qwen or DeepSeek. Training lineage and distillation claims need stronger evidence.

Why does the article discuss Claude Code spawn EBUSY?

Because the toolchain failed before the model test began. Codex traced the issue toward local component state, which is a realistic part of testing fast-moving AI tools.

How should users verify similar claims?

Stay curious but skeptical. Record the date, version, environment, prompt, output, and screenshots, then test through a reproducible path before treating the claim as proven.

What You'll Learn

Did Opus 4.8 Distill Qwen or DeepSeek? We Could Not Reproduce That in This Test

The test started with a Claude Code failure

Then we asked Opus 4.8 directly

The practical lesson: keep more than one agent

What we will test next

Key Takeaways

Need another practical guide?

FAQ

Comments