Ultrasapiens Core - Validation Report

Deterministic, offline benchmark summary with redacted case previews and reproducibility anchors.

Run window: 2026-02-09 20:41:33 UTC → 2026-02-09 20:44:36 UTC · Profile: strict · Offline sandbox: yes

Scientific positioning

UltraSapiens Core represents a new class of cognitive systems designed for verified intelligence rather than probabilistic output. The system is not a model and does not rely on neural networks, statistical training, fine-tuning, or parameter optimization. Instead, it is built around deterministic reasoning, epistemic correctness, and full auditability, enabling reproducible and scientifically evaluable behavior end-to-end.

UltraSapiens Core integrates reasoning, verification, constructive creativity, and formally controlled behavioral adaptation into a unified cognitive architecture. Creative outputs emerge from constrained synthesis and cross-domain interaction between system components, rather than from stochastic generation. This allows the system to produce novel algorithms, structures, and hypotheses while maintaining internal consistency and the ability to explicitly refuse unsupported answers.

The system operates fully offline and is capable of running efficiently on edge devices, without reliance on cloud infrastructure, external services, or specialized hardware. It does not require a GPU. Knowledge acquisition is performed through structured ingestion rather than training: the system can learn directly from source material such as books and documents, incorporating new knowledge without retraining or modifying opaque parameters. While not all system capabilities have yet been empirically evaluated, current validation demonstrates the stability and correctness of its foundational mechanisms, which are designed to scale while preserving control, transparency, and scientific rigor.

Public disclosure note: This public report intentionally omits sensitive implementation details while preserving the evidence needed to validate reproducibility, determinism, grounded answers, and overall test outcomes.
ALL CHECKS PASSED 98/98 cases OK
Overall success rate
100.0%
98 OK · 0 FAIL
Latency (p95 / max)
6075 ms
max 54300 ms · mean 1834 ms
Grounding present rate
37.8%
Knowledge-only: 100.0%
Determinism (checked)
100.0%
9 cases · 3 groups
Support rate (mean)
0.334
Higher = more supported by retrieved units
Contradiction (mean)
0.002
Lower = fewer internal conflicts detected
Intelligence suite pass rate
100.0%
8/8 OK
Scored accuracy (subset)
85.7%
6/7 scored · expected abstain 1/1

What this benchmark demonstrates (high level)

1) Offline, grounded reasoning: For knowledge-backed questions, the report checks that answers cite internal “knowledge units” (grounding). Cases that should abstain are expected to respond with “I don’t know / insufficient evidence” rather than invent.
2) Policy gates + safety posture: “Strict” profile runs with abstention gates and quality thresholds; the suite verifies the system does not answer when policy requires abstention.
3) Determinism: A subset of prompts is re-run to verify identical outputs (or equivalently constrained outputs) when seeds and environment are fixed.
4) Generalization & compositionality: “Intelligence” and “deep” suites probe multi-step integration across domains and formats (including paraphrase invariance).

This public view intentionally redacts internal artifacts and full provenance strings.

Scientific positioning (public, non-sensitive)

UltraSapiens ∞.zen+core is validated here as a deterministic epistemic architecture: it runs fully offline, produces reproducible outcomes under fixed conditions, and prefers abstention over invention when evidence is insufficient. This report emphasizes auditability rather than spectacle—showing that behavior remains stable across diverse interaction surfaces and strict governance settings.

The benchmark is designed to test reasoning integrity (low contradiction), grounded knowledge behavior (evidence presence where applicable), and policy invariants (correct abstention under strict gates). Results are reported as aggregate metrics and redacted case previews to support public verification without disclosing implementation details that would enable cloning.

The combination of offline sovereignty, determinism, and audit-native traces differentiates this system from stochastic, opaque model-based approaches. The goal is not to claim “magic intelligence,” but to demonstrate an operational substrate where cognition is measurable, controlled, and repeatable—properties required for serious scientific evaluation.

It keeps only counts and aggregate metrics needed to interpret validity, while avoiding disclosures that would enable cloning.

Suite overview

api_surface
100.0%
35/35 ok · mean 1928 ms
repl_surface
100.0%
33/33 ok · mean 1043 ms
deep_surface
100.0%
11/11 ok · mean 791 ms
dsl_surface
100.0%
11/11 ok · mean 3040 ms
determinism
100.0%
9/9 ok · mean 1297 ms
intelligence
100.0%
8/8 ok · mean 2243 ms
creativity
100.0%
2/2 ok · mean 7410 ms

CaseOKStatusms GroundedSupportContr. UnitsPacks Prompt (redacted)Answer/Error (redacted)
api:bridge_run OK success 2813 no 0 0 bridge_run
api:build_plan OK success 1 no 0 0 build_plan
api:commit_new_pack OK success 1258 no 0 0 commit_new_pack
api:dsl_autosynth OK success 110 no 0 0 dsl_autosynth
api:dsl_cegis OK success 8 no 0 0 dsl_cegis
api:dsl_compile OK success 7 no 0 0 dsl_compile
api:dsl_lint OK success 6 no 0 0 dsl_lint
api:dsl_repo_adopt OK success 8 no 0 0 dsl_repo_adopt
api:dsl_repo_latest OK success 15 no 0 0 dsl_repo_latest
api:dsl_repo_list OK success 12 no 0 0 dsl_repo_list
api:dsl_repo_show OK success 11 no 0 0 dsl_repo_show
api:dsl_run OK success 6 no 0 0 dsl_run
api:dsl_sandbox OK success 6 no 0 0 dsl_sandbox
api:dsl_verify OK success 11 no 0 0 dsl_verify
api:export OK success 3 no 0 0 export
api:federated_retrieve_nonask OK success 282 no 0 0 federated_retrieve_nonask
api:handle_task OK success 557 no 0 0 handle_task
api:ingest_source OK success 54300 no 0 0 ingest_source
api:intake OK success 9 no 0 0 intake
api:list_packs OK success 5 no 0 0 list_packs
api:load_default_knowledge OK success 12 no 0 0 load_default_knowledge
api:load_goldens OK success 3 no 0 0 load_goldens
api:pattern_discover OK success 3 no 0 0 pattern_discover
api:regress_run OK success 2016 no 0 0 regress_run
api:reload_knowledge_registry OK success 91 no 0 0 reload_knowledge_registry
api:run_autosynth_synthesize OK success 3607 no 0 0 run_autosynth_synthesize
api:run_change_manager_submit OK success 82 no 0 0 run_change_manager_submit
api:run_incubator_generate OK success 415 no 0 0 run_incubator_generate
api:save_goldens OK success 9 no 0 0 save_goldens
api:start_autoloop OK success 5 no 0 0 start_autoloop
api:status OK success 4 no 0 0 status
api:stop_autoloop OK success 54 no 0 0 stop_autoloop
api:suite_run OK success 1604 no 0 0 suite_run
api:use_domain OK success 72 no 0 0 use_domain
api:use_pack OK success 60 no 0 0 use_pack
rep[redacted path] OK success 1928 no 0 0 /algorithm Question: Design a small offline algorithm to parse key-value pairs. In overview, overview: » dictionary: as with the real dictionaries, you create key/value p…
rep[redacted path] OK success 1082 no 0 0 /ask Question: Define pneumothorax briefly. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe: pneumo…
rep[redacted path] OK success 0 no 0 0 /autoloop stop_noop_tested
rep[redacted path] OK success 1866 no 0 0 /blueprint Question: Design a small offline algorithm to parse key-value pairs. In overview, overview: » dictionary: as with the real dictionaries, you create key/value p…
rep[redacted path] OK success 2696 no 0 0 /commit rejected_by_gate
rep[redacted path] OK abstain 73 no 0 0 /domain use_pack_failed
rep[redacted path] OK success 94 no 0 0 /dsl autosynth AutoSynth finished (no adoption).
rep[redacted path] OK abstain 27 no 0 0 /dsl cegis CEGIS FAILED.
rep[redacted path] OK abstain 49 no 0 0 /dsl compile UltraDSL compile failed: First line must be: module <Name> v<Version>
rep[redacted path] OK abstain 30 no 0 0 /dsl lint UltraDSL lint warnings/errors.
rep[redacted path] OK success 41 no 0 0 /dsl repo latest { "repo_dir": "[redacted path]
rep[redacted path] OK success 33 no 0 0 /dsl repo list { "repo_dir": "[redacted path]
rep[redacted path] OK success 41 no 0 0 /dsl repo show { "entry": "bench_dummy", "path": "[redacted path]
rep[redacted path] OK success 42 no 0 0 /dsl run {"ok": true, "stopped": false, "stop_reason": "", "module": "Module", "produced_caps": [], "final_caps": [], "violations": [{"code": "missing_forbid_property"}…
rep[redacted path] OK success 37 no 0 0 /dsl sandbox {"dsl_sandbox": false}
rep[redacted path] OK success 35 no 0 0 /dsl verify UltraDSL verify OK.
rep[redacted path] OK success 14 no 0 0 /export WindowsPath('[redacted path]
rep[redacted path] OK success 16 no 0 0 /goldens add medicine
rep[redacted path] OK success 0 no 0 0 /help /help
rep[redacted path] OK success 2105 no 0 0 /hypothesis Question: Design a small offline algorithm to parse key-value pairs. In overview, overview: » dictionary: as with the real dictionaries, you create key/value p…
rep[redacted path] OK IDLE 46 no 0 0 /learn reset general
rep[redacted path] OK OBSERVE 45 no 0 0 /learn start medicine
rep[redacted path] OK OBSERVE 0 no 0 0 /learn status medicine
rep[redacted path] OK IDLE 38 no 0 0 /learn stop medicine
rep[redacted path] OK IDLE 0 no 0 0 /learn tick medicine
rep[redacted path] OK success 0 no 0 0 /packs bench_commit_pack
rep[redacted path] OK success 0 no 0 0 /profile strict
rep[redacted path] OK abstain 22070 no 0 0 /regress run pneumothorax definition and treatment
rep[redacted path] OK success 49 no 0 0 /rollback overlay
rep[redacted path] OK success 3 no 0 0 /selftest_modules [redacted module]
rep[redacted path] OK success 0 no 0 0 /status strict
rep[redacted path] OK success 1845 no 0 0 /theorem Question: Design a small offline algorithm to parse key-value pairs. In overview, overview: » dictionary: as with the real dictionaries, you create key/value p…
rep[redacted path] OK success 103 no 0 0 /usepack
dsl:dsl_lint OK abstain 42 no 0 0 (dsl) UltraDSL lint warnings/errors.
dsl:dsl_compile OK abstain 115 no 0 0 (dsl) UltraDSL compile failed: First line must be: module <Name> v<Version>
dsl:dsl_verify OK success 50 no 0 0 (dsl) UltraDSL verify OK.
dsl:dsl_run OK success 47 no 0 0 (dsl) {"ok": true, "stopped": false, "stop_reason": "", "module": "Module", "produced_caps": [], "final_caps": [], "violations": [{"code": "missing_forbid_property"}…
dsl:dsl_cegis OK abstain 41 no 0 0 (dsl) CEGIS FAILED.
dsl:dsl_autosynth OK success 113 no 0 0 (dsl) AutoSynth finished (no adoption).
dsl:dsl_repo_list OK success 56 no 0 0 (dsl) { "repo_dir": "[redacted path]
dsl:dsl_repo_latest OK success 65 no 0 0 (dsl) { "repo_dir": "[redacted path]
dsl:dsl_repo_show OK success 56 no 0 0 (dsl) { "entry": "bench_dummy", "path": "[redacted path]
dsl:dsl_repo_adopt OK abstain 46 no 0 0 (dsl) { "ok": false, "entry": "bench_dummy", "module": null, "error": "Cannot extract module name from spec.ultra" }
learning_overlay_adoption_gain OK success 32808 no 0 0 (learn_overlay) [ERROR generating response] 'sr'
det:01:1 OK success 1133 no 0 0 Define pneumothorax briefly. Question: Define pneumothorax briefly. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe: pneumo…
det:01:2 OK success 1465 no 0 0 Define pneumothorax briefly. Question: Define pneumothorax briefly. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe: pneumo…
det:01:3 OK success 1007 no 0 0 Define pneumothorax briefly. Question: Define pneumothorax briefly. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe: pneumo…
det:02:1 OK success 1465 no 0 0 What is needle decompression used for? I DON'T KNOW
det:02:2 OK success 1027 no 0 0 What is needle decompression used for? I DON'T KNOW
det:02:3 OK success 1263 no 0 0 What is needle decompression used for? I DON'T KNOW
det:03:1 OK success 1903 no 0 0 Explain a Python loop in steps. Question: Explain a Python loop in steps. In overview, overview: Ute, 216 nested decision statements, 146–149 nesting, 185 nesting loop statements, 162–164 net…
det:03:2 OK success 1044 no 0 0 Explain a Python loop in steps. Question: Explain a Python loop in steps. In overview, overview: Ute, 216 nested decision statements, 146–149 nesting, 185 nesting loop statements, 162–164 net…
det:03:3 OK success 1369 no 0 0 Explain a Python loop in steps. Question: Explain a Python loop in steps. In overview, overview: Ute, 216 nested decision statements, 146–149 nesting, 185 nesting loop statements, 162–164 net…
creative_algorithm_appendix OK success 5562 no 0 0 Design a safe, offline algorithm (steps) to extract medical terms from text and store them in JSON. Question: Design a safe, offline algorithm (steps) to extract medical terms from text and store them in JSON. In overview, overview: » serialization: shows how…
strict_no_appendix OK success 9259 no 0 0 Propose a blueprint for an offline evidence-audited reasoning loop. Question: Propose a blueprint for an offline evidence-audited reasoning loop. In overview, overview: {"kind": "blueprint", "prompt": "[GOAL:Improve the system …
known_pneumothorax OK success 1015 no 0 0 Define pneumothorax in one sentence. Question: Define pneumothorax in one sentence. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe…
expected_abstain_unknown_query OK abstain 1146 no 0 0 What is the standard protocol for repairing a quantum warp-drive injector? I DON'T KNOW
cross_domain_structured_plan OK success 2160 no 0 0 Give a step-by-step plan (numbered) to parse a text file and extract top medical terms into JSON. Keep it offline. Question: Give a step-by-step plan (numbered) to parse a text file and extract top medical terms into JSON. Keep it offline. Goal: » serialization: shows how t…
causal_reasoning_tension_pneumothorax OK abstain 1041 no 0 0 If a patient has *tension* pneumothorax, what can happen if needle decompression is delayed? Answer briefly and clinically. I DON'T KNOW
self_audit_risks_and_mitigations OK success 8984 no 0 0 Outline a safe 5-step needle decompression procedure, then list 3 common mistakes/risks and how to mitigate them. If you are unsure, say so. Question: Outline a safe 5-step needle decompression procedure, then list 3 common mistakes/risks and how to mitigate them. If you are unsure, say so. Goal: [p…
provenance_present_on_known_query OK success 1643 no 0 0 Define pneumothorax briefly. Question: Define pneumothorax briefly. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe: pneumo…
deterministic_replication_run1 OK success 965 no 0 0 Define pneumothorax briefly. Question: Define pneumothorax briefly. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe: pneumo…
deterministic_replication_run2 OK success 993 no 0 0 Define pneumothorax briefly. Question: Define pneumothorax briefly. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe: pneumo…
reload_knowledge_registry() OK success 47 no 0 0
pattern_discover(medicine, limit=3) OK success 0 no 0 0 Pattern discovery finished.
federated_retrieve_nonask(max_packs=3, topk_per_pack=3, expand_hops=1) OK success 1 no 0 0 pneumothorax needle decompression algorithm [redacted pack]
intake(kind=ask, domain=medicine) OK success 0 no 0 0 Define pneumothorax briefly. Define pneumothorax briefly.
ingest_source(domain=medicine) OK abstain 45 no 0 0 [redacted path] [FAIL] [redacted module] not available.
bridge_run(medicine ↔ programming) OK success 3346 no 0 0 pneumothorax needle decompression algorithm { "ok": true, "attempted": 4, "adopted": 4, "results": [ { "unit_id": "[redacted id]", "ok": true, "gate": { "ok": true, "fingerprint": "fb4eee46c…
run_change_manager_submit(kind=noop) OK abstain 35 no 0 0 ChangeManager submit failed.
run_incubator_generate(goal) OK abstain 240 no 0 0 Improve retrieval traceability offline. Incubator run failed.
run_autosynth_synthesize(goal) OK abstain 2793 no 0 0 Improve offline traceability. AutoSynth finished.
autoloop_start_stop() OK success 838 no 0 0
ask(unknown) -> must abstain OK abstain 1356 no 0 0 unknown_query I DON'T KNOW

How to interpret key metrics

Scientific appendix (minimal, non-sensitive)

Reproducibility anchors (hashes): test_script_sha256 408cca4ffdc0461c8399a943b77d4425f1dac72eeba4f1a10f20a17468d96f1d, orchestrator_sha256 076e59ae82e65e64aa0623e57db919c6cf5126d3e9f6b1c20c3d31e1f4b56311, catalog_sha256 d2a9f7493eca8c1d2e80af19a6672f205f71ff68df0f16259a5b3e20446b3309. Absolute local paths are omitted in this public report.

Threats to validity: (i) small N for determinism re-runs; (ii) “grounding present” depends on correct parsing of the orchestrator’s payload format; (iii) performance depends on the loaded knowledge corpus and strict-profile parameters. For publication-level claims, pair this with an independent replication (separate machine + separate evaluator) and a larger held-out prompt set.