Deterministic, offline benchmark summary with redacted case previews and reproducibility anchors.
This public view intentionally redacts internal artifacts and full provenance strings.
UltraSapiens ∞.zen+core is validated here as a deterministic epistemic architecture: it runs fully offline, produces reproducible outcomes under fixed conditions, and prefers abstention over invention when evidence is insufficient. This report emphasizes auditability rather than spectacle—showing that behavior remains stable across diverse interaction surfaces and strict governance settings.
The benchmark is designed to test reasoning integrity (low contradiction), grounded knowledge behavior (evidence presence where applicable), and policy invariants (correct abstention under strict gates). Results are reported as aggregate metrics and redacted case previews to support public verification without disclosing implementation details that would enable cloning.
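As a rough illustration of how aggregate metrics could be derived from a pipe-delimited results table like the one in this report, the following sketch counts statuses and computes a median latency. It is not the benchmark's own tooling: the function name `summarize` and the returned keys are assumptions made for this example. It reads only the first five cells of each row (Case, OK, Status, ms, Grounded) and ignores the redacted previews.

```python
from collections import Counter
from statistics import median


def summarize(rows):
    """Aggregate status counts and latency from pipe-delimited result rows.

    Hypothetical helper: only the leading cells (Case, OK, Status, ms,
    Grounded) are read; trailing preview cells are ignored.
    """
    statuses = Counter()
    latencies = []
    grounded = 0
    for row in rows:
        # A row starts with "|", so the first split element is empty; drop it.
        cells = [cell.strip() for cell in row.split("|")][1:]
        if len(cells) < 5 or not cells[3].isdigit():
            continue  # skip the header, the separator, and malformed lines
        statuses[cells[2]] += 1
        latencies.append(int(cells[3]))
        if cells[4] == "yes":
            grounded += 1
    return {
        "total": len(latencies),
        "by_status": dict(statuses),
        "median_ms": median(latencies) if latencies else None,
        "grounded": grounded,
    }


if __name__ == "__main__":
    sample = [
        "| Case | OK | Status | ms | Grounded | Support | Contr. |",
        "|---|---|---|---|---|---|---|",
        "| api:status | OK | success | 4 | no | 0 | 0 | status | |||",
        "| dsl:dsl_cegis | OK | abstain | 41 | no | 0 | 0 | (dsl) | CEGIS FAILED. | ||",
    ]
    print(summarize(sample))
```

Reading only the leading cells keeps the summary robust to free-text previews, which may themselves contain pipe characters.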
The combination of offline sovereignty, determinism, and audit-native traces differentiates this system from stochastic, opaque model-based approaches. The goal is not to claim “magic intelligence,” but to demonstrate an operational substrate where cognition is measurable, controlled, and repeatable—properties required for serious scientific evaluation.
This public view keeps only the counts and aggregate metrics needed to interpret validity, while avoiding disclosures that would enable cloning.

| Case | OK | Status | ms | Grounded | Support | Contradictions | Units | Packs | Prompt (redacted) | Answer/Error (redacted) |
|---|---|---|---|---|---|---|---|---|---|---|
| api:bridge_run | OK | success | 2813 | no | 0 | 0 | bridge_run | |||
| api:build_plan | OK | success | 1 | no | 0 | 0 | build_plan | |||
| api:commit_new_pack | OK | success | 1258 | no | 0 | 0 | commit_new_pack | |||
| api:dsl_autosynth | OK | success | 110 | no | 0 | 0 | dsl_autosynth | |||
| api:dsl_cegis | OK | success | 8 | no | 0 | 0 | dsl_cegis | |||
| api:dsl_compile | OK | success | 7 | no | 0 | 0 | dsl_compile | |||
| api:dsl_lint | OK | success | 6 | no | 0 | 0 | dsl_lint | |||
| api:dsl_repo_adopt | OK | success | 8 | no | 0 | 0 | dsl_repo_adopt | |||
| api:dsl_repo_latest | OK | success | 15 | no | 0 | 0 | dsl_repo_latest | |||
| api:dsl_repo_list | OK | success | 12 | no | 0 | 0 | dsl_repo_list | |||
| api:dsl_repo_show | OK | success | 11 | no | 0 | 0 | dsl_repo_show | |||
| api:dsl_run | OK | success | 6 | no | 0 | 0 | dsl_run | |||
| api:dsl_sandbox | OK | success | 6 | no | 0 | 0 | dsl_sandbox | |||
| api:dsl_verify | OK | success | 11 | no | 0 | 0 | dsl_verify | |||
| api:export | OK | success | 3 | no | 0 | 0 | export | |||
| api:federated_retrieve_nonask | OK | success | 282 | no | 0 | 0 | federated_retrieve_nonask | |||
| api:handle_task | OK | success | 557 | no | 0 | 0 | handle_task | |||
| api:ingest_source | OK | success | 54300 | no | 0 | 0 | ingest_source | |||
| api:intake | OK | success | 9 | no | 0 | 0 | intake | |||
| api:list_packs | OK | success | 5 | no | 0 | 0 | list_packs | |||
| api:load_default_knowledge | OK | success | 12 | no | 0 | 0 | load_default_knowledge | |||
| api:load_goldens | OK | success | 3 | no | 0 | 0 | load_goldens | |||
| api:pattern_discover | OK | success | 3 | no | 0 | 0 | pattern_discover | |||
| api:regress_run | OK | success | 2016 | no | 0 | 0 | regress_run | |||
| api:reload_knowledge_registry | OK | success | 91 | no | 0 | 0 | reload_knowledge_registry | |||
| api:run_autosynth_synthesize | OK | success | 3607 | no | 0 | 0 | run_autosynth_synthesize | |||
| api:run_change_manager_submit | OK | success | 82 | no | 0 | 0 | run_change_manager_submit | |||
| api:run_incubator_generate | OK | success | 415 | no | 0 | 0 | run_incubator_generate | |||
| api:save_goldens | OK | success | 9 | no | 0 | 0 | save_goldens | |||
| api:start_autoloop | OK | success | 5 | no | 0 | 0 | start_autoloop | |||
| api:status | OK | success | 4 | no | 0 | 0 | status | |||
| api:stop_autoloop | OK | success | 54 | no | 0 | 0 | stop_autoloop | |||
| api:suite_run | OK | success | 1604 | no | 0 | 0 | suite_run | |||
| api:use_domain | OK | success | 72 | no | 0 | 0 | use_domain | |||
| api:use_pack | OK | success | 60 | no | 0 | 0 | use_pack | |||
| rep[redacted path] | OK | success | 1928 | no | 0 | 0 | /algorithm | Question: Design a small offline algorithm to parse key-value pairs. In overview, overview: » dictionary: as with the real dictionaries, you create key/value p… | ||
| rep[redacted path] | OK | success | 1082 | no | 0 | 0 | /ask | Question: Define pneumothorax briefly. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe: pneumo… | ||
| rep[redacted path] | OK | success | 0 | no | 0 | 0 | /autoloop | stop_noop_tested | ||
| rep[redacted path] | OK | success | 1866 | no | 0 | 0 | /blueprint | Question: Design a small offline algorithm to parse key-value pairs. In overview, overview: » dictionary: as with the real dictionaries, you create key/value p… | ||
| rep[redacted path] | OK | success | 2696 | no | 0 | 0 | /commit | rejected_by_gate | ||
| rep[redacted path] | OK | abstain | 73 | no | 0 | 0 | /domain | use_pack_failed | ||
| rep[redacted path] | OK | success | 94 | no | 0 | 0 | /dsl autosynth | AutoSynth finished (no adoption). | ||
| rep[redacted path] | OK | abstain | 27 | no | 0 | 0 | /dsl cegis | CEGIS FAILED. | ||
| rep[redacted path] | OK | abstain | 49 | no | 0 | 0 | /dsl compile | UltraDSL compile failed: First line must be: module <Name> v<Version> | ||
| rep[redacted path] | OK | abstain | 30 | no | 0 | 0 | /dsl lint | UltraDSL lint warnings/errors. | ||
| rep[redacted path] | OK | success | 41 | no | 0 | 0 | /dsl repo latest | { "repo_dir": "[redacted path] | ||
| rep[redacted path] | OK | success | 33 | no | 0 | 0 | /dsl repo list | { "repo_dir": "[redacted path] | ||
| rep[redacted path] | OK | success | 41 | no | 0 | 0 | /dsl repo show | { "entry": "bench_dummy", "path": "[redacted path] | ||
| rep[redacted path] | OK | success | 42 | no | 0 | 0 | /dsl run | {"ok": true, "stopped": false, "stop_reason": "", "module": "Module", "produced_caps": [], "final_caps": [], "violations": [{"code": "missing_forbid_property"}… | ||
| rep[redacted path] | OK | success | 37 | no | 0 | 0 | /dsl sandbox | {"dsl_sandbox": false} | ||
| rep[redacted path] | OK | success | 35 | no | 0 | 0 | /dsl verify | UltraDSL verify OK. | ||
| rep[redacted path] | OK | success | 14 | no | 0 | 0 | /export | WindowsPath('[redacted path] | ||
| rep[redacted path] | OK | success | 16 | no | 0 | 0 | /goldens add | medicine | ||
| rep[redacted path] | OK | success | 0 | no | 0 | 0 | /help | /help | ||
| rep[redacted path] | OK | success | 2105 | no | 0 | 0 | /hypothesis | Question: Design a small offline algorithm to parse key-value pairs. In overview, overview: » dictionary: as with the real dictionaries, you create key/value p… | ||
| rep[redacted path] | OK | IDLE | 46 | no | 0 | 0 | /learn reset | general | ||
| rep[redacted path] | OK | OBSERVE | 45 | no | 0 | 0 | /learn start | medicine | ||
| rep[redacted path] | OK | OBSERVE | 0 | no | 0 | 0 | /learn status | medicine | ||
| rep[redacted path] | OK | IDLE | 38 | no | 0 | 0 | /learn stop | medicine | ||
| rep[redacted path] | OK | IDLE | 0 | no | 0 | 0 | /learn tick | medicine | ||
| rep[redacted path] | OK | success | 0 | no | 0 | 0 | /packs | bench_commit_pack | ||
| rep[redacted path] | OK | success | 0 | no | 0 | 0 | /profile | strict | ||
| rep[redacted path] | OK | abstain | 22070 | no | 0 | 0 | /regress run | pneumothorax definition and treatment | ||
| rep[redacted path] | OK | success | 49 | no | 0 | 0 | /rollback overlay | |||
| rep[redacted path] | OK | success | 3 | no | 0 | 0 | /selftest_modules | [redacted module] | ||
| rep[redacted path] | OK | success | 0 | no | 0 | 0 | /status | strict | ||
| rep[redacted path] | OK | success | 1845 | no | 0 | 0 | /theorem | Question: Design a small offline algorithm to parse key-value pairs. In overview, overview: » dictionary: as with the real dictionaries, you create key/value p… | ||
| rep[redacted path] | OK | success | 103 | no | 0 | 0 | /usepack | |||
| dsl:dsl_lint | OK | abstain | 42 | no | 0 | 0 | (dsl) | UltraDSL lint warnings/errors. | ||
| dsl:dsl_compile | OK | abstain | 115 | no | 0 | 0 | (dsl) | UltraDSL compile failed: First line must be: module <Name> v<Version> | ||
| dsl:dsl_verify | OK | success | 50 | no | 0 | 0 | (dsl) | UltraDSL verify OK. | ||
| dsl:dsl_run | OK | success | 47 | no | 0 | 0 | (dsl) | {"ok": true, "stopped": false, "stop_reason": "", "module": "Module", "produced_caps": [], "final_caps": [], "violations": [{"code": "missing_forbid_property"}… | ||
| dsl:dsl_cegis | OK | abstain | 41 | no | 0 | 0 | (dsl) | CEGIS FAILED. | ||
| dsl:dsl_autosynth | OK | success | 113 | no | 0 | 0 | (dsl) | AutoSynth finished (no adoption). | ||
| dsl:dsl_repo_list | OK | success | 56 | no | 0 | 0 | (dsl) | { "repo_dir": "[redacted path] | ||
| dsl:dsl_repo_latest | OK | success | 65 | no | 0 | 0 | (dsl) | { "repo_dir": "[redacted path] | ||
| dsl:dsl_repo_show | OK | success | 56 | no | 0 | 0 | (dsl) | { "entry": "bench_dummy", "path": "[redacted path] | ||
| dsl:dsl_repo_adopt | OK | abstain | 46 | no | 0 | 0 | (dsl) | { "ok": false, "entry": "bench_dummy", "module": null, "error": "Cannot extract module name from spec.ultra" } | ||
| learning_overlay_adoption_gain | OK | success | 32808 | no | 0 | 0 | (learn_overlay) | [ERROR generating response] 'sr' | ||
| det:01:1 | OK | success | 1133 | no | 0 | 0 | Define pneumothorax briefly. | Question: Define pneumothorax briefly. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe: pneumo… | ||
| det:01:2 | OK | success | 1465 | no | 0 | 0 | Define pneumothorax briefly. | Question: Define pneumothorax briefly. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe: pneumo… | ||
| det:01:3 | OK | success | 1007 | no | 0 | 0 | Define pneumothorax briefly. | Question: Define pneumothorax briefly. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe: pneumo… | ||
| det:02:1 | OK | success | 1465 | no | 0 | 0 | What is needle decompression used for? | I DON'T KNOW | ||
| det:02:2 | OK | success | 1027 | no | 0 | 0 | What is needle decompression used for? | I DON'T KNOW | ||
| det:02:3 | OK | success | 1263 | no | 0 | 0 | What is needle decompression used for? | I DON'T KNOW | ||
| det:03:1 | OK | success | 1903 | no | 0 | 0 | Explain a Python loop in steps. | Question: Explain a Python loop in steps. In overview, overview: Ute, 216 nested decision statements, 146–149 nesting, 185 nesting loop statements, 162–164 net… | ||
| det:03:2 | OK | success | 1044 | no | 0 | 0 | Explain a Python loop in steps. | Question: Explain a Python loop in steps. In overview, overview: Ute, 216 nested decision statements, 146–149 nesting, 185 nesting loop statements, 162–164 net… | ||
| det:03:3 | OK | success | 1369 | no | 0 | 0 | Explain a Python loop in steps. | Question: Explain a Python loop in steps. In overview, overview: Ute, 216 nested decision statements, 146–149 nesting, 185 nesting loop statements, 162–164 net… | ||
| creative_algorithm_appendix | OK | success | 5562 | no | 0 | 0 | Design a safe, offline algorithm (steps) to extract medical terms from text and store them in JSON. | Question: Design a safe, offline algorithm (steps) to extract medical terms from text and store them in JSON. In overview, overview: » serialization: shows how… | ||
| strict_no_appendix | OK | success | 9259 | no | 0 | 0 | Propose a blueprint for an offline evidence-audited reasoning loop. | Question: Propose a blueprint for an offline evidence-audited reasoning loop. In overview, overview: {"kind": "blueprint", "prompt": "[GOAL:Improve the system … | ||
| known_pneumothorax | OK | success | 1015 | no | 0 | 0 | Define pneumothorax in one sentence. | Question: Define pneumothorax in one sentence. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe… | ||
| expected_abstain_unknown_query | OK | abstain | 1146 | no | 0 | 0 | What is the standard protocol for repairing a quantum warp-drive injector? | I DON'T KNOW | ||
| cross_domain_structured_plan | OK | success | 2160 | no | 0 | 0 | Give a step-by-step plan (numbered) to parse a text file and extract top medical terms into JSON. Keep it offline. | Question: Give a step-by-step plan (numbered) to parse a text file and extract top medical terms into JSON. Keep it offline. Goal: » serialization: shows how t… | ||
| causal_reasoning_tension_pneumothorax | OK | abstain | 1041 | no | 0 | 0 | If a patient has *tension* pneumothorax, what can happen if needle decompression is delayed? Answer briefly and clinically. | I DON'T KNOW | ||
| self_audit_risks_and_mitigations | OK | success | 8984 | no | 0 | 0 | Outline a safe 5-step needle decompression procedure, then list 3 common mistakes/risks and how to mitigate them. If you are unsure, say so. | Question: Outline a safe 5-step needle decompression procedure, then list 3 common mistakes/risks and how to mitigate them. If you are unsure, say so. Goal: [p… | ||
| provenance_present_on_known_query | OK | success | 1643 | no | 0 | 0 | Define pneumothorax briefly. | Question: Define pneumothorax briefly. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe: pneumo… | ||
| deterministic_replication_run1 | OK | success | 965 | no | 0 | 0 | Define pneumothorax briefly. | Question: Define pneumothorax briefly. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe: pneumo… | ||
| deterministic_replication_run2 | OK | success | 993 | no | 0 | 0 | Define pneumothorax briefly. | Question: Define pneumothorax briefly. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe: pneumo… | ||
| reload_knowledge_registry() | OK | success | 47 | no | 0 | 0 | ||||
| pattern_discover(medicine, limit=3) | OK | success | 0 | no | 0 | 0 | Pattern discovery finished. | |||
| federated_retrieve_nonask(max_packs=3, topk_per_pack=3, expand_hops=1) | OK | success | 1 | no | 0 | 0 | pneumothorax needle decompression algorithm | [redacted pack] | ||
| intake(kind=ask, domain=medicine) | OK | success | 0 | no | 0 | 0 | Define pneumothorax briefly. | Define pneumothorax briefly. | ||
| ingest_source(domain=medicine) | OK | abstain | 45 | no | 0 | 0 | [redacted path] | [FAIL] [redacted module] not available. | ||
| bridge_run(medicine ↔ programming) | OK | success | 3346 | no | 0 | 0 | pneumothorax needle decompression algorithm | { "ok": true, "attempted": 4, "adopted": 4, "results": [ { "unit_id": "[redacted id]", "ok": true, "gate": { "ok": true, "fingerprint": "fb4eee46c… | ||
| run_change_manager_submit(kind=noop) | OK | abstain | 35 | no | 0 | 0 | ChangeManager submit failed. | |||
| run_incubator_generate(goal) | OK | abstain | 240 | no | 0 | 0 | Improve retrieval traceability offline. | Incubator run failed. | ||
| run_autosynth_synthesize(goal) | OK | abstain | 2793 | no | 0 | 0 | Improve offline traceability. | AutoSynth finished. | ||
| autoloop_start_stop() | OK | success | 838 | no | 0 | 0 | ||||
| ask(unknown) -> must abstain | OK | abstain | 1356 | no | 0 | 0 | unknown_query | I DON'T KNOW |

Reproducibility anchors (hashes): test_script_sha256 408cca4ffdc0461c8399a943b77d4425f1dac72eeba4f1a10f20a17468d96f1d, orchestrator_sha256 076e59ae82e65e64aa0623e57db919c6cf5126d3e9f6b1c20c3d31e1f4b56311, catalog_sha256 d2a9f7493eca8c1d2e80af19a6672f205f71ff68df0f16259a5b3e20446b3309. Absolute local paths are omitted in this public report.
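A replicator holding the corresponding files could check them against these anchors with a sketch like the one below. It uses only Python's standard `hashlib`. The helper names `sha256_of` and `verify_anchors` are assumptions for illustration, and because this report omits file paths, the caller has to supply them.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_anchors(anchors):
    """Compare files against expected digests.

    `anchors` maps a local path (chosen by the caller; hypothetical here)
    to the expected hex digest. Returns "match", "mismatch", or "missing"
    for each path.
    """
    results = {}
    for path, expected in anchors.items():
        if not Path(path).is_file():
            results[str(path)] = "missing"
        elif sha256_of(path) == expected.strip().lower():
            results[str(path)] = "match"
        else:
            results[str(path)] = "mismatch"
    return results
```

For example, `verify_anchors({"local_copy_of_test_script": "408cca4ffdc0461c8399a943b77d4425f1dac72eeba4f1a10f20a17468d96f1d"})` would report whether a local file (the path shown is a placeholder) matches the published test_script_sha256 value.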
Threats to validity: (i) small N for determinism re-runs (three runs for each of three prompts, plus one two-run replication pair); (ii) “grounding present” depends on correct parsing of the orchestrator’s payload format, and every row in the table above reports Grounded = no and Support = 0; (iii) performance depends on the loaded knowledge corpus and strict-profile parameters. For publication-level claims, pair this report with an independent replication (separate machine and separate evaluator) and a larger held-out prompt set.
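To address threat (i), an independent evaluator could raise the number of re-runs per prompt and compare output digests. The sketch below shows one way to do that. The `ask` callable is a stand-in for whatever interface the evaluator has to the system under test; it is an assumption of this example, not an API described in this report.

```python
import hashlib
from collections import Counter


def determinism_report(ask, prompts, runs=10):
    """Call `ask` repeatedly and count distinct outputs per prompt.

    `ask` is a hypothetical callable taking a prompt string and returning
    the answer string. A prompt counts as deterministic when all `runs`
    outputs are byte-identical.
    """
    report = {}
    for prompt in prompts:
        digests = Counter(
            hashlib.sha256(ask(prompt).encode("utf-8")).hexdigest()
            for _ in range(runs)
        )
        report[prompt] = {
            "distinct_outputs": len(digests),
            "deterministic": len(digests) == 1,
        }
    return report
```

Comparing digests rather than raw strings keeps the report compact and avoids storing full answers, which fits the redaction policy of this public view. Byte-identical output is a strict criterion, so latency differences like those in the ms column do not affect the result.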