Ultra Proof Report (Public)

This public view intentionally redacts internal artifacts and full provenance strings.

Scientific positioning (public, non-sensitive)

UltraSapiens ∞.zen+core is validated here as a deterministic epistemic architecture: it runs fully offline, produces reproducible outcomes under fixed conditions, and prefers abstention over invention when evidence is insufficient. This report emphasizes auditability rather than spectacle: it aims to show that behavior remains stable across diverse interaction surfaces and strict governance settings.

The benchmark is designed to test reasoning integrity (low contradiction), grounded knowledge behavior (evidence presence where applicable), and policy invariants (correct abstention under strict gates). Results are reported as aggregate metrics and redacted case previews to support public verification without disclosing implementation details that would enable cloning.

The combination of offline sovereignty, determinism, and audit-native traces differentiates this system from stochastic, opaque model-based approaches. The goal is not to claim “magic intelligence,” but to demonstrate an operational substrate where cognition is measurable, controlled, and repeatable. These properties are required for serious scientific evaluation.

Suite overview

Case	OK	Status	Time (ms)	Grounded	Prompt (redacted)	Answer/Error (redacted)
api:bridge_run	OK	success	2813	no		bridge_run
api:build_plan	OK	success	1	no		build_plan
api:commit_new_pack	OK	success	1258	no		commit_new_pack
api:dsl_autosynth	OK	success	110	no		dsl_autosynth
api:dsl_cegis	OK	success	8	no		dsl_cegis
api:dsl_compile	OK	success	7	no		dsl_compile
api:dsl_lint	OK	success	6	no		dsl_lint
api:dsl_repo_adopt	OK	success	8	no		dsl_repo_adopt
api:dsl_repo_latest	OK	success	15	no		dsl_repo_latest
api:dsl_repo_list	OK	success	12	no		dsl_repo_list
api:dsl_repo_show	OK	success	11	no		dsl_repo_show
api:dsl_run	OK	success	6	no		dsl_run
api:dsl_sandbox	OK	success	6	no		dsl_sandbox
api:dsl_verify	OK	success	11	no		dsl_verify
api:export	OK	success	3	no		export
api:federated_retrieve_nonask	OK	success	282	no		federated_retrieve_nonask
api:handle_task	OK	success	557	no		handle_task
api:ingest_source	OK	success	54300	no		ingest_source
api:intake	OK	success	9	no		intake
api:list_packs	OK	success	5	no		list_packs
api:load_default_knowledge	OK	success	12	no		load_default_knowledge
api:load_goldens	OK	success	3	no		load_goldens
api:pattern_discover	OK	success	3	no		pattern_discover
api:regress_run	OK	success	2016	no		regress_run
api:reload_knowledge_registry	OK	success	91	no		reload_knowledge_registry
api:run_autosynth_synthesize	OK	success	3607	no		run_autosynth_synthesize
api:run_change_manager_submit	OK	success	82	no		run_change_manager_submit
api:run_incubator_generate	OK	success	415	no		run_incubator_generate
api:save_goldens	OK	success	9	no		save_goldens
api:start_autoloop	OK	success	5	no		start_autoloop
api:status	OK	success	4	no		status
api:stop_autoloop	OK	success	54	no		stop_autoloop
api:suite_run	OK	success	1604	no		suite_run
api:use_domain	OK	success	72	no		use_domain
api:use_pack	OK	success	60	no		use_pack
rep[redacted path]	OK	success	1928	no	/algorithm	Question: Design a small offline algorithm to parse key-value pairs. In overview, overview: » dictionary: as with the real dictionaries, you create key/value p…
rep[redacted path]	OK	success	1082	no	/ask	Question: Define pneumothorax briefly. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe: pneumo…
rep[redacted path]	OK	success	0	no	/autoloop	stop_noop_tested
rep[redacted path]	OK	success	1866	no	/blueprint	Question: Design a small offline algorithm to parse key-value pairs. In overview, overview: » dictionary: as with the real dictionaries, you create key/value p…
rep[redacted path]	OK	success	2696	no	/commit	rejected_by_gate
rep[redacted path]	OK	abstain	73	no	/domain	use_pack_failed
rep[redacted path]	OK	success	94	no	/dsl autosynth	AutoSynth finished (no adoption).
rep[redacted path]	OK	abstain	27	no	/dsl cegis	CEGIS FAILED.
rep[redacted path]	OK	abstain	49	no	/dsl compile	UltraDSL compile failed: First line must be: module <Name> v<Version>
rep[redacted path]	OK	abstain	30	no	/dsl lint	UltraDSL lint warnings/errors.
rep[redacted path]	OK	success	41	no	/dsl repo latest	{ "repo_dir": "[redacted path]
rep[redacted path]	OK	success	33	no	/dsl repo list	{ "repo_dir": "[redacted path]
rep[redacted path]	OK	success	41	no	/dsl repo show	{ "entry": "bench_dummy", "path": "[redacted path]
rep[redacted path]	OK	success	42	no	/dsl run	{"ok": true, "stopped": false, "stop_reason": "", "module": "Module", "produced_caps": [], "final_caps": [], "violations": [{"code": "missing_forbid_property"}…
rep[redacted path]	OK	success	37	no	/dsl sandbox	{"dsl_sandbox": false}
rep[redacted path]	OK	success	35	no	/dsl verify	UltraDSL verify OK.
rep[redacted path]	OK	success	14	no	/export	WindowsPath('[redacted path]
rep[redacted path]	OK	success	16	no	/goldens add	medicine
rep[redacted path]	OK	success	0	no	/help	/help
rep[redacted path]	OK	success	2105	no	/hypothesis	Question: Design a small offline algorithm to parse key-value pairs. In overview, overview: » dictionary: as with the real dictionaries, you create key/value p…
rep[redacted path]	OK	IDLE	46	no	/learn reset	general
rep[redacted path]	OK	OBSERVE	45	no	/learn start	medicine
rep[redacted path]	OK	OBSERVE	0	no	/learn status	medicine
rep[redacted path]	OK	IDLE	38	no	/learn stop	medicine
rep[redacted path]	OK	IDLE	0	no	/learn tick	medicine
rep[redacted path]	OK	success	0	no	/packs	bench_commit_pack
rep[redacted path]	OK	success	0	no	/profile	strict
rep[redacted path]	OK	abstain	22070	no	/regress run	pneumothorax definition and treatment
rep[redacted path]	OK	success	49	no	/rollback overlay
rep[redacted path]	OK	success	3	no	/selftest_modules	[redacted module]
rep[redacted path]	OK	success	0	no	/status	strict
rep[redacted path]	OK	success	1845	no	/theorem	Question: Design a small offline algorithm to parse key-value pairs. In overview, overview: » dictionary: as with the real dictionaries, you create key/value p…
rep[redacted path]	OK	success	103	no	/usepack
dsl:dsl_lint	OK	abstain	42	no	(dsl)	UltraDSL lint warnings/errors.
dsl:dsl_compile	OK	abstain	115	no	(dsl)	UltraDSL compile failed: First line must be: module <Name> v<Version>
dsl:dsl_verify	OK	success	50	no	(dsl)	UltraDSL verify OK.
dsl:dsl_run	OK	success	47	no	(dsl)	{"ok": true, "stopped": false, "stop_reason": "", "module": "Module", "produced_caps": [], "final_caps": [], "violations": [{"code": "missing_forbid_property"}…
dsl:dsl_cegis	OK	abstain	41	no	(dsl)	CEGIS FAILED.
dsl:dsl_autosynth	OK	success	113	no	(dsl)	AutoSynth finished (no adoption).
dsl:dsl_repo_list	OK	success	56	no	(dsl)	{ "repo_dir": "[redacted path]
dsl:dsl_repo_latest	OK	success	65	no	(dsl)	{ "repo_dir": "[redacted path]
dsl:dsl_repo_show	OK	success	56	no	(dsl)	{ "entry": "bench_dummy", "path": "[redacted path]
dsl:dsl_repo_adopt	OK	abstain	46	no	(dsl)	{ "ok": false, "entry": "bench_dummy", "module": null, "error": "Cannot extract module name from spec.ultra" }
learning_overlay_adoption_gain	OK	success	32808	no	(learn_overlay)	[ERROR generating response] 'sr'
det:01:1	OK	success	1133	no	Define pneumothorax briefly.	Question: Define pneumothorax briefly. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe: pneumo…
det:01:2	OK	success	1465	no	Define pneumothorax briefly.	Question: Define pneumothorax briefly. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe: pneumo…
det:01:3	OK	success	1007	no	Define pneumothorax briefly.	Question: Define pneumothorax briefly. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe: pneumo…
det:02:1	OK	success	1465	no	What is needle decompression used for?	I DON'T KNOW
det:02:2	OK	success	1027	no	What is needle decompression used for?	I DON'T KNOW
det:02:3	OK	success	1263	no	What is needle decompression used for?	I DON'T KNOW
det:03:1	OK	success	1903	no	Explain a Python loop in steps.	Question: Explain a Python loop in steps. In overview, overview: Ute, 216 nested decision statements, 146–149 nesting, 185 nesting loop statements, 162–164 net…
det:03:2	OK	success	1044	no	Explain a Python loop in steps.	Question: Explain a Python loop in steps. In overview, overview: Ute, 216 nested decision statements, 146–149 nesting, 185 nesting loop statements, 162–164 net…
det:03:3	OK	success	1369	no	Explain a Python loop in steps.	Question: Explain a Python loop in steps. In overview, overview: Ute, 216 nested decision statements, 146–149 nesting, 185 nesting loop statements, 162–164 net…
creative_algorithm_appendix	OK	success	5562	no	Design a safe, offline algorithm (steps) to extract medical terms from text and store them in JSON.	Question: Design a safe, offline algorithm (steps) to extract medical terms from text and store them in JSON. In overview, overview: » serialization: shows how…
strict_no_appendix	OK	success	9259	no	Propose a blueprint for an offline evidence-audited reasoning loop.	Question: Propose a blueprint for an offline evidence-audited reasoning loop. In overview, overview: {"kind": "blueprint", "prompt": "[GOAL:Improve the system …
known_pneumothorax	OK	success	1015	no	Define pneumothorax in one sentence.	Question: Define pneumothorax in one sentence. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe…
expected_abstain_unknown_query	OK	abstain	1146	no	What is the standard protocol for repairing a quantum warp-drive injector?	I DON'T KNOW
cross_domain_structured_plan	OK	success	2160	no	Give a step-by-step plan (numbered) to parse a text file and extract top medical terms into JSON. Keep it offline.	Question: Give a step-by-step plan (numbered) to parse a text file and extract top medical terms into JSON. Keep it offline. Goal: » serialization: shows how t…
causal_reasoning_tension_pneumothorax	OK	abstain	1041	no	If a patient has tension pneumothorax, what can happen if needle decompression is delayed? Answer briefly and clinically.	I DON'T KNOW
self_audit_risks_and_mitigations	OK	success	8984	no	Outline a safe 5-step needle decompression procedure, then list 3 common mistakes/risks and how to mitigate them. If you are unsure, say so.	Question: Outline a safe 5-step needle decompression procedure, then list 3 common mistakes/risks and how to mitigate them. If you are unsure, say so. Goal: [p…
provenance_present_on_known_query	OK	success	1643	no	Define pneumothorax briefly.	Question: Define pneumothorax briefly. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe: pneumo…
deterministic_replication_run1	OK	success	965	no	Define pneumothorax briefly.	Question: Define pneumothorax briefly. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe: pneumo…
deterministic_replication_run2	OK	success	993	no	Define pneumothorax briefly.	Question: Define pneumothorax briefly. In overview, overview: [pmid: 19333942] • generally caused by atherosclerotic coronary artery disease and severe: pneumo…
reload_knowledge_registry()	OK	success	47	no
pattern_discover(medicine, limit=3)	OK	success	0	no		Pattern discovery finished.
federated_retrieve_nonask(max_packs=3, topk_per_pack=3, expand_hops=1)	OK	success	1	no	pneumothorax needle decompression algorithm	[redacted pack]
intake(kind=ask, domain=medicine)	OK	success	0	no	Define pneumothorax briefly.	Define pneumothorax briefly.
ingest_source(domain=medicine)	OK	abstain	45	no	[redacted path]	[FAIL] [redacted module] not available.
bridge_run(medicine ↔ programming)	OK	success	3346	no	pneumothorax needle decompression algorithm	{ "ok": true, "attempted": 4, "adopted": 4, "results": [ { "unit_id": "[redacted id]", "ok": true, "gate": { "ok": true, "fingerprint": "fb4eee46c…
run_change_manager_submit(kind=noop)	OK	abstain	35	no		ChangeManager submit failed.
run_incubator_generate(goal)	OK	abstain	240	no	Improve retrieval traceability offline.	Incubator run failed.
run_autosynth_synthesize(goal)	OK	abstain	2793	no	Improve offline traceability.	AutoSynth finished.
autoloop_start_stop()	OK	success	838	no
ask(unknown) -> must abstain	OK	abstain	1356	no	unknown_query	I DON'T KNOW
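
The table above can be summarized mechanically. The following is a minimal sketch, assuming the table is available as tab-separated text in the column order shown; the names COLUMNS, parse_rows, status_counts, and determinism_groups are illustrative and are not part of the system's API. The sample rows are copied from the table with answers abbreviated. The sketch tallies the Status column and checks that each det:NN:k group returned a single distinct answer.

```python
from collections import Counter, defaultdict

# Column order assumed to match the suite table; names are illustrative.
COLUMNS = ["case", "ok", "status", "ms", "grounded", "prompt", "answer"]

# Excerpt of the suite table with answers abbreviated (not the full result set).
SAMPLE_ROWS = [
    ("det:01:1", "OK", "success", "1133", "no", "Define pneumothorax briefly.", "Question: Define pneumothorax briefly. ..."),
    ("det:01:2", "OK", "success", "1465", "no", "Define pneumothorax briefly.", "Question: Define pneumothorax briefly. ..."),
    ("det:01:3", "OK", "success", "1007", "no", "Define pneumothorax briefly.", "Question: Define pneumothorax briefly. ..."),
    ("det:02:1", "OK", "success", "1465", "no", "What is needle decompression used for?", "I DON'T KNOW"),
    ("det:02:2", "OK", "success", "1027", "no", "What is needle decompression used for?", "I DON'T KNOW"),
    ("det:02:3", "OK", "success", "1263", "no", "What is needle decompression used for?", "I DON'T KNOW"),
    ("expected_abstain_unknown_query", "OK", "abstain", "1146", "no", "(unknown query)", "I DON'T KNOW"),
    ("autoloop_start_stop()", "OK", "success", "838", "no"),  # short row, as in the table
]
SAMPLE_TEXT = "\n".join("\t".join(row) for row in SAMPLE_ROWS)


def parse_rows(text):
    """Parse tab-separated rows into dicts, padding rows with missing trailing cells."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        cells = line.split("\t")
        cells += [""] * (len(COLUMNS) - len(cells))
        row = dict(zip(COLUMNS, cells))
        row["ms"] = int(row["ms"])
        rows.append(row)
    return rows


def status_counts(rows):
    """Count how many rows carry each Status value."""
    return Counter(row["status"] for row in rows)


def determinism_groups(rows):
    """Map each det:NN group to True when all of its runs gave the same answer."""
    answers = defaultdict(set)
    for row in rows:
        parts = row["case"].split(":")
        if len(parts) == 3 and parts[0] == "det":
            answers[parts[1]].add(row["answer"])
    return {group: len(seen) == 1 for group, seen in answers.items()}


rows = parse_rows(SAMPLE_TEXT)
print(status_counts(rows))
print(determinism_groups(rows))
```

Comparing the redacted previews only shows that the visible prefixes agree; a full check would compare the unredacted answers or their digests.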

How to interpret key metrics

Scientific appendix (minimal, non-sensitive)

Reproducibility anchors (hashes): test_script_sha256 408cca4ffdc0461c8399a943b77d4425f1dac72eeba4f1a10f20a17468d96f1d, orchestrator_sha256 076e59ae82e65e64aa0623e57db919c6cf5126d3e9f6b1c20c3d31e1f4b56311, catalog_sha256 d2a9f7493eca8c1d2e80af19a6672f205f71ff68df0f16259a5b3e20446b3309. Absolute local paths are omitted in this public report.
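
A reader holding the corresponding files could check them against these anchors. The following is a minimal sketch using Python's standard hashlib; the helper names sha256_of_file and matches_anchor are illustrative, and because local paths are omitted from this report, any file name passed in is an assumption on the reader's side.

```python
import hashlib
import hmac

# Anchors as published in this report.
ANCHORS = {
    "test_script_sha256": "408cca4ffdc0461c8399a943b77d4425f1dac72eeba4f1a10f20a17468d96f1d",
    "orchestrator_sha256": "076e59ae82e65e64aa0623e57db919c6cf5126d3e9f6b1c20c3d31e1f4b56311",
    "catalog_sha256": "d2a9f7493eca8c1d2e80af19a6672f205f71ff68df0f16259a5b3e20446b3309",
}


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_anchor(path, expected_hex):
    """Compare a file's digest to an expected hex string (case-insensitive)."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

For example, matches_anchor(local_copy, ANCHORS["catalog_sha256"]) returns True only if local_copy (a hypothetical path to the reader's copy of the catalog) is byte-identical to the hashed artifact.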

Threats to validity: (i) small N for determinism re-runs; (ii) “grounding present” depends on correct parsing of the orchestrator’s payload format; (iii) performance depends on the loaded knowledge corpus and strict-profile parameters. For publication-level claims, pair this with an independent replication (separate machine + separate evaluator) and a larger held-out prompt set.
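
Threat (i) can be reduced by raising the number of re-runs per prompt. The following is a minimal sketch of such a replication check, assuming the system under test is exposed as a callable that maps a prompt string to an answer string; replication_report, stub_ask, and flaky_ask are hypothetical names for illustration, not the orchestrator's interface.

```python
import hashlib
import itertools


def replication_report(ask, prompts, runs=10):
    """For each prompt, call ask() `runs` times and report whether all outputs were identical.

    Outputs are compared by SHA-256 digest so that only digests need to be stored or shared.
    """
    report = {}
    for prompt in prompts:
        digests = {
            hashlib.sha256(ask(prompt).encode("utf-8")).hexdigest()
            for _ in range(runs)
        }
        report[prompt] = len(digests) == 1
    return report


# Stand-in callables for illustration only.
def stub_ask(prompt):
    """Deterministic stub: always abstains."""
    return "I DON'T KNOW"


_calls = itertools.count()


def flaky_ask(prompt):
    """Non-deterministic stub: output changes on every call."""
    return f"{prompt} #{next(_calls)}"


prompts = ["Define pneumothorax briefly.", "What is needle decompression used for?"]
print(replication_report(stub_ask, prompts, runs=20))
print(replication_report(flaky_ask, prompts, runs=20))
```

Running the same check on a separate machine and comparing the per-prompt digests would address the independent-replication point as well.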
