mnemo · live research dashboard
Memento-Skills (Zhou et al. 2026, arXiv:2603.18743) introduced a Read-Write Reflective Learning loop in which executable skills serve as evolving agent memory. The original paper validates the loop on a single LLM provider against benchmark suites. We replicate it under multi-provider, multi-tenant production traffic, and we report the numbers as they happen.
Pre-registered (locked at study start). GDPR-anonymised. Replication package open-sourced. 28-day window. 8 providers. 15 personas. Real workloads from real tenants — never named, never identified.
Study starts in 3 days · captured 28 Apr, 00:55 UTC
The substrate
Every counter on this page is a real number from production. Once the study window opens (2026-05-01) the numbers move every 10 minutes. There are no tenant names anywhere on this dashboard, only their aggregate footprint — GDPR Art. 6(1)(f) plus an opt-out switch in every Settings panel.
Global personas
15
C-O-R-S framework, native disciplines or universal Layer 2.5
Active skills
110
Across the persona library, evolution-loop ready
Active tenants
8
Hashed in every export, never named
Skill executions
2
Real production calls since loop went live (2026-04-28)
Discussions total
165
Lifetime, all tenants combined
Workflows total
249
Lifetime, all tenants combined
Discovery candidates
4
Heuristica triad output awaiting human review
Skill drafts
1
LLM-evolved prompts pending Apply / Discard
The four hypotheses
The pre-registration enumerates four primary hypotheses, the metrics that would confirm or refute each, the stop criteria, and a list of analyses we may NOT claim because we did not pre-commit to them. Reviewers can verify deviations against the locked Git tag.
H1 — Does the loop actually learn?
Skills with active Read-Write Reflective Learning loops should show non-decreasing utility-score trajectories over 28 days. We test it on real production traffic — not a benchmark, not a sandbox.
H2 — Do skills transfer across providers?
When a skill prompt is evolved using Anthropic Opus 4.7 as the judge, does it perform non-inferiorly under OpenAI GPT-5, Google Gemini, Kimi-K2, xAI Grok, DeepSeek, OpenRouter, and Ollama? Memento answered this for one provider. We answer it for eight.
H3 — Does adversarial inversion add signal?
Heuristica triad — Ipcha (adversarial) + Metis (synthesis) + Cael (judge) — should produce skill-discovery candidates with measurably higher human-adoption rate than a Metis-only baseline. Constitutional self-critique in production.
H4 — Does native Constitutional layer beat universal?
Personas with natively integrated Ipcha self-check (Aristaeus, Metis, Nefilibata) should show fewer hallucination flags than personas relying on the universal Layer 2.5 disciplines, on matched-domain queries.
Read → Execute → Judge → Write
The Read step retrieves the best matching skill via hybrid pgvector + tsvector search. Execute calls the user's chosen provider. Judge classifies success or failure (currently keyword classifier; LLM-judge ships pre-lock). Write updates the skill's utility score, queues an evolution draft when the threshold trips, or escalates to Heuristica discovery on a fresh failure pattern.
1 · Read
findBestSkill — hybrid embedding + simple-tokenizer text rank, MIN_SCORE 0.2, utility-weighted
2 · Execute
provider.complete(...) — Anthropic, OpenAI, Google, Kimi, xAI, DeepSeek, OpenRouter, Ollama
3 · Judge
classifyImplicitFeedback (keyword) + LLM-judge (Haiku 4.5, parallel) — agreement reported
4 · Write
recordSkillFeedback — utility = succ/(succ+fail), provider/model/cost attribution
5 · Evolve
createEvolutionDraft (Cael unit-test gate) OR Heuristica triad discovery — Ipcha + Metis + Cael
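The Write and Evolve steps above can be sketched in TypeScript. This is an illustrative sketch, not the production code: only the utility formula utility = succ/(succ+fail) comes from this page, while `SkillRecord`, `EVOLUTION_THRESHOLD`, and `shouldQueueEvolutionDraft` are hypothetical names and values.

```typescript
// Sketch of the Write/Evolve steps; names and threshold are assumptions.
interface SkillRecord {
  name: string;
  succ: number; // successful executions
  fail: number; // failed executions
}

// Hypothetical threshold below which an evolution draft is queued.
const EVOLUTION_THRESHOLD = 0.5;

// Write: update counters and recompute utility = succ / (succ + fail).
function recordSkillFeedback(skill: SkillRecord, success: boolean): number {
  if (success) skill.succ += 1;
  else skill.fail += 1;
  return skill.succ / (skill.succ + skill.fail);
}

// Evolve: trip when the utility score falls under the threshold.
function shouldQueueEvolutionDraft(utility: number): boolean {
  return utility < EVOLUTION_THRESHOLD;
}
```

In the real loop the draft then passes the Cael unit-test gate before a human sees the Apply / Discard choice; this sketch only shows the arithmetic that drives the trigger.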
Tenants bring their own keys (BYOK). Skills evolve under whichever judge the tenant prefers, then run under whichever executor a downstream call selects. H2 is exactly this question.
Other providers will populate as users route real workloads; expect a cold-start period.
Most recent skill matches, evolutions, and discovery candidates. Persona and skill names only — no tenant attribution, no message content.
The Pantheon, live
Every persona ships with five seed skills (Nefilibata is the most recent — added 2026-04-27, still hydrating). Once a persona is invoked, its skills accumulate utility data. Personas with native Constitutional self-check (Aristaeus, Metis, Nefilibata) carry their own Ipcha-inversion routine; the rest inherit it from the universal Layer 2.5 — H4 measures the difference.
Metis
System Intelligence
5 skills
Aristaeus
System Intelligence
5 skills
Aletheia
Compliance
5 skills
Nemesis
Engineering
5 skills
Harmonia
Design
5 skills
Cael
System Intelligence
5 skills
Ipcha Mistabra
System Intelligence
5 skills
Hermes
Marketing
5 skills
Athena
Engineering
5 skills
NyxCore
System Intelligence
5 skills
Themis
Compliance
5 skills
Prometheus
Compliance
5 skills
Tyche
Marketing
5 skills
Clotho
Lifestyle
5 skills
Nefilibata
System Intelligence
0 skills
Full persona spread: the pantheon →
Who runs this
This is not an anonymous lab study. The pre-registration, the GDPR balancing test, and the locking statement carry three signatures. Lisa wrote the chassis the loop rides on and the morphone theory that powers the §5.2 exploratory analysis. Martyna designed the anonymisation pipeline and authored the GDPR assessment. Oliver runs the platform and the study.
Primary investigator
nyxCore Systems · Founder & Chief Architect
Platform architecture, study design, hypothesis specification, primary author of the pre-registration. Voice of Aristaeus.
Co-investigator — skill memory & morphone theory
nyxCore Systems · CKB · Core Systems Engineer
Built the skill-memory chassis; the §5.2 morphone-detection exploratory analysis derives from her work on Morphogenic Intelligence (Welsch 2024). Voice of Metis.
Co-investigator — GDPR & anonymisation
nyxCore Systems · claritas-ai-consulting · Data Sovereignty Counsel
Designed the anonymisation pipeline (§4.3, k-anonymity ≥ 5, no free text), authored the GDPR Art. 6(1)(f) balancing test, owns Art. 32 tenant isolation. Voice of Themis.
No black box
Three documents are published before any data flows. The pre-registration locks the four hypotheses, the metrics, the stop criteria, and a list of analyses we may NOT claim. The GDPR balancing test documents the legal basis under Art. 6(1)(f). The replication repository (open after study end) carries the anonymised dataset and analysis scripts so anyone can reproduce every figure.
Pre-Registration v1.0
4 hypotheses, study design, anonymisation pipeline, stop criteria, replication promise. Locked at the Git tag below at the start of data collection (2026-05-01).
GDPR balancing test
Art. 6(1)(f) legitimate-interest assessment. Documents the alternatives considered, the mitigations applied, and the explicit data-subject-rights preserved.
Replication package
MIT-licensed. Anonymised dataset, analysis scripts (Python + R), schema-and-pipeline freeze, deviation log. Released at study end.
The open question Memento leaves
Memento-Skills validates the loop with Gemini-3.1-Flash on benchmark Q&A. Three things stay open: cross-provider transfer (do skills evolved by Anthropic perform under OpenAI?), production workloads (no ground truth, only implicit user feedback), and Constitutional self-critique (does adversarial inversion add real signal?). nyxCore answers all three, in public.
What we may claim
What we explicitly may not claim
GDPR
This page renders aggregated research metrics from the nyxCore production system. No tenant names, no user identities, no content from discussions, workflows, or skill prompts is displayed or exported. Every metric is gated by k-anonymity (k ≥ 5) and a minimum sample size of 10 data points per cell.
What we collect · what we process
What we do not collect or display
userId columns are dropped before export. Per-user counts only appear when n ≥ 5.
Legal basis
Collection: GDPR Art. 6(1)(b) (contract performance — the platform writes these counters as part of normal operation).
Secondary use for research: GDPR Art. 6(1)(f) (legitimate interest), weighed in the publicly inspectable balancing test.
Retention & deletion
Daily snapshots are retained for 90 days and then dropped automatically. The HMAC salt used to pseudonymise tenant IDs is deleted from production at the close of the study window (2026-05-29). After that point the aggregate data is anonymous under GDPR Art. 4(1) and no longer personal data.
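The pseudonymisation scheme can be illustrated with Node's built-in crypto module. A minimal sketch under stated assumptions: `pseudonymiseTenant` and the 16-character truncation are hypothetical; the production salt handling is not shown, and deleting the salt is what makes the mapping irreversible.

```typescript
import { createHmac } from "crypto";

// Pseudonymise a tenant ID with the study-scoped HMAC salt.
// Once the salt is deleted at study close, the mapping cannot be
// recomputed, so the aggregates become anonymous data.
function pseudonymiseTenant(tenantId: string, salt: string): string {
  return createHmac("sha256", salt)
    .update(tenantId)
    .digest("hex")
    .slice(0, 16); // truncated for readability in exports (assumption)
}
```

The same tenant always maps to the same token while the salt exists, which is what lets per-tenant aggregation work without ever exporting the tenant name.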
Your rights as a tenant
Opt-out: Settings → Privacy → Research participation. Effective immediately; data is excluded from future snapshots and any already-captured snapshot is purged of the opted-out tenant within 24 hours.
Right of access (Art. 15): a list of the fields collected for your tenant while the salt mapping still exists. Send a request to research@nyxcore.cloud.
Right to lodge a complaint: with the competent supervisory authority under GDPR Art. 77.
Re-identification check
Before every export, a separate verification routine (scripts/study/k-anonymity-check.sql) checks that every reported aggregation cell has k ≥ 5 underlying tenants. Cells below that threshold are folded into a residual “small-tenants” bucket. If the check fails, no data is exported. The check output is part of the published replication package.
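The folding rule can be sketched in TypeScript. The real check is the SQL routine named above (scripts/study/k-anonymity-check.sql); the `Cell` shape, `applyKAnonymity`, and its field names here are hypothetical.

```typescript
// Sketch of the k-anonymity fold: cells backed by fewer than K tenants
// are merged into a residual "small-tenants" bucket before export.
const K = 5;

type Cell = { label: string; tenantCount: number; value: number };

function applyKAnonymity(cells: Cell[]): Cell[] {
  const kept = cells.filter((c) => c.tenantCount >= K);
  const small = cells.filter((c) => c.tenantCount < K);
  if (small.length === 0) return kept;
  // Fold all sub-threshold cells into one residual bucket.
  const residual: Cell = {
    label: "small-tenants",
    tenantCount: small.reduce((n, c) => n + c.tenantCount, 0),
    value: small.reduce((n, c) => n + c.value, 0),
  };
  return [...kept, residual];
}
```

In production the separate verification routine then re-checks every exported cell; if any cell still falls below k, the export is blocked entirely rather than published.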
Concretely: before any statistic reaches this page, the data has been anonymised, k-anonymity-verified, and reduced to the aggregation level. If you suspect that any displayed number violates this guarantee, write to us — we take the affected cell offline until the issue is resolved.
Full nyxCore landing-page privacy notice: /privacy · Imprint: /imprint
It's running. Watch.
If you're a researcher, an operator, or a curious skeptic — pin the URL. The dataset goes public on 2026-06-29 with the arXiv preprint. Until then, watch the numbers move. If you're working on adjacent questions and want to talk methodology before we lock more of it, write us.
Suggested LinkedIn post
We're running a 28-day pre-registered replication of the Memento-Skills Read-Write Reflective Learning loop — under live multi-provider, multi-tenant production traffic.

8 LLM providers. 15 named personas. Constitutional self-critique via the Heuristica triad (Ipcha + Metis + Cael).

Public dashboard: https://mnemo.nyxcore.cloud
Pre-registration: https://github.com/nyxCore-Systems/nyxcore-systems/blob/main/docs/studies/2026-q2-memento-replication.md
Memento paper: https://arxiv.org/abs/2603.18743

Anonymised, GDPR-compliant, replication package open-sourced. If you research persona conditioning, skill memory, or Constitutional AI — would love methodology feedback before the lock.