Non-parametric, self-evolving agent memory that applies runtime reinforcement learning to an episodic memory store. Instead of passive semantic matching (retrieve the nearest neighbours and hope), MemRL uses environmental feedback to learn which past strategies are actually useful and to promote them through a Two-Phase Retrieval mechanism, decoupling stable reasoning from plastic memory. Agents improve from experience without weight updates or fine-tuning.
- Storage
- Episodic memory store: each past task is recorded as an episode containing the task context, the strategy applied, and the environmental reward signal. Episodes accumulate across runs (cross-session). The store is non-parametric — no model weights are modified; all learning is encoded in the updated utility scores of memory entries. Backed by a Python package under the `memrl` namespace.
- Retrieval
- Two-Phase Retrieval: Phase 1 retrieves candidate episodes by semantic similarity to the current task; Phase 2 filters candidates by their historically observed reward signal (utility), retaining only high-value strategies and discarding noisy ones. This RL-driven filter is the core innovation over standard RAG-style memory systems.
- Self-host
- Moderate
- License
- MIT
- Pricing
- Free and open source (MIT); free to self-host. Requires an LLM and an embedding API (configured via YAML under `configs/`). No hosted cloud tier.
- GitHub stars
- 135
- Last release
- None (no versioned releases)
- Last commit
- 2026-05-02
- First catalogued
- 2026-06-28
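The Storage and Retrieval entries above can be illustrated with a minimal sketch. It is not taken from the `memrl` package: the `Episode` fields, the `two_phase_retrieve` function, the cosine-similarity ranking, and the `k` and `min_utility` parameters are all assumptions chosen for illustration. Phase 1 ranks episodes by similarity to the current task; Phase 2 keeps only candidates whose learned utility clears a threshold.

```python
import math
from dataclasses import dataclass


@dataclass
class Episode:
    """One stored episode. Field names are illustrative, not MemRL's schema."""
    context_vec: list[float]  # embedding of the task context
    strategy: str             # strategy that was applied
    utility: float            # score learned from past reward signals


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_phase_retrieve(store, query_vec, k=3, min_utility=0.5):
    """Hypothetical two-phase retrieval over an in-memory episode list."""
    # Phase 1: take the k episodes most similar to the current task.
    candidates = sorted(
        store, key=lambda e: cosine(e.context_vec, query_vec), reverse=True
    )[:k]
    # Phase 2: drop low-utility candidates, then order the rest by utility.
    kept = [e for e in candidates if e.utility >= min_utility]
    return sorted(kept, key=lambda e: e.utility, reverse=True)


store = [
    Episode([1.0, 0.0], "retry with smaller batch", 0.9),
    Episode([0.9, 0.1], "ignore the error", 0.1),
    Episode([0.0, 1.0], "rewrite the query", 0.8),
]
hits = two_phase_retrieve(store, [1.0, 0.0], k=2)
print([e.strategy for e in hits])  # ['retry with smaller batch']
```

In this toy store, "ignore the error" is semantically close to the query but is discarded in Phase 2 because its utility is low, which is the behaviour that similarity-only retrieval cannot provide.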
Strengths
- RL-driven retrieval filter cuts noisy memory recall: only strategies with a high observed reward survive the Phase 2 filter
- Non-parametric: agents improve from experience without fine-tuning or weight updates — works with any frozen backbone LLM
- Evaluated on four diverse benchmarks (HLE, BigCodeBench, ALFWorld, Lifelong Agent Bench); paper at arXiv:2601.03192 (MIT)
- Standalone MIT-licensed package from MemTensor org; independent of MemOS (same org, different product)
Watch out
- Research codebase: 135 stars, no versioned releases, and oriented toward running benchmarks; production integration will require additional engineering
- RL signal quality depends on environment reward design; MemRL is harder to apply to tasks that lack a clear, verifiable reward
- MemTensor also maintains MemOS (catalogued separately); ensure you're referencing the correct package (`memrl` vs `memos`)
Best for
- Research and agentic systems where agents repeatedly solve similar tasks and can provide environmental feedback (reward signals) to improve memory selection over time
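As a rough illustration of how repeated feedback could reshape memory selection over time, the sketch below nudges a per-strategy utility toward each observed reward. The update rule (an exponential moving average), the `update_utility` helper, the learning rate, and the sample data are assumptions for illustration, not the update used in the MemRL paper or codebase.

```python
def update_utility(utility, reward, lr=0.3):
    """Move a utility score toward the observed reward (moving average)."""
    return utility + lr * (reward - utility)


# Hypothetical utilities keyed by strategy, all starting from a neutral 0.5.
utilities = {"retry with smaller batch": 0.5, "ignore the error": 0.5}

# Simulated environment feedback: (strategy used, reward in [0, 1]).
feedback = [
    ("retry with smaller batch", 1.0),
    ("ignore the error", 0.0),
    ("retry with smaller batch", 1.0),
]

for strategy, reward in feedback:
    utilities[strategy] = update_utility(utilities[strategy], reward)

# The rewarded strategy drifts upward; the unrewarded one drifts downward.
print(utilities)
```

No model weights change in this loop; only the scores attached to memory entries move, which is why the quality of the reward signal matters so much.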
Benchmark results
No sourced results yet.
Sources
- MemRL README (vendor)
- MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory (arXiv) (paper)
- GitHub API repo metadata (135 stars, MIT, no formal release) (third-party)
Last verified 2026-06-28 · updated by discover-frameworks