MemRL

Name: MemRL
Author: MemTensor

Non-parametric, self-evolving agent memory that applies runtime reinforcement learning to an episodic memory store. Instead of passive semantic matching (retrieve the nearest neighbours and hope), MemRL uses environmental feedback signals to learn which past strategies are actually useful and promote them through a Two-Phase Retrieval mechanism, which decouples the stable reasoning of the frozen model from the plastic memory. Agents improve from experience without weight updates or fine-tuning.

Storage: Episodic memory store. Each past task is recorded as an episode containing the task context, the strategy applied, and the environmental reward signal. Episodes accumulate across runs (cross-session). The store is non-parametric: no model weights are modified, and all learning is encoded in the updated utility scores of memory entries. Backed by a Python package under the `memrl` namespace.
Retrieval: Two-Phase Retrieval. Phase 1 retrieves candidate episodes by semantic similarity to the current task; Phase 2 filters those candidates by their historically observed reward signal (utility), retaining high-value strategies and discarding noisy ones. This RL-driven filter is the core innovation over standard RAG-style memory systems.
Self-host: moderate
License: MIT
Pricing: Free / OSS. Open source under the MIT license and free to self-host. Requires an LLM and an embedding API (configured via YAML under `configs/`). No hosted cloud tier.
GitHub stars: 135
Last release: —
Last commit: 2026-05-02
First catalogued: 2026-06-28
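
To make the Storage field above concrete, here is a minimal sketch of an episode record and a utility update. It is not taken from the `memrl` package: the `Episode` and `EpisodeStore` names, the field names, and the incremental-mean update rule are all assumptions chosen for illustration, and cross-session persistence is left out.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One past task (hypothetical schema, not MemRL's actual classes)."""
    task_context: str
    strategy: str
    utility: float = 0.0  # running estimate of how useful the strategy is
    uses: int = 0         # number of reward observations folded in so far

    def record_reward(self, reward: float) -> None:
        """Fold one reward into the utility as an incremental mean (assumed rule)."""
        self.uses += 1
        self.utility += (reward - self.utility) / self.uses


class EpisodeStore:
    """In-memory store; learning lives in the entries, not in model weights."""

    def __init__(self) -> None:
        self.episodes: list[Episode] = []

    def add(self, task_context: str, strategy: str, reward: float) -> Episode:
        episode = Episode(task_context, strategy)
        episode.record_reward(reward)
        self.episodes.append(episode)
        return episode


store = EpisodeStore()
ep = store.add("sort a CSV by date", "parse with csv module, sort by key", reward=1.0)
ep.record_reward(0.0)  # the same strategy failed when reused later
print(ep.utility)      # 0.5
```

The only state that changes after an episode is written is its `utility`, which is the sense in which the store is non-parametric.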

Strengths

RL-driven retrieval filter eliminates noisy memory recall: only strategies with verified positive outcomes survive the Phase 2 filter
Non-parametric: agents improve from experience without fine-tuning or weight updates, so it works with any frozen backbone LLM
Evaluated on four diverse benchmarks (HLE, BigCodeBench, ALFWorld, Lifelong Agent Bench); paper: arXiv:2601.03192
Standalone MIT-licensed package from MemTensor org; independent of MemOS (same org, different product)
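
The Two-Phase Retrieval idea described in this entry can be sketched in a few lines. This is an illustrative approximation, not MemRL's implementation: the function name `two_phase_retrieve`, the dictionary keys, and the `top_k` and `min_utility` parameters are hypothetical, and a fixed utility threshold is assumed as the Phase 2 rule.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_phase_retrieve(query_vec, memory, top_k=3, min_utility=0.5):
    """memory: list of dicts with 'embedding', 'strategy' and 'utility' keys."""
    # Phase 1: keep the top_k episodes most similar to the current task.
    candidates = sorted(
        memory, key=lambda e: cosine(query_vec, e["embedding"]), reverse=True
    )[:top_k]
    # Phase 2: drop candidates whose observed utility is below the threshold
    # (assumed rule), then order the survivors by utility.
    kept = [e for e in candidates if e["utility"] >= min_utility]
    return sorted(kept, key=lambda e: e["utility"], reverse=True)


memory = [
    {"strategy": "A", "embedding": [1.0, 0.0], "utility": 0.9},
    {"strategy": "B", "embedding": [0.9, 0.1], "utility": 0.1},
    {"strategy": "C", "embedding": [0.0, 1.0], "utility": 0.8},
    {"strategy": "D", "embedding": [0.8, 0.2], "utility": 0.6},
]
result = two_phase_retrieve([1.0, 0.0], memory, top_k=3, min_utility=0.5)
print([e["strategy"] for e in result])  # ['A', 'D']
```

Strategy B is the second-closest match but is dropped for its low utility, while C has high utility but never becomes a candidate because it is not similar to the query. Pure nearest-neighbour retrieval would have returned B.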

Watch out

Research codebase: 135 stars, no versioned releases, and a benchmark-runner orientation; production integration will require additional engineering
RL signal quality depends on environment reward design; MemRL is harder to apply to tasks that lack a clear, verifiable reward
MemTensor also maintains MemOS (catalogued separately); ensure you're referencing the correct package (`memrl` vs `memos`)
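
As an illustration of the reward-design caveat above, one simple way to obtain a verifiable reward is to score an output against explicit checks. This is a hypothetical helper, not part of `memrl`; the name `verifiable_reward` and the fraction-of-checks-passed scoring are assumptions.

```python
from typing import Callable


def verifiable_reward(output: str, checks: list[Callable[[str], bool]]) -> float:
    """Return the fraction of checks the output passes, in [0, 1]."""
    if not checks:
        # Without any check there is nothing to verify, so no reward signal.
        raise ValueError("no verifiable checks: reward is undefined")
    return sum(1 for check in checks if check(output)) / len(checks)


checks = [
    lambda out: out.strip().isdigit(),  # answer must be an integer
    lambda out: out.strip() == "42",    # answer must match the reference
]
print(verifiable_reward("42", checks))  # 1.0
print(verifiable_reward("41", checks))  # 0.5
```

Tasks for which no such checks can be written (open-ended writing, for example) are the ones where the reward signal becomes hard to define.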

Best for

Research and agentic systems in which agents repeatedly solve similar tasks and the environment can supply feedback (reward signals) to improve memory selection over time

Benchmark results

No sourced results yet.

Sources

MemRL README (vendor)
MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory (arXiv) (paper)
GitHub API repo metadata (135 stars, MIT, no formal release) (third-party)

Last verified 2026-06-28 · updated by discover-frameworks