Cache
Unified model cache with LRU eviction and reference counting.
This module provides a production-ready cache for sharing models across extractors.
Features:
- LRU eviction with configurable max entries
- Reference counting for automatic cleanup
- Thread-safe operations
- Memory-aware eviction (optional)
- Context manager for scoped usage
Example
from omnidocs import clear_cache, get_cache_info, set_cache_config
from omnidocs.tasks.text_extraction import MinerUVLTextExtractor
from omnidocs.tasks.layout_extraction import MinerUVLLayoutDetector
# Configure cache (optional - defaults are sensible)
set_cache_config(max_entries=5)
# First extractor loads the model (`config` is a backend configuration object created beforehand)
text_extractor = MinerUVLTextExtractor(backend=config)
# Second extractor reuses cached model (instant)
layout_detector = MinerUVLLayoutDetector(backend=config)
# Check cache status
print(get_cache_info())
Reference Counting
When an extractor is deleted, the reference count of the model it used decreases. When the count reaches zero, the model becomes eligible for LRU eviction, but it is not removed immediately.
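The behaviour described above can be sketched with weak references. This is a minimal illustration, not the omnidocs implementation: `Entry`, `track`, and `Extractor` are hypothetical names, and the sketch assumes the owner object supports weak references.

```python
import gc
import weakref


class Entry:
    """Hypothetical cache entry holding a value and a reference count."""

    def __init__(self, value):
        self.value = value
        self.ref_count = 0


def track(entry, owner):
    """Count `owner` as a user of `entry` until `owner` is garbage collected."""
    entry.ref_count += 1

    def release():
        # Runs when `owner` is finalized; the entry itself is not removed.
        entry.ref_count -= 1

    # The callback must not reference `owner`, or the owner would never die.
    weakref.finalize(owner, release)


class Extractor:
    """Stand-in for an extractor instance (illustrative only)."""


entry = Entry(value="model weights")
first, second = Extractor(), Extractor()
track(entry, first)
track(entry, second)
count_with_owners = entry.ref_count  # two live owners

del first, second
gc.collect()  # force collection on interpreters without immediate refcounting
count_after_delete = entry.ref_count  # no owners left; the value is still cached
```

In this sketch, dropping to a count of zero only changes bookkeeping. Actually removing the entry is left to a separate eviction step.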
CacheEntry
dataclass
CacheEntry(
value: Any,
created_at: float = time.time(),
last_accessed: float = time.time(),
access_count: int = 0,
ref_count: int = 0,
_weak_refs: Set[ref] = set(),
)
CacheConfig
dataclass
CacheConfig(
max_entries: int = 10,
evict_unreferenced_first: bool = True,
auto_cleanup_interval: int = 100,
)
Cache configuration.
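How `max_entries` and `evict_unreferenced_first` might interact is easiest to see in code. The sketch below is an assumption about the policy, not the code in omnidocs/cache.py: `SketchConfig`, `pick_victim`, and the key-to-ref-count mapping are hypothetical, and entries are assumed to be ordered from least to most recently used.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchConfig:
    """Hypothetical mirror of two CacheConfig fields."""

    max_entries: int = 10
    evict_unreferenced_first: bool = True


def pick_victim(
    ref_counts: "OrderedDict[str, int]", config: SketchConfig
) -> Optional[str]:
    """Return the key to evict, or None if the cache is within its limit.

    `ref_counts` maps key -> ref_count, ordered least recently used first.
    """
    # max_entries == 0 is treated as "unlimited", as documented for set_cache_config.
    if config.max_entries == 0 or len(ref_counts) <= config.max_entries:
        return None
    if config.evict_unreferenced_first:
        for key, ref_count in ref_counts.items():
            if ref_count == 0:
                return key  # oldest entry that nobody references
    # Fall back to plain LRU: the oldest entry overall.
    return next(iter(ref_counts))


entries = OrderedDict([("a", 1), ("b", 0), ("c", 2)])
victim_preferring_unreferenced = pick_victim(entries, SketchConfig(max_entries=2))
victim_plain_lru = pick_victim(
    entries, SketchConfig(max_entries=2, evict_unreferenced_first=False)
)
victim_unlimited = pick_victim(entries, SketchConfig(max_entries=0))
```

With the preference enabled, the unreferenced entry `"b"` is chosen even though `"a"` is older. With it disabled, the oldest entry `"a"` is chosen regardless of its reference count.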
ModelCache
Thread-safe LRU model cache with reference counting.
Source code in omnidocs/cache.py
configure
get
Get value from cache, updating LRU order.
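Updating LRU order on a read is commonly done by moving the key to the most-recent end of an ordered mapping. A minimal sketch with `collections.OrderedDict` follows; `lru_get` and `store` are illustrative names, and ModelCache may be implemented differently.

```python
from collections import OrderedDict
from typing import Any, Optional


def lru_get(store: "OrderedDict[str, Any]", key: str) -> Optional[Any]:
    """Return the value for `key` and mark it as most recently used."""
    if key not in store:
        return None
    store.move_to_end(key)  # last position = most recently used
    return store[key]


store = OrderedDict([("layout", "model-a"), ("text", "model-b"), ("ocr", "model-c")])
value = lru_get(store, "layout")
order_after_hit = list(store)  # "layout" has moved to the end
missing = lru_get(store, "table")  # a miss leaves the order untouched
```

The least recently used key is then always the first one in the mapping, which keeps eviction cheap.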
set
Set value in cache with optional owner for reference counting.
get_or_load
Get from cache or load and cache.
remove
clear
add_reference
remove_reference
info
Get cache information.
get_cache_key
Generate a cache key from backend config.
The key is normalized to allow sharing between different extractors that use the same underlying model.
| PARAMETER | DESCRIPTION |
|---|---|
| backend_config | Backend configuration object (must have a model_dump() method) |
| prefix | Optional prefix for the key (e.g., model family name) |

| RETURNS | DESCRIPTION |
|---|---|
| str | String cache key |
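One way to obtain a key that is identical for every extractor sharing a model is to serialize the config deterministically and hash it. The sketch below relies only on the documented requirement that the config exposes `model_dump()`; `sketch_cache_key`, `FakeConfig`, the JSON-plus-SHA-256 scheme, and the 16-character truncation are all assumptions rather than the library's actual normalization.

```python
import hashlib
import json


def sketch_cache_key(backend_config, prefix: str = "") -> str:
    """Build a deterministic key from a config's dumped fields (illustrative)."""
    fields = backend_config.model_dump()
    # sort_keys makes the key independent of field order; default=str covers
    # values such as paths that json cannot serialize natively.
    payload = json.dumps(fields, sort_keys=True, default=str)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}:{digest}" if prefix else digest


class FakeConfig:
    """Stand-in for a backend config with a model_dump() method."""

    def __init__(self, **fields):
        self._fields = fields

    def model_dump(self):
        return dict(self._fields)


key_a = sketch_cache_key(FakeConfig(model="m1", device="cpu"), prefix="mineru")
key_b = sketch_cache_key(FakeConfig(device="cpu", model="m1"), prefix="mineru")
key_c = sketch_cache_key(FakeConfig(model="m2", device="cpu"), prefix="mineru")
```

Here `key_a` and `key_b` are equal because only the field values matter, while `key_c` differs because it names another model.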
get_cached
Get cached value if it exists.
| PARAMETER | DESCRIPTION |
|---|---|
| cache_key | Cache key from get_cache_key() |

| RETURNS | DESCRIPTION |
|---|---|
| Optional[Any] | Cached value or None if not cached |
set_cached
Add value to cache.
| PARAMETER | DESCRIPTION |
|---|---|
| cache_key | Cache key from get_cache_key() |
| value | Value to cache |
| owner | Optional owner object for reference counting (weak ref tracked) |
get_or_load
Get from cache or load and cache.
Thread-safe operation that either returns cached value or loads a new one.
| PARAMETER | DESCRIPTION |
|---|---|
| cache_key | Cache key from get_cache_key() |
| loader_fn | Function that loads and returns the value to cache |
| owner | Optional owner object for reference counting |

| RETURNS | DESCRIPTION |
|---|---|
| Any | Cached or newly loaded value |
add_reference
Add a reference to a cached entry.
Use this when an extractor starts using a cached model.
| PARAMETER | DESCRIPTION |
|---|---|
| cache_key | Cache key |
| owner | Owner object (extractor instance) |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if reference added, False if key not in cache |
remove_cached
Remove a specific entry from cache.
| PARAMETER | DESCRIPTION |
|---|---|
| cache_key | Cache key to remove |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if entry was removed, False if it didn't exist |
clear_cache
get_cache_info
Get detailed cache information.
| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dict with cache stats, keys, and per-entry info |
list_cached_keys
set_cache_config
Configure global cache settings.
| PARAMETER | DESCRIPTION |
|---|---|
| max_entries | Maximum number of cached models (0 = unlimited, default=10) |
| evict_unreferenced_first | Prefer evicting entries with no references (default=True) |
| auto_cleanup_interval | Clean up dead refs every N operations (default=100) |
Example
get_cache_config
Get current cache configuration.
| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dict with current config values |