
CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Claude Code Agent Usage

CRITICAL: Always use specialized agents to handle requests for professional results:

  • Use the python-pro agent for Python implementation tasks
  • Use the pytorch-model-architect agent for ML model development
  • Use the pytorch-audio-vision-expert agent for audio processing and training
  • Use the multimedia-preprocessing-expert agent for audio data preprocessing
  • Use the architect-reviewer agent after making architectural changes
  • Use the code-reviewer agent after writing significant code
  • Use appropriate specialized agents based on the task requirements

Project Overview

Trixy is a professional voice assistant system built with Python and PyTorch. It implements a server/client/standalone architecture with custom wakeword detection, voice recognition, and a comprehensive plugin system.

Key Project Terms

  • Client = Satellite: Remote voice input/output devices
  • Satellite Registered ≠ Satellite Connected: Registration is required before connection is allowed
  • Event-Driven Architecture: Everything operates through the central event handler system

System Architecture

Deployment Modes

  1. Server: Central hub with event handler, plugins, satellite management
  2. Client/Satellite: Wakeword detection, voice recognition, audio streaming only
  3. Standalone: Full system (server features) that can optionally connect to server for synchronization

Network Architecture

  • Command Socket (Port 2101): Custom Trixy protocol for commands
  • Raw Audio Input Stream (Port 2102): 16 kHz mono audio from satellites
  • Raw Audio Output Stream (Port 2103): 16 kHz mono TTS/response audio
  • Raw Music Output Stream (Port 2104): 48 kHz stereo music/media playback
  • Multi-Client Support: Multiple satellites can connect simultaneously

Core Components (Minimum - More Can Be Added)

Event Handler (./trixy_core/events/)

CENTRAL COMPONENT - Everything operates through events:

@TrixyEvent(["event_name", "other_event"])
def event_handler_method(self, event_name, event_data):
    # React to events
    pass
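
Since the event handler itself is still to be implemented (see "Current Implementation Status"), here is a minimal sketch of how the `@TrixyEvent` decorator could register handlers. The `_handlers` registry and the `emit()` name are assumptions, not the final API; the real system would also need to bind decorated methods to their instances (e.g., by scanning attributes at plugin load), which this sketch skips by registering plain functions.

```python
# Minimal sketch of @TrixyEvent; registry and emit() are assumptions.
from collections import defaultdict

_handlers = defaultdict(list)  # event_name -> list of callables

def TrixyEvent(event_names):
    """Register the decorated callable for each listed event name."""
    def decorator(func):
        for name in event_names:
            _handlers[name].append(func)
        return func
    return decorator

def emit(event_name, event_data):
    """Dispatch an event to every handler registered for it."""
    for handler in list(_handlers[event_name]):
        handler(event_name, event_data)
```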

Network Component (./trixy_core/network/)

  • Custom Trixy protocol implementation
  • Command socket management
  • Audio streaming handlers
  • Serialization/deserialization of command classes

Satellite Manager (./trixy_core/satellites/)

  • Manages all registered satellites (connected or not)
  • Provides easy access: satellite_manager[0] or satellite_manager["status=connected,room=kitchen"]
  • Functions: say(), disconnect(), reconnect(), finder methods

Plugin System (./trixy_core/plugins/)

  • Dynamic loading from ./plugins/*/
  • Each plugin: main.py + config.json
  • Base class: TrixyPlugin
  • Access to application container and event system

Configuration Manager (./trixy_core/config/)

  • Handles server_config.json, client_config.json, standalone_config.json
  • Command line argument overrides
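
A possible shape for the load-then-override behavior, sketched below. The file names and `--config`/`--debug` flags follow this document; the merge strategy (CLI flags win over file values) is an assumption.

```python
# Sketch of config loading with command-line overrides. File names
# follow the spec; the merge strategy and flag set are assumptions.
import argparse
import json
from pathlib import Path

def load_config(mode, argv=None):
    """Load ./config/{mode}_config.json, then apply CLI overrides."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", default=f"./config/{mode}_config.json")
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args(argv)

    config = {}
    path = Path(args.config)
    if path.exists():
        config = json.loads(path.read_text())

    # CLI flags win over file values ("Command line argument overrides").
    if args.debug:
        config["debug"] = True
    return config
```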

Asset Manager (./trixy_core/assets/)

  • Profile-based asset loading with fallback to default
  • Path resolution: ./assets/{profile}/path, falling back to ./assets/default/path

Scheduler (./trixy_core/scheduler/)

  • Multiple triggers: date, time, event, weekday
  • Multiple actions: trigger events, start ML training, call functions
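
A schedule entry combining the triggers and actions above might look like this; the field names are illustrative only, since the spec fixes just the trigger and action types:

```json
{
  "trigger": { "type": "time", "at": "03:00", "weekdays": ["mon", "thu"] },
  "action": { "type": "event", "event_name": "training_started", "event_data": {} }
}
```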

Conversation Manager (./trixy_core/conversation/)

  • Session management for multi-turn conversations
  • Conversation ID tracking across events

Arbitration (./trixy_core/arbitration/)

  • Satellite selection (highest volume wins)
  • Multi-satellite wakeword conflict resolution
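
The "highest volume wins" rule can be sketched in a few lines; `reports` as a list of dicts is a stand-in for whatever structure the wakeword commands actually carry:

```python
# Sketch of "highest volume wins": given the wakeword reports collected
# from satellites, pick the winner (None if no reports arrived).
def arbitrate(reports):
    if not reports:
        return None
    return max(reports, key=lambda r: r["volume"])
```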

Event System Documentation

Core Events (Minimum Set)

Satellite Management Events

  • "satellite_connected" - When satellite establishes connection
  • "satellite_disconnected" - When satellite loses connection
  • "satellite_registered" - When new satellite is registered

Wakeword & Audio Events

  • "wakeword_received" - When satellite detects wakeword
    • Data: wakeword_id, speaker_id, speaker_name, satellite_id, volume
  • "raw_audio_input_received" - When audio recording completes
    • Data: conversation_id, audio_data, speaker_id, speaker_name, satellite_info

Processing Events

  • "text_received" - When STT converts audio to text
    • Data: conversation_id, text, confidence, speaker_info
  • "intent_received" - When NLP extracts intent from text
    • Data: conversation_id, intent, entities, confidence
  • "tts_received" - When TTS generates response audio
    • Data: conversation_id, audio_data, text, voice_settings

System Events

  • "system_startup" - System initialization complete
  • "system_shutdown" - System shutting down
  • "plugin_loaded" - Plugin successfully loaded
  • "plugin_unloaded" - Plugin unloaded
  • "training_started" - ML training begins
  • "training_completed" - ML training finishes
  • "schedule_triggered" - Scheduled event fires

Command Socket Protocol Specification

Protocol Overview

  • Not HTTP/REST API - Custom binary streaming protocol
  • Similar to ICQ/MSN/game protocols
  • Sends serialized Python classes

Message Structure

[4 bytes] Magic Number: "TRXI"
[4 bytes] Version Major (int)
[4 bytes] Version Minor (int) 
[4 bytes] Version Revision (int)
[8 bytes] DateTime (binary)
[4 bytes] Options (32-bit flags)
[16 bytes] MD5 Checksum
[4 bytes] Class Name Length
[variable] Class Name
[4 bytes] Serialized Data Length
[variable] Serialized Class Instance

Option Flags (32-bit)

  • Bit 1: GZ compressed
  • Bit 2: Encrypted
  • Bit 3: JSON format (else binary)
  • Bit 4: Requires "received" response
  • Bit 5: Base64 encoded
  • Bit 6: Multi-part message
  • Bit 7: Dictionary instead of class
  • Bit 8: Silent (no debug logging)
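
Packing one message per the layout above can be sketched as follows. The spec fixes only the field sizes; the big-endian byte order, the millisecond-epoch DateTime encoding, and the flag constant names are assumptions.

```python
# Sketch of packing one Trixy command-socket message. Byte order and
# the DateTime encoding are assumptions; the spec fixes field sizes.
import hashlib
import struct
import time

FLAG_GZIP      = 1 << 0  # Bit 1: GZ compressed
FLAG_ENCRYPTED = 1 << 1  # Bit 2: Encrypted
FLAG_JSON      = 1 << 2  # Bit 3: JSON format (else binary)
FLAG_ACK       = 1 << 3  # Bit 4: requires "received" response

def pack_message(class_name: str, payload: bytes, flags: int = 0) -> bytes:
    name = class_name.encode()
    msg = b"TRXI"                                      # magic number
    msg += struct.pack(">iii", 1, 0, 0)                # version 1.0.0
    msg += struct.pack(">q", int(time.time() * 1000))  # 8-byte DateTime
    msg += struct.pack(">I", flags)                    # 32-bit option flags
    msg += hashlib.md5(payload).digest()               # 16-byte checksum
    msg += struct.pack(">I", len(name)) + name         # class name
    msg += struct.pack(">I", len(payload)) + payload   # serialized instance
    return msg
```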

Hard-Coded Commands (Performance Optimization)

All start with "TRXI":

  • TRXINOOP - No-op/heartbeat
  • TRXIPING - Ping command
  • TRXIPONG - Pong response
  • TRXIPRNT <string> - Print string (testing)
  • TRXIHELO - Hello (debugging)

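Because these commands are fixed 8-byte strings, a receiver can check for them before attempting full message parsing. The helper below is a hypothetical sketch of that fast path (HELO spelled with the TRXI prefix, consistent with the rule above):

```python
# Sketch: check the fixed 8-byte commands before full message parsing.
# The helper name is hypothetical.
HARD_CODED = {b"TRXINOOP", b"TRXIPING", b"TRXIPONG", b"TRXIHELO"}

def try_hard_coded(buf: bytes):
    """Return the command name if buf starts with a hard-coded command."""
    if buf[:8] in HARD_CODED:
        return buf[:8].decode()
    if buf.startswith(b"TRXIPRNT "):  # carries a string argument
        return "TRXIPRNT"
    return None
```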
Command Classes Location

All serializable command classes stored in: ./trixy_core/network/cmd/*.py

Directory Structure Requirements

./                          # Application root
├── main.py                 # Entry point: python3 main.py [server|client|standalone]
├── trixy_core/            # Core system modules
│   ├── events/            # Event handler system
│   ├── network/           # Network & protocol implementation
│   │   └── cmd/           # Command classes for serialization
│   ├── satellites/        # Satellite management
│   ├── plugins/           # Plugin system
│   ├── config/            # Configuration management
│   ├── assets/            # Asset management
│   ├── scheduler/         # Schedule manager
│   ├── conversation/      # Conversation management
│   ├── arbitration/       # Multi-satellite arbitration
│   └── [additional components]
├── plugins/               # Plugin directory
│   └── {plugin_name}/     # Individual plugin folders
│       ├── main.py        # Plugin class (extends TrixyPlugin)
│       ├── config.json    # Plugin configuration
│       └── config_view.py # Optional: custom TUI config
├── models/                # ML model storage
│   ├── wakeword/          # Wakeword model subdirectories
│   └── voice_recognition/ # Voice recognition model subdirectories
├── config/                # Configuration files
│   ├── server_config.json
│   ├── client_config.json
│   └── standalone_config.json
├── assets/                # Profile-based assets
│   ├── default/           # Default asset profile
│   └── {profile_id}/      # Custom profiles
└── trainer/               # ML training system
    ├── data/              # Training data
    │   ├── wakeword/
    │   │   └── raw/       # Raw audio files
    │   └── voice_recognition/
    │       └── raw/       # Speaker audio files
    ├── wakeword/          # Wakeword trainer
    └── voice_recognition/ # Voice recognition trainer

Development Commands

Application Startup

# Server mode
python3 main.py server [--debug] [--config custom_config.json]

# Client/Satellite mode  
python3 main.py client [--debug] [--config custom_config.json]

# Standalone mode
python3 main.py standalone [--debug] [--config custom_config.json]

Debug Mode

  • --debug flag enables development mode
  • Production: Textual TUI + logging, no print() statements
  • Debug: No TUI, print() statements enabled
  • Uses pprint(str) function that adapts based on mode

ML Model Specifications

Model File Format

  • File Types: .pth, .pt, .onnx (all supported)
  • Protection: Password-protected archives
  • Metadata: Embedded JSON metadata

Metadata Storage Methods

  • .pth files: metafile.json in ZIP archive
  • .pt files: torch.package metadata
  • .onnx files: metadata_props
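
For the .pth case, reading the embedded metadata amounts to opening the file as a ZIP archive and loading metafile.json from it. Password handling is omitted in this sketch for brevity:

```python
# Sketch: .pth model files are ZIP archives carrying metafile.json.
# Password-protected archives are not handled here.
import io
import json
import zipfile

def read_pth_metadata(pth_bytes: bytes) -> dict:
    with zipfile.ZipFile(io.BytesIO(pth_bytes)) as zf:
        with zf.open("metafile.json") as f:
            return json.load(f)
```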

Wakeword Detection

  • Wakewords: "custom" + "system command" (admin operations)
  • Architecture: RepCNN
  • Input: 16kHz, 16-bit PCM mono
  • Features: Log-Mel-Spectrogram (20-40 filter banks, 10ms shift)
  • Data Augmentation: Background noise, silence, volume, pitch, speed

Voice Recognition

  • Purpose: Speaker identification (returns speaker name)
  • Dynamic Architecture: Supports 4-90+ speakers
  • Architectures: ECAPA-TDNN, TitaNet-S, SpeakerNet-M
  • Features: Log-Mel-Spectrogram or MFCC (40 filter banks, 25ms window, 10ms shift)
  • VAD: WebRTC VAD or Energy-based
  • Loss Functions: ArcFace, GE2E, Triplet Loss
  • Output: Speaker embeddings

Workflow Specifications

Satellite Registration Process

  1. Server enters "Registration Mode" (60-second timeout)
  2. Unknown satellite attempts connection
  3. Server requests: Room, Alias Name, MAC address
  4. Server creates registration file (JSON, MAC-based)
  5. Server exits registration mode
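
The MAC-based registration file from step 4 might look like this; the field names are illustrative, since the spec fixes only Room, Alias Name, and MAC address (the blacklist flag anticipates the validation in the connection process below):

```json
{
  "mac_address": "AA:BB:CC:DD:EE:01",
  "room": "kitchen",
  "alias": "Kitchen Satellite",
  "registered_at": "2025-01-01T12:00:00Z",
  "blacklisted": false
}
```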

Satellite Connection Process

  1. Satellite auto-connects using config address/port
  2. Retry every 5 seconds on failure
  3. Satellite sends: room_id, alias, mac_address, version, audio_ports
  4. Server validates MAC (registered, not blacklisted)
  5. Server responds: ACCEPTED/DENIED + audio stream ports
  6. Server creates satellite instance in manager
  7. Audio streaming sockets established
  8. Events: "satellite_connected" or connection denied

Conversation Workflow

  1. Wakeword Detection (satellite):

    • Pause detection, start recording buffer
    • Send command: wakeword_id, speaker_id, speaker_name
    • 10-second timeout without server response = delete buffer
  2. Server Processing:

    • Wait 1 second for multiple satellite reports
    • Select highest volume satellite (arbitration)
    • Start conversation session
    • Send command with conversation_session_id
  3. Audio Recording (selected satellite):

    • Stream buffer + live audio (Raw Audio Input Stream)
    • Stop on: 3 seconds silence OR 60 seconds max OR server abort
    • Send recording completion command
  4. Server Event Chain:

    • Trigger: "raw_audio_input_received"
    • Plugin: STT → "text_received"
    • Plugin: NLP → "intent_received"
    • Plugin: Action + TTS → "tts_received"
    • Server: Send TTS audio via Raw Audio Output Stream
  5. Multi-turn Conversations:

    • Same conversation_id maintained
    • Plugins can ask questions and wait for responses
    • Intent handler tracks conversation state

User Interface (TUI) Specifications

Framework & Design

  • Framework: Textual with CSS styling
  • Navigation: F1-F9 keys for views
  • Title Format: "Trixy Server/Client/Standalone" + sub-view names
  • Scrollable: All content areas scrollable

Server Main Views

  • F1 (General): Status, satellite count, hostname, ports, plugins, uptime, version
  • F2 (Config): Editable server configuration
  • F3 (Satellites): List with connection status, registration mode toggle
  • F4 (Plugins): Plugin list with enable/disable, sub-views for details
  • F5 (Schedule): Schedule entries with last trigger times
  • F6 (ML Trainer): Training status, progress, controls
  • F9 (Logs): Scrollable log entries

Sub-Main Views (Escape to exit)

Detailed views for satellites, plugins, schedules, ML trainers with F1-F5 sub-tabs each.

Client Main Views

  • F1 (General): Client status, server connection info
  • F2 (Config): Client configuration
  • F3 (Wakeword): Model info, last detection, manual trigger

Plugin Development Requirements

Plugin Structure

# ./plugins/my_plugin/main.py
class MyPlugin(TrixyPlugin):
    # Auto-loaded properties:
    # - self.application (application container)
    # - self.config (loaded config.json)
    # - self.enabled (getter/setter)
    
    @TrixyEvent(["wakeword_received", "text_received"])
    def handle_events(self, event_name, event_data):
        # React to events
        pass
    
    def is_enabled(self):
        return self.enabled
    
    def reload_config(self):
        # Reload config.json
        pass
    
    def save_config(self):
        # Save config.json
        pass

Plugin Configuration

  • config.json: Auto-loaded into self.config
  • config_view.py: Optional custom TUI configuration
  • Enable/disable via config.json variable

Technical Implementation Guidelines

Satellite Manager Advanced Access

# Direct index access
satellite = satellite_manager[0]

# Query-based access (case insensitive)
satellites = satellite_manager["status=connected,room=kitchen"]

# Bulk operations on matching satellites
satellite_manager.disconnect_all("room=living_room")
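
The query syntax above can be parsed with a small helper like the following sketch; the real manager would match against satellite objects, so plain dicts stand in here:

```python
# Sketch of the "key=value,key=value" query syntax (case-insensitive).
# Dicts stand in for satellite objects.
def parse_query(query: str) -> dict:
    pairs = (part.split("=", 1) for part in query.split(","))
    return {k.strip().lower(): v.strip().lower() for k, v in pairs}

def matches(satellite: dict, query: str) -> bool:
    return all(str(satellite.get(k, "")).lower() == v
               for k, v in parse_query(query).items())
```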

Asset Management

# Asset resolution with fallback
asset_path = asset_manager.get_path("audio/success.wav")
# Tries: ./assets/{profile}/audio/success.wav
# Fallback: ./assets/default/audio/success.wav
# Returns: False if not found
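
The resolution order above can be sketched in a few lines; the `root` parameter is added here only to make the sketch testable, and returning False (rather than None) on a miss follows the comment above:

```python
# Sketch of profile-based asset resolution with a default fallback.
from pathlib import Path

def get_path(relative, profile, root="./assets"):
    for candidate in (Path(root) / profile / relative,
                      Path(root) / "default" / relative):
        if candidate.is_file():
            return str(candidate)
    return False
```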

Container & Factory Pattern

# Access through application container
event_handler = application.get_event_handler()
plugin_system = application.get_plugin_system()
satellite_manager = application.get_satellite_manager()

Current Implementation Status

⚠️ Project Phase: Planning/Design Complete - Implementation Required

Existing:

  • Comprehensive project specification
  • Training data structure
  • Audio samples for model training

Required Implementation:

  • All Python modules and core components
  • Event handler system with decorator support
  • Custom Trixy network protocol
  • Textual TUI framework integration
  • Plugin system with dynamic loading
  • ML training pipelines
  • Configuration management

Next Steps:

  1. Implement core event handler system
  2. Create network protocol and command classes
  3. Build satellite management system
  4. Develop plugin framework
  5. Implement Textual TUI
  6. Create ML training pipeline

Quality Standards

  • Event-Driven: Everything must use the event system
  • Plugin-First: Core functionality should be pluggable where possible
  • Professional Logging: No print() in production, use pprint() pattern
  • Robust Protocol: Handle network failures gracefully
  • Security: Password-protected models, input validation
  • Scalability: Support multiple satellites efficiently
  • Extensibility: Components can be added beyond minimum set