
CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Claude Code Agent Usage

CRITICAL: Always use specialized agents to handle requests for professional results:

  • Use the python-pro agent for Python implementation tasks
  • Use the pytorch-model-architect agent for ML model development
  • Use the pytorch-audio-vision-expert agent for audio processing and training
  • Use the multimedia-preprocessing-expert agent for audio data preprocessing
  • Use the architect-reviewer agent after making architectural changes
  • Use the code-reviewer agent after writing significant code
  • Use appropriate specialized agents based on the task requirements

Project Overview

Trixy is a professional voice assistant system built with Python and PyTorch. It implements a server/client/standalone architecture with custom wakeword detection, voice recognition, and a comprehensive plugin system.

Key Project Terms

  • Client = Satellite: Remote voice input/output devices
  • Satellite Registered ≠ Satellite Connected: Registration is required before connection is allowed
  • Event-Driven Architecture: Everything operates through the central event handler system

System Architecture

Deployment Modes

  1. Server: Central hub with event handler, plugins, satellite management
  2. Client/Satellite: Wakeword detection, voice recognition, audio streaming only
  3. Standalone: Full system (server features) that can optionally connect to server for synchronization

Network Architecture

  • Command Socket (Port 2101): Custom Trixy protocol for commands
  • Raw Audio Input Stream (Port 2102): 16 kHz mono audio from satellites
  • Raw Audio Output Stream (Port 2103): 16 kHz mono TTS/response audio
  • Raw Music Output Stream (Port 2104): 48 kHz stereo music/media playback
  • Multi-Client Support: Multiple satellites can connect simultaneously

Core Components (Minimum - More Can Be Added)

Event Handler (./trixy_core/events/)

CENTRAL COMPONENT - Everything operates through events:

@TrixyEvent(["event_name", "other_event"])
def event_handler_method(self, event_name, event_data):
    # React to events
    pass
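
Since the event handler itself is still to be implemented (see "Current Implementation Status"), here is a minimal sketch of how the `@TrixyEvent` decorator could register handlers. The `_handlers` registry and the `emit()` name are assumptions, not the final API; the real system would also need to bind decorated methods to their instances (e.g., by scanning attributes at plugin load), which this sketch skips by registering plain functions.

```python
# Minimal sketch of @TrixyEvent; registry and emit() are assumptions.
from collections import defaultdict

_handlers = defaultdict(list)  # event_name -> list of callables

def TrixyEvent(event_names):
    """Register the decorated callable for each listed event name."""
    def decorator(func):
        for name in event_names:
            _handlers[name].append(func)
        return func
    return decorator

def emit(event_name, event_data):
    """Dispatch an event to every handler registered for it."""
    for handler in list(_handlers[event_name]):
        handler(event_name, event_data)
```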

Network Component (./trixy_core/network/)

  • Custom Trixy protocol implementation
  • Command socket management
  • Audio streaming handlers
  • Serialization/deserialization of command classes

Satellite Manager (./trixy_core/satellites/)

  • Manages all registered satellites (connected or not)
  • Provides easy access: satellite_manager[0] or satellite_manager["status=connected,room=kitchen"]
  • Functions: say(), disconnect(), reconnect(), finder methods

Plugin System (./trixy_core/plugins/)

  • Dynamic loading from ./plugins/*/
  • Each plugin: main.py + config.json
  • Base class: TrixyPlugin
  • Access to application container and event system

Configuration Manager (./trixy_core/config/)

  • Handles server_config.json, client_config.json, standalone_config.json
  • Command line argument overrides
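
A possible shape for the load-then-override behavior, sketched below. The file names and `--config`/`--debug` flags follow this document; the merge strategy (CLI flags win over file values) is an assumption.

```python
# Sketch of config loading with command-line overrides. File names
# follow the spec; the merge strategy and flag set are assumptions.
import argparse
import json
from pathlib import Path

def load_config(mode, argv=None):
    """Load ./config/{mode}_config.json, then apply CLI overrides."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", default=f"./config/{mode}_config.json")
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args(argv)

    config = {}
    path = Path(args.config)
    if path.exists():
        config = json.loads(path.read_text())

    # CLI flags win over file values ("Command line argument overrides").
    if args.debug:
        config["debug"] = True
    return config
```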

Asset Manager (./trixy_core/assets/)

  • Profile-based asset loading with fallback to default
  • Path resolution: ./assets/{profile}/path, falling back to ./assets/default/path

Scheduler (./trixy_core/scheduler/)

  • Multiple triggers: date, time, event, weekday
  • Multiple actions: trigger events, start ML training, call functions
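
A schedule entry combining the triggers and actions above might look like this; the field names are illustrative only, since the spec fixes just the trigger and action types:

```json
{
  "trigger": { "type": "time", "at": "03:00", "weekdays": ["mon", "thu"] },
  "action": { "type": "event", "event_name": "training_started", "event_data": {} }
}
```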

Conversation Manager (./trixy_core/conversation/)

  • Session management for multi-turn conversations
  • Conversation ID tracking across events

Arbitration (./trixy_core/arbitration/)

  • Satellite selection (highest volume wins)
  • Multi-satellite wakeword conflict resolution
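
The "highest volume wins" rule can be sketched in a few lines; `reports` as a list of dicts is a stand-in for whatever structure the wakeword commands actually carry:

```python
# Sketch of "highest volume wins": given the wakeword reports collected
# from satellites, pick the winner (None if no reports arrived).
def arbitrate(reports):
    if not reports:
        return None
    return max(reports, key=lambda r: r["volume"])
```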

Event System Documentation

Core Events (Minimum Set)

Satellite Management Events

  • "satellite_connected" - When satellite establishes connection
  • "satellite_disconnected" - When satellite loses connection
  • "satellite_registered" - When new satellite is registered

Wakeword & Audio Events

  • "wakeword_received" - When satellite detects wakeword
    • Data: wakeword_id, speaker_id, speaker_name, satellite_id, volume
  • "raw_audio_input_received" - When audio recording completes
    • Data: conversation_id, audio_data, speaker_id, speaker_name, satellite_info

Processing Events

  • "text_received" - When STT converts audio to text
    • Data: conversation_id, text, confidence, speaker_info
  • "intent_received" - When NLP extracts intent from text
    • Data: conversation_id, intent, entities, confidence
  • "tts_received" - When TTS generates response audio
    • Data: conversation_id, audio_data, text, voice_settings

System Events

  • "system_startup" - System initialization complete
  • "system_shutdown" - System shutting down
  • "plugin_loaded" - Plugin successfully loaded
  • "plugin_unloaded" - Plugin unloaded
  • "training_started" - ML training begins
  • "training_completed" - ML training finishes
  • "schedule_triggered" - Scheduled event fires

Command Socket Protocol Specification

Protocol Overview

  • Not HTTP/REST API - Custom binary streaming protocol
  • Similar to ICQ/MSN/game protocols
  • Sends serialized Python classes

Message Structure

[4 bytes] Magic Number: "TRXI"
[4 bytes] Version Major (int)
[4 bytes] Version Minor (int) 
[4 bytes] Version Revision (int)
[8 bytes] DateTime (binary)
[4 bytes] Options (32-bit flags)
[16 bytes] MD5 Checksum
[4 bytes] Class Name Length
[variable] Class Name
[4 bytes] Serialized Data Length
[variable] Serialized Class Instance

Option Flags (32-bit)

  • Bit 1: GZ compressed
  • Bit 2: Encrypted
  • Bit 3: JSON format (else binary)
  • Bit 4: Requires "received" response
  • Bit 5: Base64 encoded
  • Bit 6: Multi-part message
  • Bit 7: Dictionary instead of class
  • Bit 8: Silent (no debug logging)
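
Packing one message per the layout above can be sketched as follows. The spec fixes only the field sizes; the big-endian byte order, the millisecond-epoch DateTime encoding, and the flag constant names are assumptions.

```python
# Sketch of packing one Trixy command-socket message. Byte order and
# the DateTime encoding are assumptions; the spec fixes field sizes.
import hashlib
import struct
import time

FLAG_GZIP      = 1 << 0  # Bit 1: GZ compressed
FLAG_ENCRYPTED = 1 << 1  # Bit 2: Encrypted
FLAG_JSON      = 1 << 2  # Bit 3: JSON format (else binary)
FLAG_ACK       = 1 << 3  # Bit 4: requires "received" response

def pack_message(class_name: str, payload: bytes, flags: int = 0) -> bytes:
    name = class_name.encode()
    msg = b"TRXI"                                      # magic number
    msg += struct.pack(">iii", 1, 0, 0)                # version 1.0.0
    msg += struct.pack(">q", int(time.time() * 1000))  # 8-byte DateTime
    msg += struct.pack(">I", flags)                    # 32-bit option flags
    msg += hashlib.md5(payload).digest()               # 16-byte checksum
    msg += struct.pack(">I", len(name)) + name         # class name
    msg += struct.pack(">I", len(payload)) + payload   # serialized instance
    return msg
```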

Hard-Coded Commands (Performance Optimization)

All start with "TRXI":

  • TRXINOOP - No-op/heartbeat
  • TRXIPING - Ping command
  • TRXIPONG - Pong response
  • TRXIPRNT <string> - Print string (testing)
  • TRXIHELO - Hello (debugging)

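Because these commands are fixed 8-byte strings, a receiver can check for them before attempting full message parsing. The helper below is a hypothetical sketch of that fast path (HELO spelled with the TRXI prefix, consistent with the rule above):

```python
# Sketch: check the fixed 8-byte commands before full message parsing.
# The helper name is hypothetical.
HARD_CODED = {b"TRXINOOP", b"TRXIPING", b"TRXIPONG", b"TRXIHELO"}

def try_hard_coded(buf: bytes):
    """Return the command name if buf starts with a hard-coded command."""
    if buf[:8] in HARD_CODED:
        return buf[:8].decode()
    if buf.startswith(b"TRXIPRNT "):  # carries a string argument
        return "TRXIPRNT"
    return None
```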
Command Classes Location

All serializable command classes stored in: ./trixy_core/network/cmd/*.py

Directory Structure Requirements

./                          # Application root
├── main.py                 # Entry point: python3 main.py [server|client|standalone]
├── trixy_core/            # Core system modules
│   ├── events/            # Event handler system
│   ├── network/           # Network & protocol implementation
│   │   └── cmd/           # Command classes for serialization
│   ├── satellites/        # Satellite management
│   ├── plugins/           # Plugin system
│   ├── config/            # Configuration management
│   ├── assets/            # Asset management
│   ├── scheduler/         # Schedule manager
│   ├── conversation/      # Conversation management
│   ├── arbitration/       # Multi-satellite arbitration
│   └── [additional components]
├── plugins/               # Plugin directory
│   └── {plugin_name}/     # Individual plugin folders
│       ├── main.py        # Plugin class (extends TrixyPlugin)
│       ├── config.json    # Plugin configuration
│       └── config_view.py # Optional: custom TUI config
├── models/                # ML model storage
│   ├── wakeword/          # Wakeword model subdirectories
│   └── voice_recognition/ # Voice recognition model subdirectories
├── config/                # Configuration files
│   ├── server_config.json
│   ├── client_config.json
│   └── standalone_config.json
├── assets/                # Profile-based assets
│   ├── default/           # Default asset profile
│   └── {profile_id}/      # Custom profiles
└── trainer/               # ML training system
    ├── data/              # Training data
    │   ├── wakeword/
    │   │   └── raw/       # Raw audio files
    │   └── voice_recognition/
    │       └── raw/       # Speaker audio files
    ├── wakeword/          # Wakeword trainer
    └── voice_recognition/ # Voice recognition trainer

Development Commands

Application Startup

# Server mode
python3 main.py server [--debug] [--config custom_config.json]

# Client/Satellite mode  
python3 main.py client [--debug] [--config custom_config.json]

# Standalone mode
python3 main.py standalone [--debug] [--config custom_config.json]

Debug Mode

  • --debug flag enables development mode
  • Production: Textual TUI + logging, no print() statements
  • Debug: No TUI, print() statements enabled
  • Uses pprint(str) function that adapts based on mode

ML Model Specifications

Model File Format

  • File Types: .pth, .pt, .onnx (all supported)
  • Protection: Password-protected archives
  • Metadata: Embedded JSON metadata

Metadata Storage Methods

  • .pth files: metafile.json in ZIP archive
  • .pt files: torch.package metadata
  • .onnx files: metadata_props
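
For the .pth case, reading the embedded metadata amounts to opening the file as a ZIP archive and loading metafile.json from it. Password handling is omitted in this sketch for brevity:

```python
# Sketch: .pth model files are ZIP archives carrying metafile.json.
# Password-protected archives are not handled here.
import io
import json
import zipfile

def read_pth_metadata(pth_bytes: bytes) -> dict:
    with zipfile.ZipFile(io.BytesIO(pth_bytes)) as zf:
        with zf.open("metafile.json") as f:
            return json.load(f)
```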

Wakeword Detection

  • Wakewords: "custom" + "system command" (admin operations)
  • Architecture: RepCNN
  • Input: 16kHz, 16-bit PCM mono
  • Features: Log-Mel-Spectrogram (20-40 filter banks, 10ms shift)
  • Data Augmentation: Background noise, silence, volume, pitch, speed

Voice Recognition

  • Purpose: Speaker identification (returns speaker name)
  • Dynamic Architecture: Supports 4-90+ speakers
  • Architectures: ECAPA-TDNN, TitaNet-S, SpeakerNet-M
  • Features: Log-Mel-Spectrogram or MFCC (40 filter banks, 25ms window, 10ms shift)
  • VAD: WebRTC VAD or Energy-based
  • Loss Functions: ArcFace, GE2E, Triplet Loss
  • Output: Speaker embeddings

Workflow Specifications

Satellite Registration Process

  1. Server enters "Registration Mode" (60-second timeout)
  2. Unknown satellite attempts connection
  3. Server requests: Room, Alias Name, MAC address
  4. Server creates registration file (JSON, MAC-based)
  5. Server exits registration mode
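
The MAC-based registration file from step 4 might look like this; the field names are illustrative, since the spec fixes only Room, Alias Name, and MAC address (the blacklist flag anticipates the validation in the connection process below):

```json
{
  "mac_address": "AA:BB:CC:DD:EE:01",
  "room": "kitchen",
  "alias": "Kitchen Satellite",
  "registered_at": "2025-01-01T12:00:00Z",
  "blacklisted": false
}
```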

Satellite Connection Process

  1. Satellite auto-connects using config address/port
  2. Retry every 5 seconds on failure
  3. Satellite sends: room_id, alias, mac_address, version, audio_ports
  4. Server validates MAC (registered, not blacklisted)
  5. Server responds: ACCEPTED/DENIED + audio stream ports
  6. Server creates satellite instance in manager
  7. Audio streaming sockets established
  8. Events: "satellite_connected" or connection denied

Conversation Workflow

  1. Wakeword Detection (satellite):

    • Pause detection, start recording buffer
    • Send command: wakeword_id, speaker_id, speaker_name
    • 10-second timeout without server response = delete buffer
  2. Server Processing:

    • Wait 1 second for multiple satellite reports
    • Select highest volume satellite (arbitration)
    • Start conversation session
    • Send command with conversation_session_id
  3. Audio Recording (selected satellite):

    • Stream buffer + live audio (Raw Audio Input Stream)
    • Stop on: 3 seconds silence OR 60 seconds max OR server abort
    • Send recording completion command
  4. Server Event Chain:

    • Trigger: "raw_audio_input_received"
    • Plugin: STT → "text_received"
    • Plugin: NLP → "intent_received"
    • Plugin: Action + TTS → "tts_received"
    • Server: Send TTS audio via Raw Audio Output Stream
  5. Multi-turn Conversations:

    • Same conversation_id maintained
    • Plugins can ask questions and wait for responses
    • Intent handler tracks conversation state

User Interface (TUI) Specifications

Framework & Design

  • Framework: Textual with CSS styling
  • Navigation: F1-F9 keys for views
  • Title Format: "Trixy Server/Client/Standalone" + sub-view names
  • Scrollable: All content areas scrollable

Server Main Views

  • F1 (General): Status, satellite count, hostname, ports, plugins, uptime, version
  • F2 (Config): Editable server configuration
  • F3 (Satellites): List with connection status, registration mode toggle
  • F4 (Plugins): Plugin list with enable/disable, sub-views for details
  • F5 (Schedule): Schedule entries with last trigger times
  • F6 (ML Trainer): Training status, progress, controls
  • F9 (Logs): Scrollable log entries

Sub-Main Views (Escape to exit)

Detailed views for satellites, plugins, schedules, ML trainers with F1-F5 sub-tabs each.

Client Main Views

  • F1 (General): Client status, server connection info
  • F2 (Config): Client configuration
  • F3 (Wakeword): Model info, last detection, manual trigger

Plugin Development Requirements

Plugin Structure

# ./plugins/my_plugin/main.py
class MyPlugin(TrixyPlugin):
    # Auto-loaded properties:
    # - self.application (application container)
    # - self.config (loaded config.json)
    # - self.enabled (getter/setter)
    
    @TrixyEvent(["wakeword_received", "text_received"])
    def handle_events(self, event_name, event_data):
        # React to events
        pass
    
    def is_enabled(self):
        return self.enabled
    
    def reload_config(self):
        # Reload config.json
        pass
    
    def save_config(self):
        # Save config.json
        pass

Plugin Configuration

  • config.json: Auto-loaded into self.config
  • config_view.py: Optional custom TUI configuration
  • Enable/disable via config.json variable

Technical Implementation Guidelines

Satellite Manager Advanced Access

# Direct index access
satellite = satellite_manager[0]

# Query-based access (case insensitive)
satellites = satellite_manager["status=connected,room=kitchen"]

# Bulk operations on matching satellites
satellite_manager.disconnect_all("room=living_room")
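
The query syntax above can be parsed with a small helper like the following sketch; the real manager would match against satellite objects, so plain dicts stand in here:

```python
# Sketch of the "key=value,key=value" query syntax (case-insensitive).
# Dicts stand in for satellite objects.
def parse_query(query: str) -> dict:
    pairs = (part.split("=", 1) for part in query.split(","))
    return {k.strip().lower(): v.strip().lower() for k, v in pairs}

def matches(satellite: dict, query: str) -> bool:
    return all(str(satellite.get(k, "")).lower() == v
               for k, v in parse_query(query).items())
```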

Asset Management

# Asset resolution with fallback
asset_path = asset_manager.get_path("audio/success.wav")
# Tries: ./assets/{profile}/audio/success.wav
# Fallback: ./assets/default/audio/success.wav
# Returns: False if not found
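
The resolution order above can be sketched in a few lines; the `root` parameter is added here only to make the sketch testable, and returning False (rather than None) on a miss follows the comment above:

```python
# Sketch of profile-based asset resolution with a default fallback.
from pathlib import Path

def get_path(relative, profile, root="./assets"):
    for candidate in (Path(root) / profile / relative,
                      Path(root) / "default" / relative):
        if candidate.is_file():
            return str(candidate)
    return False
```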

Container & Factory Pattern

# Access through application container
event_handler = application.get_event_handler()
plugin_system = application.get_plugin_system()
satellite_manager = application.get_satellite_manager()

Current Implementation Status

⚠️ Project Phase: Planning/Design Complete - Implementation Required

Existing:

  • Comprehensive project specification
  • Training data structure
  • Audio samples for model training

Required Implementation:

  • All Python modules and core components
  • Event handler system with decorator support
  • Custom Trixy network protocol
  • Textual TUI framework integration
  • Plugin system with dynamic loading
  • ML training pipelines
  • Configuration management

Next Steps:

  1. Implement core event handler system
  2. Create network protocol and command classes
  3. Build satellite management system
  4. Develop plugin framework
  5. Implement Textual TUI
  6. Create ML training pipeline

Quality Standards

  • Event-Driven: Everything must use the event system
  • Plugin-First: Core functionality should be pluggable where possible
  • Professional Logging: No print() in production, use pprint() pattern
  • Robust Protocol: Handle network failures gracefully
  • Security: Password-protected models, input validation
  • Scalability: Support multiple satellites efficiently
  • Extensibility: Components can be added beyond minimum set