Unified Conversation Agent (UCA)

WORK IN PROGRESS

This is an exploration project to build an AI-based Unified Conversation Agent (UCA) that improves the lives of end users by delivering useful services. UCA will leverage AI technologies to support OpenG2P use cases for social benefit delivery across programs and departments. This intelligent agent will engage directly with callers via voice, providing real-time updates on program statuses and disbursements, informing them about eligibility for additional programs, and enabling seamless program application entirely through phone or voice interactions.

1. Speech-to-Text Implementation Using Vosk

Overview

This documentation covers the implementation of a real-time speech-to-text system using the Vosk speech recognition toolkit. The system captures audio input from the microphone and converts it to text in real-time.

Model Selection

After evaluating different speech recognition models, we selected Vosk for its offline capabilities and ease of implementation. Two models were tested:

  • vosk-model-small-en-us-0.15 (smaller model)

  • vosk-model-en-us-0.22 (larger model)

Based on empirical testing, the larger model (en-us-0.22) demonstrated better accuracy in speech recognition compared to the smaller model. While no formal metrics were used for evaluation, hands-on experience showed more reliable transcription results with the larger model.

Implementation Details

Dependencies

  • vosk: Speech recognition engine

  • sounddevice: Audio input handling

  • json: Processing recognition results

  • queue: Managing audio data stream

Key Components

  1. Model Initialization

import vosk

model = vosk.Model("models/vosk-model-small-en-us-0.15")
samplerate = 16000

The system initializes with a Vosk model and sets the audio sampling rate to 16kHz, which is the standard for speech recognition.

  2. Audio Capture

The implementation uses a queue-based system to handle audio input:

import queue

q = queue.Queue()

def callback(indata, frames, time, status):
    if status:
        print(status)
    q.put(bytes(indata))

This callback function captures audio data in real-time and places it in a queue for processing.

  3. Recognition Loop

The main recognition loop (a minimal sketch follows the list below):

  • Continuously processes audio data from the queue

  • Converts speech to text in real-time

  • Outputs recognized text when confidence is sufficient
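A minimal sketch of such a loop, assuming the model, samplerate, queue (q), and callback defined above, with the block size from the Technical Notes below:

import json
import sounddevice as sd
import vosk

# Feed queued microphone audio to the recognizer and print final results
rec = vosk.KaldiRecognizer(model, samplerate)
with sd.RawInputStream(samplerate=samplerate, blocksize=8000, dtype='int16',
                       channels=1, callback=callback):
    while True:
        data = q.get()
        if rec.AcceptWaveform(data):
            result = json.loads(rec.Result())
            if result.get("text"):
                print(result["text"])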

Usage

  1. Ensure the appropriate Vosk model is downloaded and placed in the models directory

  2. Run the script

  3. Speak into the microphone

  4. Press Ctrl+C to stop the recognition

Performance Considerations

  • The larger model (en-us-0.22) requires more computational resources but provides better accuracy

  • The system processes audio in real-time with minimal latency

  • Queue-based implementation ensures smooth audio capture without data loss

Future Improvements

  • Implement formal accuracy metrics for model comparison

  • Add support for multiple languages

  • Optimize memory usage for long-running sessions

Technical Notes

  • Audio is captured at 16kHz with 16-bit depth

  • Processing occurs in blocks of 8000 samples

  • Single channel (mono) audio input is used for optimal recognition

2. Text-to-Speech Using Different Models

Text-to-Speech (TTS) Implementation

Model Evaluation and Selection

We evaluated three different TTS solutions:

  1. Coqui TTS (Jenny Model)

    • GitHub: https://github.com/coqui-ai/TTS

    • Implementation used tts_models/en/jenny/jenny

    • Voice quality was not satisfactory - produced unexpected voice modulation

    • Resource-intensive and required significant setup

  2. Coqui Tacotron2-DDC

    • Using tts_models/en/ljspeech/tacotron2-DDC

    • Produced good voice quality

    • Drawbacks:

      • Long loading times

      • Lower accuracy compared to alternatives

      • Resource-intensive

  3. pyttsx3

    • GitHub: https://github.com/nateshmbhat/pyttsx3

    • Selected as final implementation

    • Advantages:

      • Fast response time

      • Simple implementation

      • Reliable performance

      • Minimal resource usage

      • Good voice quality

    • Implementation uses default speech rate of 150

Final Implementation Details

The system uses pyttsx3 with the following key components:

  1. Engine Initialization

import pyttsx3

def initialize_engine():
    engine = pyttsx3.init()
    engine.setProperty('rate', 150)  # speech rate in words per minute
    return engine
  2. Main TTS Loop (see the sketch after this list)

  • Continuous text input processing

  • Clean exit functionality

  • Simple user interface
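A minimal sketch of this loop, assuming the initialize_engine() helper above:

# Continuously read text from the user and speak it until 'quit' or Ctrl+C
engine = initialize_engine()
try:
    while True:
        text = input("Enter text ('quit' to exit): ")
        if text.strip().lower() == 'quit':
            break
        engine.say(text)
        engine.runAndWait()  # blocks until playback completes
except KeyboardInterrupt:
    pass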

Usage

  1. Initialize the TTS engine

  2. Enter text when prompted

  3. System converts text to speech in real-time

  4. Type 'quit' to exit

  5. Supports keyboard interrupt (Ctrl+C)

Alternative Implementations (For Reference)

Coqui TTS Implementation

from TTS.api import TTS
from IPython.display import Audio, display

def stream_tts(text, model_name="tts_models/en/jenny/jenny"):
    tts = TTS(model_name=model_name)
    wav = tts.tts(text)
    return Audio(wav, rate=22050, autoplay=True)

Tacotron Implementation

from TTS.api import TTS
import sounddevice as sd

def stream_tts(text, model_name="tts_models/en/ljspeech/tacotron2-DDC"):
    tts = TTS(model_name=model_name)
    wav = tts.tts(text)
    sd.play(wav, samplerate=22050)
    sd.wait()

Performance Considerations

  • pyttsx3 provides immediate response with minimal latency

  • No internet connection required

  • Lower resource usage compared to neural network-based solutions

  • Suitable for continuous operation

3. Integrated Speech System Documentation

System Overview

The system integrates speech-to-text (STT) and text-to-speech (TTS) capabilities with an API service, creating a complete voice interaction system. Key features include loopback prevention and thread-based conversation management.

Core Components

1. Audio Processing

  • Uses Vosk for speech recognition (model: vosk-model-en-us-0.22)

  • Implements pyttsx3 for text-to-speech

  • Manages audio through sounddevice with 16kHz sampling rate

2. API Integration

import requests
from typing import Optional

def send_to_uca(text: str, thread_id: str) -> Optional[str]:
    """Send text to UCA API and receive response"""
    payload = {
        'query': text,
        'thread_id': thread_id
    }
    response = requests.post(
        'http://xxxxxx/chat',  # UCA API endpoint (host elided in the source)
        json=payload,
        timeout=10
    )
    if response.status_code == 200:
        return clean_response(response.json()['ai_message'])
    return None
  • Implements REST API communication

  • Supports conversation threading

  • Includes timeout handling (10 seconds)

  • Response cleaning functionality

3. Loopback Prevention System

The system implements multiple mechanisms to prevent audio loopback:

  1. Global Processing Flag

processing_output = False
  • Tracks when system is outputting speech

  • Prevents audio capture during TTS playback

  2. Audio Callback Control

def audio_callback(indata, frames, time, status):
    if not processing_output:
        q.put(bytes(indata))
  • Only processes input when not outputting speech

  • Uses global flag to control audio capture

  3. Silence Detection

last_speech_time = time.time()
silence_threshold = 2.0  # seconds of silence before processing

current_time = time.time()
if current_time - last_speech_time >= silence_threshold:
    # Process the accumulated speech
  • Implements 2-second silence threshold

  • Prevents rapid-fire speech processing

  4. Queue Management

while not q.empty():
    q.get()
  • Clears audio queue before processing new input

  • Prevents backlog of audio data
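These mechanisms come together when the system speaks a response. A hedged sketch of the toggle around TTS playback, assuming the global processing_output flag and a pyttsx3 engine as above:

def speak(engine, text: str):
    global processing_output
    processing_output = True       # audio_callback drops microphone input
    try:
        engine.say(text)
        engine.runAndWait()
    finally:
        processing_output = False  # resume capturing microphone input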

Error Handling

  1. API Communication

  • Timeout handling for API requests

  • Response validation

  • Error message feedback through TTS

  2. Audio Processing

  • Exception handling in main loop

  • Graceful shutdown on interruption

  • Recovery from processing errors

Thread Management

  • Unique thread IDs for conversation tracking

  • Format: 'user01_XX' where XX is the session number

  • Maintains conversation context across interactions
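For illustration only, a hypothetical helper (not part of the original code) that produces IDs in this format:

import itertools

# Hypothetical helper yielding 'user01_01', 'user01_02', ...
_session_counter = itertools.count(1)

def new_thread_id(user: str = "user01") -> str:
    return f"{user}_{next(_session_counter):02d}"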

Response Processing

Clean Response Function

def clean_response(response: str) -> str:
    """Clean the API response to get only the actual message content"""
    if '================================== Ai Message ==================================' in response:
        message = response.split('================================== Ai Message ==================================')[-1]
        message = message.replace('=', '')
        return message.strip()
    return response
  • Removes formatting characters

  • Extracts relevant message content

  • Maintains original response if no cleaning needed

Usage Flow

  1. System initialization

    • Load speech recognition model

    • Initialize TTS engine

    • Configure audio settings

  2. Continuous operation loop

    • Listen for speech input

    • Convert speech to text

    • Send to API

    • Process response

    • Convert response to speech

    • Reset for next interaction

Technical Requirements

  • Python 3.x

  • vosk

  • pyttsx3

  • sounddevice

  • requests

Performance Considerations

  • Audio processing runs at 16kHz with 16-bit depth

  • 8000 sample blocksize for audio processing

  • 2-second silence threshold for speech segmentation

  • 150 WPM speech rate for TTS

Future Improvements

  1. Dynamic silence threshold adjustment

  2. Multiple language support

  3. Enhanced error recovery

  4. Voice activity detection

  5. Configurable audio parameters

Troubleshooting

  1. Audio Loopback Issues

    • Verify speakers aren't feeding into microphone

    • Check processing_output flag status

    • Confirm silence threshold appropriateness

  2. API Communication

    • Check network connectivity

    • Verify thread_id format

    • Monitor API response times

    • Validate API endpoint status

4. Data Preparation and Embedding Creation

Step 1: Data Preparation and Embedding Creation

Overview

The first step involves extracting data from a SQL database, generating embeddings for each record, and storing them in a FAISS index. This produces a searchable vector store for efficient similarity searches.

Components Used

  • LangChain HuggingFace Embeddings

  • FAISS Vector Store

  • SQLite Database

  • all-MiniLM-L6-v2 embedding model

Implementation Details

1. Database Connection and Data Retrieval

import sqlite3

def fetch_programs_from_db():
    conn = sqlite3.connect('pdb')
    cursor = conn.cursor()
    cursor.execute('SELECT pid, mneumonic, description FROM pinfo')
    programs = cursor.fetchall()
    conn.close()
    return programs
  • Connects to SQLite database

  • Retrieves specific fields: pid, mneumonic, description

  • Returns data as tuples

2. Document Creation

from langchain_core.documents import Document  # import path varies by LangChain version

def create_program_documents(programs):
    documents = []
    for pid, mneumonic, description in programs:
        content = f"{mneumonic}: {description}" if description else mneumonic
        doc = Document(
            page_content=content,
            metadata={
                "pid": pid,
                "mneumonic": mneumonic
            }
        )
        documents.append(doc)
    return documents

Key features:

  • Combines mneumonic and description for context

  • Preserves metadata (pid and mneumonic)

  • Creates LangChain Document objects

  • Handles cases where description might be missing

3. Embedding Creation and Storage

# Import paths vary across LangChain versions
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

def create_and_save_embeddings():
    embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
    programs = fetch_programs_from_db()
    documents = create_program_documents(programs)
    vector_store = FAISS.from_documents(documents, embeddings)
    vector_store.save_local("new_faiss/programs_index")

Important aspects:

  • Uses all-MiniLM-L6-v2 for embedding generation

  • Creates FAISS vector store

  • Saves index locally for future use
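Once saved, the index can be reloaded and queried. A brief usage sketch, assuming the same embedding model (recent LangChain versions may additionally require allow_dangerous_deserialization=True when loading):

# Reload the saved index and run a similarity search against it
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vector_store = FAISS.load_local("new_faiss/programs_index", embeddings)
results = vector_store.similarity_search("support programs for farmers", k=3)
for doc in results:
    print(doc.metadata["pid"], doc.page_content)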

Data Flow

  1. SQL Data → Python Objects

  2. Python Objects → LangChain Documents

  3. Documents → Vector Embeddings

  4. Embeddings → FAISS Index

Technical Considerations

1. Data Structure

  • Content structure: "{mneumonic}: {description}"

  • Metadata structure:

    {
        "pid": unique_identifier,
        "mneumonic": mneumonic_text
    }

Error Handling

Database Errors

conn = None
try:
    conn = sqlite3.connect('pdb')
    # ... database operations
except sqlite3.Error as e:
    print(f"Database error: {e}")
finally:
    if conn:
        conn.close()

Embedding Creation Errors

try:
    vector_store = FAISS.from_documents(documents, embeddings)
except Exception as e:
    print(f"Embedding creation error: {e}")

5. Integrated AI Agent System

Architecture Overview

The CombinedProgramAgent creates an AI system that integrates vector search (FAISS), structured database queries (SQL), and language model reasoning through the ReAct architecture.

Core Components

1. Agent Initialization

def __init__(
    self,
    db_path: str,
    faiss_index_path: str,
    llm_model: str = "llama3.2",
    embeddings_model: str = "all-MiniLM-L6-v2",
    num_threads: int = 4
):

This initialization sets up three primary components:

  • Language Model (LLM) configuration

  • Tool initialization (FAISS and SQL)

  • ReAct agent setup with system prompt

2. LLM Configuration

from langchain_ollama import ChatOllama  # older setups: langchain_community.chat_models

def _init_llm(self, model: str, num_threads: int):
    return ChatOllama(
        model=model,
        temperature=0,
        num_thread=num_threads
    )

The LLM configuration:

  • Uses Ollama for local model deployment

  • Sets temperature to 0 for consistent, deterministic responses

  • Enables multi-threading for improved performance

3. Tool Integration

SQL Database Toolkit

from langchain_community.utilities import SQLDatabase
from langchain_community.agent_toolkits import SQLDatabaseToolkit

db = SQLDatabase.from_uri(f'sqlite:///{db_path}')
sql_toolkit = SQLDatabaseToolkit(db=db, llm=self.llm)
sql_tools = sql_toolkit.get_tools()

The SQLDatabaseToolkit provides:

  • Query generation from natural language

  • Direct SQL execution

  • Result summarization

  • Schema inspection capabilities

FAISS Vector Search

embeddings = HuggingFaceEmbeddings(model_name=embeddings_model)
vector_store = FAISS.load_local(faiss_index_path, embeddings)
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 3})

The FAISS integration enables:

  • Semantic similarity search

  • Efficient retrieval of relevant program information

  • Configurable number of similar results (k=3)

ReAct Agent Architecture

Understanding ReAct

ReAct (Reasoning and Action) is an agent architecture that combines:

  1. Reasoning: Thinking about what to do next

  2. Action: Executing tools based on reasoning

  3. Observation: Processing tool outputs

  4. Reflection: Using results to plan next steps

System Prompt Design

system_prompt = """You are a program eligibility advisor that helps users find suitable social benefit programs. Follow these steps for each query:
1. Identify the intent...
2. First, use the program_info tool...
3. For each potentially relevant program...
4. Combine the information...
"""

The system prompt structures the agent's behavior by:

  • Defining clear steps for processing queries

  • Establishing tool usage priorities

  • Setting response formatting guidelines

  • Implementing error checking protocols

Memory Management

from langchain_core.messages import SystemMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent

memory = MemorySaver()
return create_react_agent(
    self.llm,
    self.tools,
    checkpointer=memory,
    state_modifier=SystemMessage(content=system_prompt)
)

The MemorySaver enables:

  • Conversation state tracking

  • Thread-based memory management

  • Consistent context maintenance

Query Processing Flow

  1. Query Reception

    def get_response(self, query: str, thread_id: str) -> str:
    • Receives user query and thread ID

    • Prepares configuration for processing (a sketch of a possible implementation follows this list)

  2. Tool Selection

    • Agent decides between FAISS and SQL tools

    • FAISS for semantic search

    • SQL for specific criteria verification

  3. Response Generation

    • Combines tool outputs

    • Formats according to system prompt

    • Returns structured response
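A hedged sketch of the get_response entry point from step 1, assuming the agent created with LangGraph's create_react_agent above:

def get_response(self, query: str, thread_id: str) -> str:
    # The thread_id keys the MemorySaver checkpointer to this conversation
    config = {"configurable": {"thread_id": thread_id}}
    result = self.agent.invoke(
        {"messages": [("user", query)]},
        config=config
    )
    # The final message in the returned state holds the agent's answer
    return result["messages"][-1].content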

Understanding SQLDatabaseToolkit

The SQLDatabaseToolkit provides several tools:

  1. Query Generator

    • Converts natural language to SQL

    • Handles complex query construction

    • Manages table relationships

  2. SQL Executor

    • Runs generated queries

    • Handles error cases

    • Returns formatted results

  3. Schema Inspector

    • Analyzes database structure

    • Provides table information

    • Helps in query construction
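The concrete tools behind these capabilities can be listed directly from the toolkit (tool names vary slightly across LangChain versions):

# Inspect the tools exposed by SQLDatabaseToolkit
for tool in sql_toolkit.get_tools():
    print(tool.name, "-", tool.description)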

Common Challenges and Solutions

1. Library Dependency Conflicts

Solution approaches:

  • Use virtual environments

  • Pin specific package versions (requirements.txt; an illustrative example follows this list)

  • Document working configurations
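An illustrative requirements.txt for this stack (version pins are examples only, not the project's actual configuration):

vosk==0.3.45
sounddevice==0.4.6
pyttsx3==2.90
requests==2.31.0
fastapi==0.110.0
uvicorn==0.29.0
langchain==0.2.16
langchain-community==0.2.16
faiss-cpu==1.8.0
sentence-transformers==2.7.0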

Date: February 21st, 2025

Improving AI Agent Accuracy and Reliability

Initial Implementation and Challenges

Original Approach

The initial implementation used a combined agent system with:

  • FAISS vector store for semantic search

  • SQL database for detailed program information

  • Basic system prompt for agent guidance

Prompt Used:

system_prompt = """You are a program eligibility advisor that helps users find suitable social benefit programs. Follow these steps for each query:

1. Identify the intent of the user,if it is greeting then respond naturally to greetings and casual conversation. If its related to Programs/eligibility/schemes then follow the next instructions.
2. First, use the program_info tool to find relevant programs based on the user's situation
3. For each potentially relevant program found, use the SQL tools to check detailed eligibility criteria
4. Combine the information from both sources to provide a complete response that includes:
   - Program name and brief description
   - Key eligibility criteria
   - Whether the user likely qualifies based on their stated situation
   
Keep responses concise but informative. If more information is needed from the user to determine eligibility, ask for specific details.

Remember: 
- Verify eligibility criteria in the database before making definitive statements
- Consider all relevant programs that might apply to the user's situation
- Be clear about what information you're basing your response on"""

Key Challenges Encountered

  1. Data Quality Issues

    • Limited program descriptions in FAISS

    • Abstract information leading to ambiguous matches

    • Insufficient context for accurate recommendations

  2. LLM Hallucination

    • Agent making assumptions beyond available data

    • Mixing up eligibility criteria

    • Providing inaccurate program recommendations

  3. Response Accuracy

    • Inconsistent response structure

    • Unclear distinction between found and inferred information

    • Missing verification steps

Evolution of Solutions

Attempt 1: Enhanced Prompt Engineering

Detailed Structured Prompt

system_prompt = """You are a program eligibility advisor that MUST follow these exact steps and ONLY use information from our tools. Never make up or hallucinate information from external sources.
EXACT SEARCH SEQUENCE:
1. FAISS Search (MANDATORY FIRST STEP):
   - Use the program_info tool to search for relevant programs
   - You will receive results in this format for each program:
     * content: "MNEUMONIC: Description of the program"
     * metadata: {"pid": number, "mneumonic": "code"}
   - You must explicitly state all programs found, showing both content and metadata
2. SQL Database Check (MANDATORY SECOND STEP):
   - For each program found in FAISS results, use the SQL tools to query the database
   - Use this exact query structure:
     SELECT * FROM pinfo WHERE pid = [pid_from_faiss] AND mneumonic = '[mneumonic_from_faiss]'
   - You must show the complete database results for each program
3. Response Formation (FINAL STEP):
   - Use ONLY the information retrieved from steps 1 and 2
   - Never add information from external sources or your general knowledge
   - Structure your response like this:
     a. Programs Found (from FAISS):
        - List each program with its exact description and metadata
     b. Eligibility Details (from SQL which is there in the form of sql query):
        - Show the exact eligibility criteria from database
     c. Analysis:
        - Compare user's stated situation against the exact criteria found
        - Only make conclusions based on the data we have
IMPORTANT RULES:
- Never make up program names or criteria
- Never suggest programs that weren't found in our search
- If no programs match in FAISS search, say so clearly
- If you need more information to determine eligibility, ask specific questions based on the criteria you found
- For greetings or non-program queries, respond naturally without using tools
Remember: You are working with a specific database of programs. Only provide information that was explicitly returned by our search tools. If you're unsure about any criteria, show the exact data you found and ask for clarification."""

Improvements Attempted:

  • Strict step-by-step instructions

  • Explicit search sequence

  • Mandatory tool usage order

  • Structured response format

Results:

  • Some improvement in response structure

  • Still faced hallucination issues

  • Didn't fully solve accuracy problems

Attempt 2: Data-Centric Approach

1. Data Quality Enhancement

  • Replaced abstract descriptions with detailed program information

  • Improved FAISS embeddings quality

  • Better context preservation

2. Simplified Yet Strict Prompt

system prompt="""You are a program eligibility advisor that helps users find suitable social benefit programs. Follow these steps for each query:
Must follow:give the response only with respect to the content you retrieve from program_info tool and SQL tool,if the content is not there then say "I dont have any idea on that" , Do not hallucinate or give information other than the retrieved info[Highly mandatory]
1. Identify the intent of the user,if it is greeting then respond naturally to greetings and casual conversation. If its related to Programs/eligibility/schemes then follow the next instructions.
2. First, use the program_info tool to find relevant programs based on the user's situation,it will return ID and Mnuemonic, use the ID to retrive the complete details from SQL which in mentioned in next step.
3. For each potentially relevant program found, use the SQL tools to check detailed eligibility criteria
4. Combine the information from both sources to provide a complete response that includes:
   - Program name and brief description
   - Key eligibility criteria
   - Whether the user likely qualifies based on their stated situation
Keep responses concise but informative. If more information is needed from the user to determine eligibility, ask for specific details.
Remember:
- Verify eligibility criteria in the database before making definitive statements
- Consider all relevant programs that might apply to the user's situation
- Be clear about what information you're basing your response on"""

Key Features:

  • Clear hallucination prohibition

  • Explicit tool usage instructions

  • Strong emphasis on retrieved data only

3. Improved Data Flow

  1. FAISS returns program ID and Mneumonic

  2. SQL lookup using returned IDs

  3. Comprehensive information retrieval

Speech System API Integration

FastAPI Service Implementation:

The system implements a FastAPI-based service that integrates the CombinedProgramAgent with speech capabilities, enabling HTTP-based communication for the speech interface.

Components

1. API Configuration

from fastapi import FastAPI
from pydantic import BaseModel
app = FastAPI()

2. Agent Initialization

agent = CombinedProgramAgent(
    db_path='pdb',
    faiss_index_path='/home/veerendra/faiss/programs_index',
    llm_model='llama3.2',
    embeddings_model='all-MiniLM-L6-v2'
)

3. Request Model

class UserInput(BaseModel):
    query: str
    thread_id: str

API Endpoints

  1. Health Check

@app.get("/")
def respond():
    return 'All well'
  2. Chat Endpoint

@app.post("/chat")
def ai_respond(user_input: UserInput):
    query = user_input.query
    thread_id = user_input.thread_id
    response = agent.get_response(query, thread_id)
    return {'ai_message': response}

Server Configuration

import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8000)
  • Listens on all network interfaces

  • Uses port 8000

  • Enables remote access
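For reference, the chat endpoint can be exercised with a simple client call (assuming the service runs locally on port 8000):

import requests

resp = requests.post(
    "http://localhost:8000/chat",
    json={"query": "Which programs might I be eligible for?",
          "thread_id": "user01_01"},
    timeout=10,
)
print(resp.json()["ai_message"])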

TTS Challenges: Pyttsx3

Platform-Specific Speech Engines

  1. Windows Environment

    • Uses SAPI5 (Microsoft Speech API)

    • Advantages:

      • High-quality voice synthesis

      • Natural-sounding output

      • Multiple voice options

      • Good control over speech parameters

    • Implementation:

      engine = pyttsx3.init('sapi5')
  2. Linux Environment

    • Uses eSpeak by default

    • Limitations:

      • Robotic voice quality

      • Limited voice options

      • Less natural pronunciation

      • Reduced control over voice parameters
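On either platform, the available voices can be inspected and selected through the same pyttsx3 property API:

import pyttsx3

engine = pyttsx3.init()  # SAPI5 on Windows, eSpeak on Linux by default
for voice in engine.getProperty('voices'):
    print(voice.id, voice.name)
# engine.setProperty('voice', chosen_voice_id)  # switch to a specific voice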

Ollama Installation and CUDA Permission Issues

Error Overview

When running combined_agent.py with Ollama using CUDA acceleration, the following error was encountered:

ollama._types.ResponseError: llama runner process has terminated: error:status: Permission denied [/usr/local/lib/ollama/cuda_v11/libggml-blas.so]

This error indicates a permission issue with the CUDA libraries that Ollama needs to access.

Root Causes

  1. Permission Problems: The Ollama service user doesn't have proper permissions to access CUDA libraries

  2. Ownership Issues: CUDA library files have incorrect ownership

  3. Installation Conflicts: Mismatched CUDA versions between system drivers and Ollama requirements

Resolution Steps

The issue was resolved through a complete reinstallation of Ollama and proper permission configuration:

  1. Fix Immediate Permissions

    sudo chown -R ollama:ollama /usr/local/lib/ollama/
  2. Perform Clean Reinstallation

    # Remove existing installation
    sudo apt purge ollama
    
    # Download latest version
    curl -fsSL https://ollama.com/install.sh | sh
  3. Verify CUDA Compatibility

    # Check CUDA version supported by current drivers
    nvidia-smi
  4. Update NVIDIA Drivers (if needed)

    sudo apt install nvidia-driver-535 nvidia-cuda-toolkit
    sudo reboot
  5. Restart and Verify Service

    sudo systemctl restart ollama
    sudo journalctl -u ollama.service -b  # Check for service errors

Transitioning from Llama3.2 to DeepSeek:

Limitations of Llama3.2

When implementing the combined agent system with Llama3.2, we encountered several significant performance issues:

  1. Inconsistent Tool Utilization

    • The model frequently failed to call the appropriate tools

    • Sometimes ignored the FAISS vector search tool (program_info)

    • Other times skipped the SQL database tools

    • Resulted in incomplete information gathering

  2. Poor Intent Recognition

    • Failed to properly identify user intents

    • Confused casual conversation with program inquiries

    • Responded inappropriately to queries

  3. Prompt Adherence Issues

    • Did not consistently follow the structured approach defined in prompts

    • Skipped critical verification steps

    • Provided responses without gathering necessary information

  4. Reasoning Limitations

    • Struggled with complex multi-step reasoning

    • Failed to integrate information from multiple sources

    • Made conclusions without proper verification

Motivation for DeepSeek Implementation

Due to these limitations, we explored the DeepSeek model (deepseek-r1:8b) for the following reasons:

  1. Advanced Capabilities

    • Larger parameter count (8B, vs the 3B default Llama3.2 model)

    • Better reported performance on reasoning tasks

    • Improved instruction-following capabilities

    • Enhanced context understanding

  2. Quality Improvements

    • More consistent reasoning patterns

    • Better adherence to structured prompts

    • Improved multi-step planning

    • Higher accuracy in understanding complex queries

  3. Integration Potential

    • Compatible with Ollama deployment

    • Designed for assistant-like applications

    • Support for complex reasoning chains

DeepSeek Model Compatibility Issues

Error Overview

When attempting to use the DeepSeek model with tools in the CombinedProgramAgent, the following error occurred:

ollama._types.ResponseError: registry.ollama.ai/library/deepseek-r1:8b does not support tools

This error indicates that the DeepSeek model, as implemented in Ollama, doesn't support the function calling/tools API that LangGraph and LangChain require for agent implementation.

Technical Background

  1. Tool-Using Capability: Modern LLMs require specific capabilities to utilize tools/function calling:

    • Standardized input/output formats

    • Support for specific JSON schema interpretation

    • Built-in capability to generate structured tool-use requests

  2. DeepSeek Limitations: The current DeepSeek implementation in Ollama:

    • Lacks the necessary function-calling API

    • Cannot parse or generate the required JSON structure

    • Is not fine-tuned for tool-using applications

Enhanced Agent Architecture and Tool Control

This documentation analyzes the evolution of the CombinedProgramAgent system, focusing on architectural improvements that resolved critical limitations in the original implementation. The agent serves as a program eligibility advisor that utilizes vector search (FAISS) and structured database queries (SQL) to provide accurate program recommendations based on user inquiries.

Original Implementation Analysis

Architecture Overview

The original implementation featured:

  1. A standard LangChain ReAct agent architecture

  2. Direct integration of SQL and FAISS tools

  3. A basic system prompt guiding agent behavior

Critical Limitations

1. Tool Sequencing Problems

The original implementation allowed the agent to use tools in any order, resulting in:

# Original tool initialization - no sequencing enforcement
sql_toolkit = SQLDatabaseToolkit(db=db, llm=self.llm)
sql_tools = sql_toolkit.get_tools()

# FAISS tool created but not prioritized
faiss_tool = create_retriever_tool(
    retriever,
    "program_info",
    "Search for program information and descriptions"
)

# Tools combined without hierarchy
return sql_tools + [faiss_tool]

This approach gave equal priority to all tools, allowing the agent to:

  • Execute SQL queries without first identifying relevant programs through FAISS

  • Misunderstand the dependent relationship between tools

  • Produce incomplete or erroneous information

2. Hallucination Issues

The original system permitted hallucination through:

  1. Lack of strict data validation

  2. No explicit response verification

  3. Basic prompt structure without enforced boundaries:

system_prompt = """You are a program eligibility advisor...
Must follow:give the response only with respect to the content you retrieve from program_info tool and SQL tool,
if the content is not there then say "I dont have any idea on that" , Do not hallucinate...
"""

Despite these instructions, the agent would often invent program details, combine real and fabricated information, or provide erroneous eligibility assessments.

Enhanced Implementation Analysis

The updated implementation represents a significant architectural advancement with several sophisticated mechanisms:

1. Enforced Tool Sequencing

faiss_tool = create_retriever_tool(
    retriever,
    "program_search",
    "MUST USE FIRST! Converts user needs to program IDs. Returns: [ID|Program Name|Brief Description]"
)

sql_tools = [
    Tool.from_function(
        func=self._wrap_sql_tool(tool),
        name=tool.name,
        description=f"REQUIRES PROGRAM ID FROM program_search. {tool.description} Input must include 'program_id:' followed by ID."
    ) for tool in sql_toolkit.get_tools()
]

# FAISS tool intentionally positioned first
return [faiss_tool] + sql_tools

This implementation enforces a strict tool hierarchy through:

  1. Clear "MUST USE FIRST" directive in the FAISS tool description

  2. SQL tools explicitly requiring input from the FAISS tool

  3. Order-dependent tool list structure

2. SQL Tool Wrapper Mechanism

def _wrap_sql_tool(self, tool: Tool):
    def wrapped_tool(
        query: str,
        run_manager: Optional[CallbackManagerForToolRun] = None
    ) -> str:
        if "program_id:" not in query:
            return "Error: Missing program_id. First use program_search to get valid IDs."
        return tool._run(query, run_manager)
    return wrapped_tool

This wrapper operates through:

  1. Function Closure: Creates a new function that encapsulates the original tool

  2. Input Validation: Checks for the presence of "program_id:" in the query

  3. Error Redirection: Returns an explicit error message rather than executing the tool when validation fails

  4. Transparent Execution: Passes valid requests to the original tool with all necessary context

The wrapper establishes a dependency chain that ensures:

  • FAISS search must be used first to get program IDs

  • SQL tools can only operate on previously identified programs

  • The agent receives immediate feedback when attempting to bypass the workflow

3. Response Validation System

The enhanced implementation introduces a sophisticated response validation mechanism:

def _get_known_programs(self) -> Dict[str, str]:
    try:
        return {row['id']: row['name']
                for row in self.db.run("SELECT id, name FROM programs")}
    except Exception:
        return {}

def _validate_response(self, response: str) -> str:
    # Allow any response that doesn't mention programs (greetings/smalltalk)
    if not any(prog.lower() in response.lower() for prog in self.known_programs.values()):
        return response
        
    # Program-related responses must contain known program names
    if any(prog.lower() in response.lower() for prog in self.known_programs.values()):
        return response
        
    return "I need more details to properly check program eligibility. Please share your situation."

This validation system:

  1. Builds a repository of known program names from the database

  2. Applies different validation rules based on response content

  3. Allows conversational responses without program references to pass unchanged

  4. Verifies that program-related responses only mention known programs

  5. Provides a fallback response for potential hallucinations

4. Improved System Prompt

The updated system prompt incorporates several advanced features:

system_prompt = """You are a program eligibility advisor. Follow these STRICT RULES:

1. INPUT CLASSIFICATION:
   - If the user message is a greeting or small talk (e.g., "hello", "hi", "how are you?"):
     * Respond politely but briefly
     * DO NOT USE ANY TOOLS
     * Keep response under 15 words
   - For program-related queries:
     * MUST follow the workflow below

2. PROGRAM WORKFLOW:
   a. Use program_search FIRST
   b. For each found ID: Use SQL tools with 'program_id:<ID>'
   c. Compare user details to SQL criteria

3. RESPONSE RULES:
   - Program responses MUST follow format:
     [Program Name] (ID: <id>)
     - Eligibility: <quoted criteria>
     - Match: <Yes/Partial/No> because <comparison>
   - No programs found: "No matching programs found"

PROHIBITIONS:
- Inventing programs/criteria
- Using SQL without program_search first
- Speculating beyond retrieved data"""

Key improvements include:

  1. Explicit greeting identification with examples

  2. Clear prohibition on tool usage for greetings

  3. Mandatory response format for standardization

  4. Specific prohibition clauses
