chore(main): release 0.7.3 (#565)

doc: change readme intro
doc: update readme with new images
2025-12-20 20:29:06 +00:00 · 2025-08-22 16:35:52 -06:00 · 2025-08-22 16:26:25 -06:00 · 2025-08-22 16:15:54 -06:00 · 2025-08-22 14:41:09 -06:00 · 2025-08-22 11:51:34 -06:00
6 changed files with 73 additions and 35 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -14,4 +14,6 @@ data/
 www/REFACTOR.md
 www/reload-frontend
 server/test.sqlite
-CLAUDE.local.md
+CLAUDE.local.md
+www/.env.development
+www/.env.production
--- a/.gitleaksignore
+++ b/.gitleaksignore
@@ -0,0 +1 @@
+b9d891d3424f371642cb032ecfd0e2564470a72c:server/tests/test_transcripts_recording_deletion.py:generic-api-key:15
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -27,3 +27,8 @@ repos:
        files: ^server/
      - id: ruff-format
        files: ^server/
+
+  - repo: https://github.com/gitleaks/gitleaks
+    rev: v8.28.0
+    hooks:
+      - id: gitleaks
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,13 @@
 # Changelog

+## [0.7.3](https://github.com/Monadical-SAS/reflector/compare/v0.7.2...v0.7.3) (2025-08-22)
+
+
+### Bug Fixes
+
+* cleaned repo, and get git-leaks clean ([359280d](https://github.com/Monadical-SAS/reflector/commit/359280dd340433ba4402ed69034094884c825e67))
+* restore previous behavior on live pipeline + audio downscaler ([#561](https://github.com/Monadical-SAS/reflector/issues/561)) ([9265d20](https://github.com/Monadical-SAS/reflector/commit/9265d201b590d23c628c5f19251b70f473859043))
+
 ## [0.7.2](https://github.com/Monadical-SAS/reflector/compare/v0.7.1...v0.7.2) (2025-08-21)


--- a/README.md
+++ b/README.md
@@ -1,43 +1,60 @@
 <div align="center">
+<img width="100" alt="image" src="https://github.com/user-attachments/assets/66fb367b-2c89-4516-9912-f47ac59c6a7f"/>

 # Reflector

-Reflector Audio Management and Analysis is a cutting-edge web application under development by Monadical. It utilizes AI to record meetings, providing a permanent record with transcripts, translations, and automated summaries.
+Reflector is an AI-powered audio transcription and meeting analysis platform that provides real-time transcription, speaker diarization, translation and summarization for audio content and live meetings. It works 100% with local models (whisper/parakeet, pyannote, seamless-m4t, and your local llm like phi-4).

-[![Tests](https://github.com/monadical-sas/reflector/actions/workflows/pytests.yml/badge.svg?branch=main&event=push)](https://github.com/monadical-sas/reflector/actions/workflows/pytests.yml)
+[![Tests](https://github.com/monadical-sas/reflector/actions/workflows/test_server.yml/badge.svg?branch=main&event=push)](https://github.com/monadical-sas/reflector/actions/workflows/test_server.yml)
 [![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](https://opensource.org/licenses/MIT)
 </div>
-
-## Screenshots
+</div>
 <table>
  <tr>
    <td>
-      <a href="https://github.com/user-attachments/assets/3a976930-56c1-47ef-8c76-55d3864309e3">
-        <img width="700" alt="image" src="https://github.com/user-attachments/assets/3a976930-56c1-47ef-8c76-55d3864309e3" />
+      <a href="https://github.com/user-attachments/assets/21f5597c-2930-4899-a154-f7bd61a59e97">
+        <img width="700" alt="image" src="https://github.com/user-attachments/assets/21f5597c-2930-4899-a154-f7bd61a59e97" />
      </a>
    </td>
    <td>
-      <a href="https://github.com/user-attachments/assets/bfe3bde3-08af-4426-a9a1-11ad5cd63b33">
-        <img width="700" alt="image" src="https://github.com/user-attachments/assets/bfe3bde3-08af-4426-a9a1-11ad5cd63b33" />
+      <a href="https://github.com/user-attachments/assets/f6b9399a-5e51-4bae-b807-59128d0a940c">
+        <img width="700" alt="image" src="https://github.com/user-attachments/assets/f6b9399a-5e51-4bae-b807-59128d0a940c" />
      </a>
    </td>
    <td>
-      <a href="https://github.com/user-attachments/assets/7b60c9d0-efe4-474f-a27b-ea13bd0fabdc">
-        <img width="700" alt="image" src="https://github.com/user-attachments/assets/7b60c9d0-efe4-474f-a27b-ea13bd0fabdc" />
+      <a href="https://github.com/user-attachments/assets/a42ce460-c1fd-4489-a995-270516193897">
+        <img width="700" alt="image" src="https://github.com/user-attachments/assets/a42ce460-c1fd-4489-a995-270516193897" />
+      </a>
+    </td>
+    <td>
+      <a href="https://github.com/user-attachments/assets/21929f6d-c309-42fe-9c11-f1299e50fbd4">
+        <img width="700" alt="image" src="https://github.com/user-attachments/assets/21929f6d-c309-42fe-9c11-f1299e50fbd4" />
      </a>
    </td>
  </tr>
 </table>

+## What is Reflector?
+
+Reflector is a web application that utilizes AI to process audio content, providing:
+
+- **Real-time Transcription**: Convert speech to text using [Whisper](https://github.com/openai/whisper) (multi-language) or [Parakeet](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2) (English) models
+- **Speaker Diarization**: Identify and label different speakers using [Pyannote](https://github.com/pyannote/pyannote-audio) 3.1
+- **Live Translation**: Translate audio content in real-time to many languages with [Facebook Seamless-M4T](https://github.com/facebookresearch/seamless_communication)
+- **Topic Detection & Summarization**: Extract key topics and generate concise summaries using LLMs
+- **Meeting Recording**: Create permanent records of meetings with searchable transcripts
+
+Currently we provide [modal.com](https://modal.com/) gpu template to deploy.
+
 ## Background

 The project architecture consists of three primary components:

- **Front-End**: NextJS React project hosted on Vercel, located in `www/`.
 - **Back-End**: Python server that offers an API and data persistence, found in `server/`.
- **GPU implementation**: Providing services such as speech-to-text transcription, topic generation, automated summaries, and translations. Most reliable option is Modal deployment
+- **Front-End**: NextJS React project hosted on Vercel, located in `www/`.
+- **GPU implementation**: Providing services such as speech-to-text transcription, topic generation, automated summaries, and translations.

-It also uses authentik for authentication if activated, and Vercel for deployment and configuration of the front-end.
+It also uses authentik for authentication if activated.

 ## Contribution Guidelines

--- a/server/reflector/processors/audio_chunker_silero.py
+++ b/server/reflector/processors/audio_chunker_silero.py
@@ -11,10 +11,7 @@ from reflector.processors.audio_chunker_auto import AudioChunkerAutoProcessor

 class AudioChunkerSileroProcessor(AudioChunkerProcessor):
    """
-    Assemble audio frames into chunks with VAD-based speech detection using Silero VAD.
-
-    Expects input audio to be already downscaled to 16kHz mono s16 format
-    (handled by AudioDownscaleProcessor in the pipeline).
+    Assemble audio frames into chunks with VAD-based speech detection using Silero VAD
    """

    def __init__(
@@ -34,13 +31,12 @@ class AudioChunkerSileroProcessor(AudioChunkerProcessor):
        self._init_vad(use_onnx)

    def _init_vad(self, use_onnx=False):
-        """Initialize Silero VAD model for 16kHz audio"""
+        """Initialize Silero VAD model"""
        try:
            torch.set_num_threads(1)
            self.vad_model = load_silero_vad(onnx=use_onnx)
-            # VAD expects 16kHz audio (guaranteed by AudioDownscaleProcessor)
            self.vad_iterator = VADIterator(self.vad_model, sampling_rate=16000)
-            self.logger.info("Silero VAD initialized for 16kHz audio")
+            self.logger.info("Silero VAD initialized successfully")

        except Exception as e:
            self.logger.error(f"Failed to initialize Silero VAD: {e}")
@@ -79,7 +75,7 @@ class AudioChunkerSileroProcessor(AudioChunkerProcessor):
            return None

        # Processing block with current buffer size
-        # print(f"Processing block: {len(self.frames)} frames in buffer")
+        print(f"Processing block: {len(self.frames)} frames in buffer")

        try:
            # Convert frames to numpy array for VAD
@@ -193,29 +189,38 @@ class AudioChunkerSileroProcessor(AudioChunkerProcessor):
        return None

    def _frames_to_numpy(self, frames: list[av.AudioFrame]) -> Optional[np.ndarray]:
-        """Convert av.AudioFrame list to numpy array for VAD processing
-
-        Input frames are already 16kHz mono s16 format from AudioDownscaleProcessor.
-        Only need to convert s16 to float32 for Silero VAD.
-        """
+        """Convert av.AudioFrame list to numpy array for VAD processing"""
        if not frames:
            return None

        try:
-            # Concatenate all frame arrays
-            audio_arrays = [frame.to_ndarray().flatten() for frame in frames]
-            if not audio_arrays:
+            audio_data = []
+            for frame in frames:
+                frame_array = frame.to_ndarray()
+
+                if len(frame_array.shape) == 2:
+                    frame_array = frame_array.flatten()
+
+                audio_data.append(frame_array)
+
+            if not audio_data:
                return None

-            combined_audio = np.concatenate(audio_arrays)
+            combined_audio = np.concatenate(audio_data)

-            # Convert s16 to float32 (Silero VAD requires float32 in range [-1.0, 1.0])
-            # Input is guaranteed to be s16 from AudioDownscaleProcessor
-            return combined_audio.astype(np.float32) / 32768.0
+            # Ensure float32 format
+            if combined_audio.dtype == np.int16:
+                # Normalize int16 audio to float32 in range [-1.0, 1.0]
+                combined_audio = combined_audio.astype(np.float32) / 32768.0
+            elif combined_audio.dtype != np.float32:
+                combined_audio = combined_audio.astype(np.float32)
+
+            return combined_audio

        except Exception as e:
            self.logger.error(f"Error converting frames to numpy: {e}")
-            return None
+
+        return None

    def _find_speech_segment_end(self, audio_array: np.ndarray) -> Optional[int]:
        """Find complete speech segments and return frame index at segment end"""
Author	SHA1	Message	Date
Mathieu Virbel	aabf2c2572	chore(main): release 0.7.3 (#565)	2025-08-22 16:35:52 -06:00
Mathieu Virbel	6a7b08f016	doc: change readme intro	2025-08-22 16:26:25 -06:00
Mathieu Virbel	e2736563d9	doc: update readme with new images	2025-08-22 16:15:54 -06:00
Mathieu Virbel	0f54b7782d	chore: ignore www/.env.[development,production]	2025-08-22 14:41:09 -06:00
Mathieu Virbel	359280dd34	fix: cleaned repo, and get git-leaks clean	2025-08-22 11:51:34 -06:00