Update README

This commit is contained in:
Koper
2023-08-24 19:08:53 +07:00
parent 196aa8454f
commit fcd98e9fd7
3 changed files with 130 additions and 332 deletions

README.md

@@ -1,23 +1,144 @@
# Reflector
Reflector is a cutting-edge web application under development by Monadical. It utilizes AI to record meetings, providing a permanent record with transcripts, translations, and automated summaries.
The project architecture consists of three primary components:
* **Front-End**: NextJS React project hosted on Vercel, located in `www/`.
* **Back-End**: Python server that offers an API and data persistence, found in `server/`.
* **AI Models**: Providing services such as speech-to-text transcription, topic generation, automated summaries, and translations.
![Project Architecture](ProjectArchitecture.jpg)
## Table of Contents
- [Reflector](#reflector)
- [Table of Contents](#table-of-contents)
- [Miscellaneous](#miscellaneous)
- [Contribution Guidelines](#contribution-guidelines)
- [How to Install Blackhole (Mac Only)](#how-to-install-blackhole-mac-only)
- [Front-End](#front-end)
- [Installation](#installation)
- [Run the Application](#run-the-application)
- [OpenAPI Code Generation](#openapi-code-generation)
- [Back-End](#back-end)
- [Installation](#installation-1)
- [Start the project](#start-the-project)
- [Using docker](#using-docker)
- [Using local GPT4All](#using-local-gpt4all)
- [Using local files](#using-local-files)
- [AI Models](#ai-models)
## Miscellaneous
### Contribution Guidelines
All new contributions should be made in a separate branch. Before any code is merged into `master`, it requires a code review.
### How to Install Blackhole (Mac Only)
Note: We currently do not have instructions for Windows users.
* Install [BlackHole](https://github.com/ExistentialAudio/BlackHole) 2ch (2 channels is enough) using one of the two installation options listed there.
* Set up an ["Aggregate Device"](https://github.com/ExistentialAudio/BlackHole/wiki/Aggregate-Device) to route web audio and local microphone input.
* Set up a [Multi-Output Device](https://github.com/ExistentialAudio/BlackHole/wiki/Multi-Output-Device).
* Then go to `System Preferences -> Sound` and choose the devices created above from the Output and Input tabs.
* If everything is configured properly, the input from your local microphone and the browser-run meeting will be aggregated into one virtual stream, and the output will be fed back to your specified output devices.
Permissions:
You may have to grant the browser microphone access to record audio in
`System Preferences -> Privacy & Security -> Microphone` and
`System Preferences -> Privacy & Security -> Accessibility`. You will be prompted for these permissions when you try to connect.
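If you want to confirm that the BlackHole and aggregate devices are actually visible to audio applications, a quick check like the one below can help. This is only a sketch and assumes the third-party `sounddevice` Python package (`pip install sounddevice`), which is not part of this project:
```python
# Optional check (assumes the sounddevice package): list input devices so you
# can confirm the BlackHole / Aggregate Device shows up before recording.
import sounddevice as sd

for index, device in enumerate(sd.query_devices()):
    if device["max_input_channels"] > 0:  # only show devices that can capture audio
        print(f"{index}: {device['name']} ({device['max_input_channels']} input channels)")
```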
## Front-End
### Installation
To install the application, run:
```bash
yarn install
```
### Run the Application
To run the application in development mode, run:
```bash
yarn dev
```
Then open [http://localhost:3000](http://localhost:3000) to view it in the browser.
### OpenAPI Code Generation
To generate the TypeScript files from the `openapi.json` file, make sure the Python server is running, then run:
```bash
yarn openapi
```
You may need to run `yarn global add @openapitools/openapi-generator-cli` first. You also need a Java runtime installed on your machine.
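If the generation fails, the usual cause is that the server is not reachable. A quick, optional way to confirm the spec is being served is sketched below; the URL is an assumption, so adjust the host and port to wherever your back-end is listening:
```python
# Hypothetical smoke test before running `yarn openapi`: fetch the spec from
# the running back-end. The URL below is an assumption -- change it to match
# your local server address.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:8000/openapi.json") as response:
    spec = json.load(response)

print(spec["info"]["title"], spec["info"]["version"])
```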
## Back-End
### Installation
Run:
```bash
poetry install
```
Then create an `.env` with:
```
TRANSCRIPT_BACKEND=modal
TRANSCRIPT_URL=https://monadical-sas--reflector-transcriber-web.modal.run
TRANSCRIPT_MODAL_API_KEY=<omitted>
LLM_BACKEND=modal
LLM_URL=https://monadical-sas--reflector-llm-web.modal.run
LLM_MODAL_API_KEY=<omitted>
AUTH_BACKEND=fief
AUTH_FIEF_URL=https://auth.reflector.media/reflector-local
AUTH_FIEF_CLIENT_ID=KQzRsNgoY<omitted>
AUTH_FIEF_CLIENT_SECRET=<omitted>
LLM_URL=http://IP:HOST/api/v1/generate
```
### Start the project
Use:
```bash
poetry run python3 -m reflector.app
```
#### Using docker
Use:
```bash
docker-compose up server
```
### Using local GPT4All
- Start GPT4All with any model you want
- Ensure the API server is activated in GPT4All
- Run with: `LLM_BACKEND=openai LLM_URL=http://localhost:4891/v1/completions LLM_OPENAI_MODEL="GPT4All Falcon" python -m reflector.app`
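Before pointing Reflector at it, you can optionally check that the GPT4All API server is responding. This is a sketch assuming the `requests` package and GPT4All's OpenAI-compatible completions endpoint on its default port 4891, matching the `LLM_URL` and `LLM_OPENAI_MODEL` values above:
```python
# Optional smoke test for the local GPT4All API server; the endpoint and model
# mirror the LLM_URL / LLM_OPENAI_MODEL values used above -- adjust as needed.
import requests

response = requests.post(
    "http://localhost:4891/v1/completions",
    json={
        "model": "GPT4All Falcon",
        "prompt": "Say hello in one short sentence.",
        "max_tokens": 32,
        "temperature": 0.7,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```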
### Using local files
```
poetry run python -m reflector.tools.process path/to/audio.wav
```
## AI Models
*(Documentation for this section is pending.)*


@@ -1,234 +0,0 @@
# Reflector
Reflector server is responsible for audio transcription and summarization for now.
_The project is moving fast, documentation is currently unstable and outdated_
## Server
We currently use oogabooga as a LLM backend.
### Using docker
Create a `.env` with
```
LLM_URL=http://IP:HOST/api/v1/generate
```
Then start with:
```
$ docker-compose up server
```
### Using local environment
Install the dependencies with poetry:
```
$ poetry install
```
Then run the server:
```
# With a config.ini
$ poetry run python -m reflector.app
# Within a poetry env
$ poetry shell
$ LLM_URL=http://.../api/v1/generate python -m reflector.app
```
### Using local GPT4All
- Start GPT4All with any model you want
- Ensure the API server is activated in GPT4All
- Run with: `LLM_BACKEND=openai LLM_URL=http://localhost:4891/v1/completions LLM_OPENAI_MODEL="GPT4All Falcon" python -m reflector.app`
### Using local files
```
poetry run python -m reflector.tools.process path/to/audio.wav
```
# Old documentation
This is the code base for the Reflector demo (formerly called agenda-talk-diff) for the leads: the Troy Web Consulting panel (A Chat with AWS about AI: Real AI/ML AWS projects and what you should know) on 6/14 at 4:30 PM.
The target deliverable is a local-first live transcription and visualization tool that compares a discussion's target agenda/objectives against the actual discussion in real time.
**S3 bucket:**
Everything you need for S3 is already configured in config.ini. Only edit it if you need to change it deliberately.
The S3 bucket name is set in config.ini. All transfers happen between this bucket and the local computer where the script is run. You need AWS_ACCESS_KEY / AWS_SECRET_KEY to authenticate your calls to S3 (also set in config.ini).
For the AWS S3 web UI:
1. Log in to the AWS Management Console.
2. Search for S3 in the search bar at the top.
3. Navigate to the list of buckets under the current account and choose your bucket (`reflector-bucket`).
4. You should be able to see the items in the bucket. You can upload/download files here directly.
For the CLI, refer to the FILE UTIL section below.
**FILE UTIL MODULE:**
A file_util module has been created to upload/download files to/from the AWS S3 bucket pre-configured in config.ini.
Although it is not needed for the workflow, if you need to upload or download a file on your own, outside the pipeline workflow in the script, you can do so with:
Upload:
`python3 file_util.py upload <object_name_in_S3_bucket>`
Download:
`python3 file_util.py download <object_name_in_S3_bucket>`
If you want to access the S3 artefacts from another machine, you can either use file_util with the commands above or simply use the AWS Management Console GUI.
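For reference, the upload and download operations boil down to standard boto3 calls along the lines of the sketch below; this is not the actual file_util implementation, and the credentials and bucket name shown here are placeholders for the values read from config.ini:
```python
# Illustrative sketch only -- not the real file_util module. The real module
# reads the bucket name and AWS keys from config.ini.
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id="YOUR_AWS_ACCESS_KEY",
    aws_secret_access_key="YOUR_AWS_SECRET_KEY",
)

# Upload a local file to the bucket under the given object name.
s3.upload_file("startup.mp4", "reflector-bucket", "startup.mp4")

# Download an object from the bucket to a local path.
s3.download_file("reflector-bucket", "startup.mp4", "startup.mp4")
```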
To set up:
1. Check the values in the config.ini file. In particular, add your OPENAI_APIKEY if you plan to use OpenAI API requests.
2. Run `export KMP_DUPLICATE_LIB_OK=True` in the terminal. [This is handled in code but not taking effect; to be fixed later.]
NOTE: If you don't have portaudio installed already, run `brew install portaudio`.
3. Run the setup_dependencies.sh script:
`chmod +x setup_dependencies.sh`
`sh setup_dependencies.sh <ENV>`
`ENV` refers to the intended environment for JAX, which is available in several variants [CPU | GPU | Colab TPU | Google Cloud TPU]:
cpu -> JAX CPU installation
cuda11 -> JAX CUDA 11.x version
cuda12 -> JAX CUDA 12.x version (CoreWeave has CUDA 12; check with `nvidia-smi`)
`sh setup_dependencies.sh cuda12`
4. If not already done, install ffmpeg: `brew install ffmpeg`.
For the NLTK SSL error, check [here](https://stackoverflow.com/questions/38916452/nltk-download-ssl-certificate-verify-failed).
5. Run the Whisper-JAX pipeline. Currently, the repo can take a YouTube video and transcribe/summarize it.
`python3 whisjax.py "https://www.youtube.com/watch?v=ihf0S97oxuQ"`
You can also run it on a local file or a file in your configured S3 bucket.
`python3 whisjax.py "startup.mp4"`
The script handles several cases: a YouTube link, a local file, a video file, an audio-only file, a file in S3, etc. If the local file is not present, it automatically fetches the file from S3 (see the sketch below).
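The source-resolution behaviour described above roughly amounts to the following, purely illustrative sketch (the function and the string labels are hypothetical stand-ins, not the actual whisjax.py code):
```python
# Illustrative sketch of how the script decides where its input comes from;
# this is a hypothetical stand-in, not the real whisjax.py implementation.
import os

def resolve_source(source: str) -> str:
    """Classify the input as a YouTube link, an existing local file, or an S3 object."""
    if "youtube.com" in source or "youtu.be" in source:
        return "youtube"  # download the video before transcribing
    if os.path.exists(source):
        return "local"    # use the file on disk as-is
    return "s3"           # otherwise fetch the object from the configured bucket

print(resolve_source("https://www.youtube.com/watch?v=ihf0S97oxuQ"))  # youtube
print(resolve_source("startup.mp4"))  # local if present, otherwise s3
```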
**OFFLINE WORKFLOW:**
1. Specify the input source file from a local path or a YouTube link, or upload it to S3 if needed, and pass it as input to the script. If the source file is in `.m4a` format, it will be converted to `.mp4` automatically.
2. Keep the agenda header topics in a local file named `agenda-headers.txt`. This file needs to be present where the script is run. This version of the pipeline compares covered agenda topics using agenda headers in the following format:
`agenda_topic : <short description>`
3. Check all the values in `config.ini`. You need to predefine in the config file the 2 categories for which the topic-modelling visualization is scatter-plotted; this is the default visualization. However, from the dataframe artefact called `df_<timestamp>.pkl`, you can load the df and choose different topics to plot. You can filter the transcriptions by searching for certain words, and you can see the top influencers and characteristics of each topic chosen to plot in the interactive HTML document. I have added a new Jupyter notebook named `Viz_experiments.ipynb` that gives a base template to play around with.
4. Run the script. It automatically transcribes, summarizes, creates a scatter plot of words & topics as an interactive HTML file plus a sample word cloud, and uploads them to the S3 bucket.
5. Additional artefacts pushed to S3:
   1. HTML visualization file
   2. pandas df in pickle format, for others to collaborate and make their own visualizations
   3. Summary, transcript, and transcript-with-timestamps files in text format
The script also creates 2 types of mappings (illustrated below):
1. Timestamp -> the top 2 matched agenda topics
2. Topic -> all matched timestamps in the transcription
Other visualizations can be planned based on the available artefacts, or new ones can be created. Refer to the `Viz-experiments` section.
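For illustration only, the two mappings can be pictured as structures like these (the keys and values are invented examples, not output from a real run):
```python
# Invented example data showing the shape of the two mappings described above.
timestamp_to_topics = {
    "00:05:30": ["agenda_topic_intro", "agenda_topic_ml_on_aws"],  # top 2 matched topics
    "00:12:10": ["agenda_topic_ml_on_aws", "agenda_topic_costs"],
}
topic_to_timestamps = {
    "agenda_topic_intro": ["00:05:30"],
    "agenda_topic_ml_on_aws": ["00:05:30", "00:12:10"],  # all matched timestamps
    "agenda_topic_costs": ["00:12:10"],
}
```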
**Visualization experiments:**
This is a Jupyter notebook playground with template instructions for handling the metadata and data artefacts generated by the pipeline. Follow the instructions and tweak your own logic into it, or use it as a playground to experiment with libraries and visualizations on top of the metadata.
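A minimal sketch of that kind of exploration is shown below, assuming the `pandas` package; the file name and the `text` column are placeholders, so inspect `df.columns` in your own artefact first:
```python
# Hypothetical starting point for exploring the pipeline's dataframe artefact.
# The file name and the "text" column are assumptions -- check df.columns first.
import pandas as pd

df = pd.read_pickle("df_20230614_163000.pkl")  # replace with your df_<timestamp>.pkl
print(df.columns)

# Filter transcription rows containing certain words, as described above.
keywords = ["machine learning", "sagemaker"]
mask = df["text"].str.contains("|".join(keywords), case=False, na=False)
print(df[mask].head())
```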
**WHISPER-JAX REALTIME TRANSCRIPTION PIPELINE:**
We also support real-time transcription using the whisper-jax pipeline, but there are a few prerequisites before you run it on your local machine. The instructions below are for configuring macOS.
We need a way to route the audio from an application opened via the browser (e.g. "Whereby") together with the audio from the local microphone input you will be speaking into. We use [BlackHole](https://github.com/ExistentialAudio/BlackHole).
1. Install BlackHole 2ch (2 channels is enough) using one of the two installation options listed.
2. Set up an [Aggregate Device](https://github.com/ExistentialAudio/BlackHole/wiki/Aggregate-Device) to route web audio and local microphone input. Be sure to mirror the settings given ![here](./images/aggregate_input.png).
3. Set up a [Multi-Output Device](https://github.com/ExistentialAudio/BlackHole/wiki/Multi-Output-Device). Refer ![here](./images/multi-output.png).
4. Set the aggregate input device name created in step 2 in config.ini as `BLACKHOLE_INPUT_AGGREGATOR_DEVICE_NAME`.
5. Then go to `System Preferences -> Sound` and choose the devices created above from the Output and Input tabs.
6. If everything is configured properly, the input from your local microphone and the browser-run meeting will be aggregated into one virtual stream, and the output will be fed back to your specified output devices. Check this before trying out the trial run.
**Permissions:**
You may have to grant "Terminal" or your code editor [PyCharm/VSCode, etc.] microphone access to record audio in
`System Preferences -> Privacy & Security -> Microphone`,
`System Preferences -> Privacy & Security -> Accessibility`,
`System Preferences -> Privacy & Security -> Input Monitoring`.
From the reflector root folder, run `python3 whisjax_realtime.py`.
The transcription text is written to `real_time_transcription_<timestamp>.txt`.
NEXT STEPS:
1. Create a RunPod setup for this feature (mentioned in 1 & 2) and test it end-to-end
2. Perform Speaker Diarization using Whisper-JAX
3. Based on the feasibility of the above points, explore suitable visualizations for transcription & summarization.


@@ -1,89 +0,0 @@
# Reflector React App
Reflector is a React application that uses WebRTC to stream audio from the browser to a server and receive live transcription and topics from the server.
## Table of Contents
- [Reflector React App](#reflector-react-app)
- [Table of Contents](#table-of-contents)
- [Project Architecture](#project-architecture)
- [Installation](#installation)
- [Run the Application](#run-the-application)
- [WebRTC Integration](#webrtc-integration)
- [OpenAPI Code Generation](#openapi-code-generation)
- [Contribution Guidelines](#contribution-guidelines)
## Project Architecture
![Project Architecture](ProjectArchitecture.jpg)
## Installation
To install the application, run:
```bash
yarn install
```
## Run the Application
To run the application in development mode, run:
```bash
yarn run dev
```
Then open [http://localhost:3000](http://localhost:3000) to view it in the browser.
## WebRTC Integration
The main part of the WebRTC integration is located in the `useWebRTC` hook in the `hooks/useWebRTC.js` file. This hook initiates a WebRTC connection when an audio stream is available, sends signal data to the server, and listens for data from the server.
To connect the application with your server, you need to implement the following:
1. **Signal Data Sending**: In the `useWebRTC` hook, when a `'signal'` event is emitted, the hook logs the signal data to the console. You should replace this logging with sending the data to the server:
```jsx
peer.on("signal", (data) => {
// This is where you send the signal data to the server.
});
```
2. **Data Receiving**: The `useWebRTC` hook listens for the `'data'` event and, when it is emitted, sets the received data in the `data` state:
```jsx
peer.on("data", (data) => {
// Received data from the server.
const serverData = JSON.parse(data.toString());
setData(serverData);
});
```
The received data is expected to be a JSON object containing the live transcription and topics:
```json
{
"transcription": "live transcription...",
"topics": [
{ "title": "topic 1", "description": "description 1" },
{ "title": "topic 2", "description": "description 2" }
// ...
]
}
```
This data is then returned from the `useWebRTC` hook and can be used in your components.
## OpenAPI Code Generation
To generate the TypeScript files from the `openapi.json` file, make sure the Python server is running, then run:
```bash
yarn openapi
```
You may need to run `yarn global add @openapitools/openapi-generator-cli` first. You also need a Java runtime installed on your machine.
## Contribution Guidelines
All new contributions should be made in a separate branch. Before any code is merged into `master`, it requires a code review.