Reflector

This is the codebase for the Reflector demo (formerly called agenda-talk-diff) for the leads : Troy Web Consulting panel (A Chat with AWS about AI: Real AI/ML AWS projects and what you should know) on 6/14 at 4:30 PM.

The target deliverable is a local-first live transcription and visualization tool that compares a discussion's target agenda/objectives to the actual discussion as it happens.

To set up:

Check the values in the config.ini file. In particular, add your OPENAI_APIKEY if you plan to use OpenAI API requests.
Run export KMP_DUPLICATE_LIB_OK=True in Terminal. [The code also sets this variable, but the setting does not currently take effect. This will be fixed later.]
Run the script setup_dependencies.sh.

chmod +x setup_dependencies.sh

sh setup_dependencies.sh <ENV>

ENV refers to the intended environment for JAX. JAX is available in several variants: [CPU | GPU | Colab TPU | Google Cloud TPU]

ENV is one of:

cpu -> JAX CPU installation

cuda11 -> JAX CUDA 11.x version

cuda12 -> JAX CUDA 12.x version (CoreWeave has CUDA 12; you can check the version with nvidia-smi)

sh setup_dependencies.sh cuda12
Run the Whisper-JAX pipeline. Currently, the repo takes a YouTube video and transcribes/summarizes it.

python3 whisjax.py "https://www.youtube.com/watch?v=ihf0S97oxuQ" --transcript transcript.txt summary.txt

You can also run it on a local file or a file in your configured S3 bucket:

python3 whisjax.py "startup.mp4" --transcript transcript.txt summary.txt

The script handles several cases: a YouTube link, a local file, a video file, an audio-only file, a file in S3, etc.
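As a rough illustration of how the cases listed above could be told apart, here is a minimal sketch. It is not taken from whisjax.py: the function name classify_source, the extension sets, and the rule "a file that is not on disk is assumed to be in S3" are all assumptions made for this example.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical extension sets; the real script may recognize other formats.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac"}
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".webm"}
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_source(source: str) -> str:
    """Return a coarse label for an input source string (illustrative only)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = (parsed.hostname or "").lower()
        return "youtube" if host in YOUTUBE_HOSTS else "url"

    # Not a web URL: decide audio vs. video from the file extension.
    suffix = Path(source).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        kind = "audio"
    elif suffix in VIDEO_EXTENSIONS:
        kind = "video"
    else:
        kind = "unknown"

    # Assumption: a name that is not a file on disk refers to an S3 object.
    location = "local" if Path(source).is_file() else "s3"
    return f"{location}-{kind}"
```

For example, classify_source("https://www.youtube.com/watch?v=ihf0S97oxuQ") returns "youtube", while a file name such as startup.mp4 is labeled "local-video" or "s3-video" depending on whether it exists in the current directory.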

pip install -r requirements.txt

S3 bucket:

The S3 bucket name is set in config.ini. All transfers happen between this bucket and the local computer where the script is run. You need AWS_ACCESS_KEY / AWS_SECRET_KEY (also set in config.ini) to authenticate your calls to S3.
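A minimal sketch of reading these values with Python's standard configparser module follows. The layout is an assumption: the section name DEFAULT and the key BUCKET_NAME are hypothetical, and only OPENAI_APIKEY, AWS_ACCESS_KEY and AWS_SECRET_KEY are named in this README. Check your own config.ini for the actual names.

```python
import configparser

# Hypothetical config.ini contents; section and BUCKET_NAME key are assumed.
SAMPLE_CONFIG = """
[DEFAULT]
OPENAI_APIKEY =
AWS_ACCESS_KEY =
AWS_SECRET_KEY =
BUCKET_NAME = reflector-bucket
"""


def load_settings(text: str) -> dict:
    """Parse config text and return the S3-related values as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["DEFAULT"]
    return {
        "bucket": section.get("BUCKET_NAME", ""),
        "aws_access_key": section.get("AWS_ACCESS_KEY", ""),
        "aws_secret_key": section.get("AWS_SECRET_KEY", ""),
    }


def missing_s3_settings(settings: dict) -> list:
    """List the settings that are still empty and would block S3 calls."""
    return [name for name, value in settings.items() if not value]
```

Calling missing_s3_settings(load_settings(SAMPLE_CONFIG)) on the sample above reports that both AWS keys still need to be filled in. To read the real file, parser.read("config.ini") could be used in place of read_string.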

For the AWS S3 web UI:

Log in to the AWS Management Console.
Search for S3 in the search bar at the top.
Navigate to the list of buckets, if needed, and choose your bucket (reflector-bucket).
You should be able to see the items in the bucket. You can upload/download here.

Through the CLI, refer to the FILE UTIL MODULE section below.

FILE UTIL MODULE:

A file_util module has been created to upload/download files to and from the AWS S3 bucket pre-configured in config.ini. If you need to upload or download a file on your own, outside the pipeline workflow in the script, you can do so as follows:

Upload:

python3 file_util.py upload <object_name_in_S3_bucket>

Download:

python3 file_util.py download <object_name_in_S3_bucket>
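The two commands above could map onto S3 client calls roughly as sketched below. This is not the actual file_util.py: the helper names upload, download and run are hypothetical, and the sketch assumes the local file name is the same as the object name in the bucket. The client is passed in as an argument; with boto3 installed, boto3.client("s3") provides the upload_file and download_file methods used here.

```python
# Bucket name as given in this README; the real module reads it from config.ini.
BUCKET_NAME = "reflector-bucket"


def upload(s3_client, object_name: str, bucket: str = BUCKET_NAME) -> None:
    """Upload the local file `object_name` to the bucket under the same key."""
    # boto3 signature: upload_file(Filename, Bucket, Key)
    s3_client.upload_file(object_name, bucket, object_name)


def download(s3_client, object_name: str, bucket: str = BUCKET_NAME) -> None:
    """Download the object `object_name` to a local file of the same name."""
    # boto3 signature: download_file(Bucket, Key, Filename)
    s3_client.download_file(bucket, object_name, object_name)


def run(s3_client, action: str, object_name: str) -> None:
    """Dispatch an 'upload' or 'download' action, mirroring the CLI above."""
    if action == "upload":
        upload(s3_client, object_name)
    elif action == "download":
        download(s3_client, object_name)
    else:
        raise ValueError(f"unknown action: {action!r}")
```

Passing the client in keeps the helpers easy to exercise with a stand-in object, without network access or credentials.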

WORKFLOW:

Specify the input source: a local file, a YouTube link, or a file uploaded to S3 if needed, and pass it as the input to the script.
Keep the agenda header topics in a local file named "agenda-headers.txt". This file needs to be present in the directory where the script is run.
Run the script. The script automatically creates a scatter plot of words and topics in the form of an interactive HTML file and a sample word cloud, and uploads them to the S3 bucket.
Additional artefacts pushed to S3:
1. HTML visualization file
2. pandas df in pickle format for others to collaborate and make their own visualizations
3. Summary, transcript, and transcript-with-timestamps files in txt format.
The script also creates 2 types of mappings.
1. Timestamp -> The top 2 matched agenda topics
2. Topic -> All matched timestamps in the transcription
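To make the two mappings concrete, here is a minimal sketch of their shape. The matching method is a stand-in: this README does not say how the script scores a transcript segment against an agenda topic, so a simple word-overlap score is used here instead. The function names and the rule of dropping topics with a zero score are likewise assumptions.

```python
from collections import defaultdict


def tokenize(text: str) -> set:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return set(text.lower().split())


def overlap_score(segment: str, topic: str) -> float:
    """Fraction of the topic's words that also appear in the segment."""
    topic_words = tokenize(topic)
    if not topic_words:
        return 0.0
    return len(topic_words & tokenize(segment)) / len(topic_words)


def build_mappings(segments, topics, top_n=2):
    """Build timestamp -> top topics and topic -> timestamps mappings.

    segments: list of (timestamp, text) pairs from the transcript.
    topics:   list of agenda header strings.
    """
    timestamp_to_topics = {}
    topic_to_timestamps = defaultdict(list)
    for timestamp, text in segments:
        scored = [(overlap_score(text, topic), topic) for topic in topics]
        # sorted() is stable, so tied topics keep their agenda order.
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        matched = [topic for score, topic in ranked[:top_n] if score > 0]
        timestamp_to_topics[timestamp] = matched
        for topic in matched:
            topic_to_timestamps[topic].append(timestamp)
    return timestamp_to_topics, dict(topic_to_timestamps)
```

The first dictionary answers "what was being discussed at this moment", and the second answers "when was this agenda item discussed", which is the form a timeline-style visualization would likely need.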

Further visualizations can be planned based on available artefacts or new ones can be created.

NEXT STEPS:

Run this demo on a local Mac M1 to test the flow and observe the performance
Create a pipeline that uses the microphone to listen to audio chunks and perform transcription in real time (and also summarize it efficiently) -> done as part of whisjax_realtime_trial.py
Create a RunPod setup for this feature (mentioned in 1 & 2) and test it end-to-end
Perform Speaker Diarization using Whisper-JAX
Based on feasibility of above points, explore suitable visualizations for transcription & summarization.
