Mirror of https://github.com/Monadical-SAS/reflector.git, synced 2025-12-20 20:29:06 +00:00
Merge pull request #8 from Monadical-SAS/whisper-jax-gokul
Update README
.gitignore (vendored): 3 changed lines
@@ -162,6 +162,9 @@ cython_debug/
 *.mp4
 summary.txt
 transcript.txt
+transcript_timestamps.txt
+*.html
+*.pkl
 *.ini
 test_samples/
 *.wav
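The three added patterns keep more generated artefacts (timestamped transcripts, HTML visualizations, pickled DataFrames) out of version control. As a rough sketch, assuming only the simple basename patterns shown in this hunk, Python's `fnmatch` can approximate which output files they cover. It ignores gitignore's directory and negation rules, so `git check-ignore -v <path>` remains the authoritative check. The filenames in the demo are hypothetical.

```python
from fnmatch import fnmatch

# Basename patterns from this .gitignore hunk. "test_samples/" is left out because
# fnmatch has no notion of directories; the last three are the newly added ones.
IGNORE_PATTERNS = [
    "*.mp4",
    "summary.txt",
    "transcript.txt",
    "*.ini",
    "*.wav",
    "transcript_timestamps.txt",
    "*.html",
    "*.pkl",
]


def is_ignored(filename: str) -> bool:
    """Approximate gitignore matching for slash-free basename patterns only."""
    return any(fnmatch(filename, pattern) for pattern in IGNORE_PATTERNS)


if __name__ == "__main__":
    # Hypothetical artefact names, purely for illustration.
    for name in ["transcript_timestamps.txt", "scatter.html", "df.pkl", "whisjax.py"]:
        print(f"{name}: {'ignored' if is_ignored(name) else 'tracked'}")
```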
@@ -1,47 +0,0 @@
-AGENDA: Most important things to look for in a start-up
-
-TAM: Make sure the market is sufficiently large that once they win they can get rewarded
-- Medium-sized markets that should be winner-take-all can work
-- TAM needs to realistically reflect the direct market size
-
-Product market fit: Being in a good market with a product that can satisfy that market
-- Solves a problem
-- Builds a solution a customer wants to buy
-- Either saves the customer something (time/money/pain) or gives them something (revenue/enjoyment)
-
-Unit economics: Profit for delivering all-in cost must be attractive (% or $ amount)
-- Revenue minus direct costs
-- Raw input costs (materials, variable labour), direct cost of delivering and servicing the sale
-- Attractive as a % of sales so it can contribute to fixed overhead
-- Look for high incremental contribution margin
-
-LTV CAC: Lifetime value (revenue contribution) vs cost to acquire a customer must be healthy
-- LTV = Purchase value x number of purchases x customer lifespan
-- CAC = All-in costs of sales + marketing over number of new customer additions
-- Strong reputation leads to referrals, which lead to lower CAC. Want customers evangelizing the product/service
-- Rule of thumb: LTV/CAC higher than 3
-
-Churn: Fits into LTV; low churn leads to higher LTV and helps keep future CAC down
-- Selling to replenish revenue every year is hard
-- Can run through the entire customer base over time
-- Low churn builds strong net dollar retention
-
-Business: Must have sufficient barriers to entry to ward off copycats once established
-- High switching costs (lock-in)
-- Addictive
-- Steep learning curve once adopted (form of switching cost)
-- Two-sided liquidity
-- Patents, IP, Branding
-- No hyper-scaler who can roll over you quickly
-- Scale could be a barrier to entry but works against most start-ups, not for them
-- Once developed, answer the question: Could a well-funded competitor starting up today easily duplicate this business, or is it cheaper to buy the start-up?
-
-Founders: Must be religious about their product. Believe they will change the world against all odds.
-- Just money in the bank is not enough to build a successful company. Just good tech is not enough
-to build a successful company
-- Founders must be motivated to build something, not (all) about money. They would be doing
-this for free because they believe in it. Not looking for a quick score
-- Founders must be persuasive. They will be asking others to sacrifice to make their dream come
-to life. They will need to convince investors this company can work and deserves funding.
-- Must understand who the customer is and what problem they are helping to solve.
-- Founders aren’t expected to know all the preceding points in this document but should understand most of them and be able to offer a vision.
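The LTV and CAC definitions in the deleted notes reduce to simple arithmetic. A minimal sketch, assuming "number of purchases" means purchases per year, and using the common approximation that expected customer lifespan is 1 / annual churn rate (that formula is an assumption; the notes only say churn "fits into LTV"). All figures are made up for illustration.

```python
def expected_lifespan_years(annual_churn_rate: float) -> float:
    """Common approximation (an assumption here): average lifespan ~ 1 / churn."""
    if not 0 < annual_churn_rate <= 1:
        raise ValueError("annual_churn_rate must be in (0, 1]")
    return 1 / annual_churn_rate


def lifetime_value(purchase_value: float, purchases_per_year: float, lifespan_years: float) -> float:
    """LTV = purchase value x number of purchases x customer lifespan."""
    return purchase_value * purchases_per_year * lifespan_years


def customer_acquisition_cost(sales_and_marketing_cost: float, new_customers: int) -> float:
    """CAC = all-in sales + marketing cost / number of new customers added."""
    if new_customers <= 0:
        raise ValueError("new_customers must be positive")
    return sales_and_marketing_cost / new_customers


if __name__ == "__main__":
    lifespan = expected_lifespan_years(0.25)  # 25% annual churn -> 4.0 years
    ltv = lifetime_value(50.0, 12, lifespan)  # 50 x 12 x 4 = 2400.0
    cac = customer_acquisition_cost(60_000.0, 100)  # 600.0
    ratio = ltv / cac  # 4.0
    print(f"LTV={ltv:.0f} CAC={cac:.0f} LTV/CAC={ratio:.1f} above 3: {ratio > 3}")
```

With these inputs the script prints `LTV=2400 CAC=600 LTV/CAC=4.0 above 3: True`. Halving churn doubles the lifespan term and therefore LTV, which is the churn-to-LTV link the notes describe.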
README.md: 56 changed lines
@@ -6,7 +6,7 @@ The target deliverable is a local-first live transcription and visualization too
 
 To setup,
 
-1) Check values in config.ini file. Specifically add your OPENAI_APIKEY.
+1) Check values in the config.ini file. Specifically, add your OPENAI_APIKEY if you plan to use OpenAI API requests.
 2) Run ``` export KMP_DUPLICATE_LIB_OK=True``` in Terminal. [This is taken care of in code but is not taking effect yet; will fix this issue later.]
 3) Run the script setup_depedencies.sh.
 
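Step 2 notes that setting `KMP_DUPLICATE_LIB_OK` in code is not taking effect. One plausible cause (an assumption, not confirmed by this diff) is ordering: the variable has to be in the process environment before a duplicate OpenMP runtime is initialized, so the safest place to set it is at the very top of the entry-point script, before any heavy imports. A minimal sketch of such a preamble, with the heavy import left as a placeholder comment:

```python
import os

# Set this before importing anything that may load an OpenMP runtime (for example
# MKL-linked numpy or torch builds), so it is already in place if a second copy
# of the runtime is initialized. setdefault keeps a value the user already exported.
os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "True")

# Only now import the heavy dependencies, e.g.:
# import numpy as np  # placeholder; use whatever whisjax.py actually imports

print("KMP_DUPLICATE_LIB_OK =", os.environ["KMP_DUPLICATE_LIB_OK"])
```

Assigning through `os.environ` also updates the real process environment, so native libraries loaded afterwards see the value.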
@@ -31,9 +31,63 @@ To setup,
 
 ``` python3 whisjax.py "https://www.youtube.com/watch?v=ihf0S97oxuQ" --transcript transcript.txt summary.txt ```
 
+You can also run it on a local file or a file in your configured S3 bucket:
+
+``` python3 whisjax.py "startup.mp4" --transcript transcript.txt summary.txt ```
+
+The script handles several input cases: YouTube link, local file, video file, audio-only file,
+file in S3, etc.
+
 5) ``` pip install -r requirements.txt```
 
 
+**S3 bucket:**
+
+The S3 bucket name is set in config.ini. All transfers happen between this bucket and the local computer where the
+script is run. You need AWS_ACCESS_KEY / AWS_SECRET_KEY (in config.ini) to authenticate your calls to S3.
+
+For the AWS S3 Web UI:
+1) Log in to the AWS Management Console.
+2) Search for S3 in the search bar at the top.
+3) Navigate to the list of buckets, if needed, and choose your bucket (reflector-bucket).
+4) You should see the items in the bucket. You can upload/download here.
+
+Through the CLI:
+Refer to the FILE UTIL section below.
+
+
+**FILE UTIL MODULE:**
+
+A file_util module has been created to upload/download files to/from the AWS S3 bucket configured in config.ini.
+If you need to upload or download a file on your own, outside the script's pipeline workflow,
+you can do so as follows:
+
+Upload:
+
+``` python3 file_util.py upload <object_name_in_S3_bucket>```
+
+Download:
+
+``` python3 file_util.py download <object_name_in_S3_bucket>```
+
+
+**WORKFLOW:**
+
+1) Specify the input source (a local file, a YouTube link, or a file uploaded to S3 if needed) and pass it as input to the script.
+2) Keep the agenda header topics in a local file named "agenda-headers.txt". This file must be present where the script is run.
+3) Run the script. The script automatically creates a scatter plot of words and topics in the form of an interactive
+HTML file and a sample word cloud, and uploads them to the S3 bucket.
+4) Additional artefacts pushed to S3:
+1) HTML visualization file
+2) pandas DataFrame in pickle format so others can collaborate and make their own visualizations
+3) Summary, transcript and transcript-with-timestamps files in txt format.
+
+The script also creates 2 types of mappings:
+1) Timestamp -> the top 2 matched agenda topics
+2) Topic -> all matched timestamps in the transcription
+
+Further visualizations can be planned based on the available artefacts, or new ones can be created.
+
 
 NEXT STEPS:
 
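The README says the script handles YouTube links, local files, video or audio-only files, and files in S3. A sketch of how such input dispatch could look; `classify_source`, the host and extension lists, and the rule "anything that is neither a YouTube URL nor an existing local path is an S3 object name" are all hypothetical and not taken from whisjax.py.

```python
from __future__ import annotations

import os
from urllib.parse import urlparse

# Hypothetical lists; extend as needed.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}
VIDEO_EXTS = {".mp4", ".mkv", ".mov", ".webm", ".avi"}
AUDIO_EXTS = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}


def classify_source(source: str) -> tuple[str, str]:
    """Return (location, media_kind) for an input passed to the script.

    location is "youtube", "local" or "s3": anything that is neither a YouTube
    URL nor an existing local path is assumed to be an object in the S3 bucket.
    media_kind is "video", "audio" or "unknown", judged by extension only.
    """
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc.lower() in YOUTUBE_HOSTS:
        return "youtube", "video"
    location = "local" if os.path.exists(source) else "s3"
    ext = os.path.splitext(source)[1].lower()
    if ext in VIDEO_EXTS:
        return location, "video"
    if ext in AUDIO_EXTS:
        return location, "audio"
    return location, "unknown"
```

Returning the media kind separately would let a caller decide whether to extract an audio track first (video) or pass the file straight to transcription (audio).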
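The two mappings described under WORKFLOW (timestamp to top 2 agenda topics, topic to all matched timestamps) can be derived from one table of per-timestamp similarity scores. A minimal sketch, assuming such scores already exist (how whisjax.py computes them is not shown in this diff) and treating a timestamp as "matching" a topic when that topic is among its top 2. `build_mappings` and the example scores are hypothetical; the topic names are borrowed from the agenda notes.

```python
from __future__ import annotations

from collections import defaultdict


def build_mappings(
    scores: dict[float, dict[str, float]], top_k: int = 2
) -> tuple[dict[float, list[str]], dict[str, list[float]]]:
    """Build both mappings from scores[timestamp][topic] = similarity.

    Returns:
      timestamp_to_topics: timestamp -> its top_k topics, best first
      topic_to_timestamps: topic -> every timestamp that ranked it in its top_k
    """
    timestamp_to_topics: dict[float, list[str]] = {}
    topic_to_timestamps: dict[str, list[float]] = defaultdict(list)
    for ts in sorted(scores):
        topic_scores = scores[ts]
        best = sorted(topic_scores, key=topic_scores.get, reverse=True)[:top_k]
        timestamp_to_topics[ts] = best
        for topic in best:
            topic_to_timestamps[topic].append(ts)
    return timestamp_to_topics, dict(topic_to_timestamps)


if __name__ == "__main__":
    # Made-up scores for two transcript chunks (seconds) against three agenda topics.
    example = {
        0.0: {"TAM": 0.9, "Churn": 0.1, "Founders": 0.4},
        30.0: {"TAM": 0.2, "Churn": 0.8, "Founders": 0.7},
    }
    by_time, by_topic = build_mappings(example)
    print(by_time)   # {0.0: ['TAM', 'Founders'], 30.0: ['Churn', 'Founders']}
    print(by_topic)  # {'TAM': [0.0], 'Founders': [0.0, 30.0], 'Churn': [30.0]}
```

Deriving the topic-to-timestamps mapping from the same top-k selection keeps the two mappings consistent with each other by construction.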
@@ -2,9 +2,9 @@
 # Set exception rule for OpenMP error to allow duplicate lib initialization
 KMP_DUPLICATE_LIB_OK=TRUE
 # Export OpenAI API Key
-OPENAI_APIKEY=***REMOVED***
+OPENAI_APIKEY=
 # Export Whisper Model Size
 WHISPER_MODEL_SIZE=tiny
-AWS_ACCESS_KEY=
-AWS_SECRET_KEY=
+AWS_ACCESS_KEY=***REMOVED***
+AWS_SECRET_KEY=***REMOVED***
 BUCKET_NAME='reflector-bucket'
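This hunk shows the keys the pipeline reads from config.ini. Below is a sketch of a file_util-style helper; it is not the repository's file_util.py. It assumes the file may lack a section header (the hunk starts at line 2, so line 1 is not shown), strips the quotes around values such as `'reflector-bucket'` (configparser keeps them), and uses boto3's `upload_file` / `download_file`. The function names are hypothetical, and boto3 is imported only when a client is actually created.

```python
from __future__ import annotations

import configparser


def _new_parser() -> configparser.ConfigParser:
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys such as BUCKET_NAME in their original case
    return parser


def parse_config(text: str) -> dict[str, str]:
    """Parse config.ini text into a flat {KEY: value} dict."""
    parser = _new_parser()
    try:
        parser.read_string(text)
    except configparser.MissingSectionHeaderError:
        # Assumption: keys may sit at the top of the file without a [section].
        parser = _new_parser()
        parser.read_string("[DEFAULT]\n" + text)
    values = dict(parser.defaults())
    for section in parser.sections():
        values.update(parser[section])
    # configparser does not strip quotes, so BUCKET_NAME='reflector-bucket' needs this.
    return {key: value.strip("'\"") for key, value in values.items()}


def load_config(path: str = "config.ini") -> dict[str, str]:
    with open(path, encoding="utf-8") as f:
        return parse_config(f.read())


def make_s3_client(cfg: dict[str, str]):
    import boto3  # imported lazily so config parsing works without boto3 installed

    return boto3.client(
        "s3",
        aws_access_key_id=cfg["AWS_ACCESS_KEY"],
        aws_secret_access_key=cfg["AWS_SECRET_KEY"],
    )


def upload(client, bucket: str, object_name: str, local_path: str | None = None) -> None:
    """Upload local_path (default: object_name) to bucket/object_name."""
    client.upload_file(local_path or object_name, bucket, object_name)


def download(client, bucket: str, object_name: str, local_path: str | None = None) -> None:
    """Download bucket/object_name to local_path (default: object_name)."""
    client.download_file(bucket, object_name, local_path or object_name)
```

For example, with `cfg = load_config()`, calling `upload(make_s3_client(cfg), cfg["BUCKET_NAME"], "startup.mp4")` roughly corresponds to `python3 file_util.py upload startup.mp4` in the README. Passing the client in as a parameter also makes the helpers easy to exercise with a stub instead of real S3.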