mirror of
https://github.com/Monadical-SAS/reflector.git
synced 2025-12-20 20:29:06 +00:00
Merge pull request #8 from Monadical-SAS/whisper-jax-gokul
Update README
3
.gitignore
vendored
@@ -162,6 +162,9 @@ cython_debug/
*.mp4
summary.txt
transcript.txt
transcript_timestamps.txt
*.html
*.pkl
*.ini
test_samples/
*.wav
@@ -1,47 +0,0 @@
AGENDA: Most important things to look for in a start up

TAM: Make sure the market is sufficiently large that once they win they can get rewarded
- Medium sized markets that should be winner take all can work
- TAM needs to be realistic about direct market size

Product market fit: Being in a good market with a product that can satisfy that market
- Solves a problem
- Builds a solution a customer wants to buy
- Either saves the customer something (time/money/pain) or gives them something (revenue/enjoyment)

Unit economics: Profit for delivering all-in cost must be attractive (% or $ amount)
- Revenue minus direct costs
- Raw input costs (materials, variable labour), direct cost of delivering and servicing the sale
- Attractive as a % of sales so it can contribute to fixed overhead
- Look for high incremental contribution margin

LTV CAC: Life-time value (revenue contribution) vs cost to acquire customer must be healthy
- LTV = Purchase value x number of purchases x customer lifespan
- CAC = All-in costs of sales + marketing over number of new customer additions
- Strong reputation leads to referrals leads to lower CAC. Want customers evangelizing product/service
- Rule of thumb: LTV to CAC ratio higher than 3
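
The LTV and CAC formulas above can be checked with a short worked example. This is a minimal sketch, not part of the repository: the function names and all dollar figures are hypothetical, and "number of purchases" is assumed to mean purchases per year so that it multiplies cleanly with a lifespan in years.

```python
def lifetime_value(purchase_value, purchases_per_year, lifespan_years):
    """LTV = purchase value x number of purchases x customer lifespan."""
    return purchase_value * purchases_per_year * lifespan_years


def customer_acquisition_cost(sales_and_marketing_cost, new_customers):
    """CAC = all-in sales + marketing cost over number of new customers."""
    if new_customers <= 0:
        raise ValueError("new_customers must be positive")
    return sales_and_marketing_cost / new_customers


# Hypothetical figures, chosen only to illustrate the arithmetic.
ltv = lifetime_value(purchase_value=50.0, purchases_per_year=4, lifespan_years=3)
cac = customer_acquisition_cost(sales_and_marketing_cost=20_000.0, new_customers=125)
ratio = ltv / cac

print(f"LTV={ltv:.2f} CAC={cac:.2f} LTV/CAC={ratio:.2f}")
# prints "LTV=600.00 CAC=160.00 LTV/CAC=3.75"
```

With these made-up numbers the ratio is 3.75, which clears the rule of thumb of 3.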

Churn: Fits into LTV, low churn leads to higher LTV and helps keep future CAC down
- Selling to replenish revenue every year is hard
- Can run through entire customer base over time
- Low churn builds strong net dollar retention

Business: Must have sufficient barriers to entry to ward off copy-cats once established
- High switching costs (lock-in)
- Addictive
- Steep learning curve once adopted (form of switching cost)
- Two sided liquidity
- Patents, IP, Branding
- No hyper-scaler who can roll over you quickly
- Scale could be a barrier to entry but works against most start-ups, not for them
- Once developed, answer question: Could a well funded competitor starting up today easily duplicate this business or is it cheaper to buy the start up?

Founders: Must be religious about their product. Believe they will change the world against all odds.
- Just money in the bank is not enough to build a successful company. Just good tech is not enough
to build a successful company
- Founders must be motivated to build something, not (all) about money. They would be doing
this for free because they believe in it. Not looking for a quick score
- Founders must be persuasive. They will be asking others to sacrifice to make their dream come
to life. They will need to convince investors this company can work and deserves funding.
- Must understand who the customer is and what problem they are helping to solve.
- Founders aren’t expected to know all the preceding points in this document but should have an understanding of most of them, and be able to offer a vision.
56
README.md
@@ -6,7 +6,7 @@ The target deliverable is a local-first live transcription and visualization too

To set up,

1) Check values in config.ini file. Specifically add your OPENAI_APIKEY.
1) Check values in config.ini file. Specifically add your OPENAI_APIKEY if you plan to use OpenAI API requests.
2) Run ``` export KMP_DUPLICATE_LIB_OK=True``` in Terminal. [This is set in code, but the setting is not taking effect; this will be fixed later.]
3) Run the script setup_depedencies.sh.

@@ -31,9 +31,63 @@ To setup,

``` python3 whisjax.py "https://www.youtube.com/watch?v=ihf0S97oxuQ" --transcript transcript.txt summary.txt ```

You can also run it on a local file or a file in your configured S3 bucket:

``` python3 whisjax.py "startup.mp4" --transcript transcript.txt summary.txt ```

The script handles several input cases: YouTube links, local files, video files, audio-only files,
files in S3, etc.

5) ``` pip install -r requirements.txt```


**S3 bucket:**

The S3 bucket name is set in config.ini. All transfers happen between this bucket and the local computer where the
script is run. You need AWS_ACCESS_KEY / AWS_SECRET_KEY to authenticate your calls to S3 (set in config.ini).

For AWS S3 Web UI,
1) Log in to the AWS management console.
2) Search for S3 in the search bar at the top.
3) Navigate to the bucket list, if needed, and choose your bucket (reflector-bucket).
4) You should be able to see items in the bucket. You can upload/download here.

Through CLI,
Refer to the FILE UTIL section below.


**FILE UTIL MODULE:**

A file_util module is provided to upload/download files to and from the AWS S3 bucket configured in config.ini.
If you need to upload or download a file on your own, outside the pipeline workflow in the script,
you can do so with:

Upload:

``` python3 file_util.py upload <object_name_in_S3_bucket>```

Download:

``` python3 file_util.py download <object_name_in_S3_bucket>```
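
The source of file_util.py is not shown in this diff, so the following is only a sketch of how such an upload/download helper could be written with boto3. The function names, the `[DEFAULT]` section header, and the choice to use the object name as the local filename are all assumptions; only the config keys (AWS_ACCESS_KEY, AWS_SECRET_KEY, BUCKET_NAME) come from the config file in this commit.

```python
import configparser


def parse_s3_settings(config_text, section="DEFAULT"):
    """Extract bucket name and credentials from config.ini-style text.

    Assumes the keys live under a [DEFAULT] section (hypothetical).
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    values = parser[section]
    return {
        # The bucket name is quoted in the file, so strip the quotes.
        "bucket": values["BUCKET_NAME"].strip("'\""),
        "access_key": values["AWS_ACCESS_KEY"],
        "secret_key": values["AWS_SECRET_KEY"],
    }


def make_s3_client(settings):
    """Build an S3 client; boto3 is imported lazily (third-party package)."""
    import boto3

    return boto3.client(
        "s3",
        aws_access_key_id=settings["access_key"],
        aws_secret_access_key=settings["secret_key"],
    )


def upload(object_name, settings):
    """Upload a local file to the bucket under the same name."""
    make_s3_client(settings).upload_file(object_name, settings["bucket"], object_name)


def download(object_name, settings):
    """Download an object from the bucket to a local file of the same name."""
    make_s3_client(settings).download_file(settings["bucket"], object_name, object_name)


def main(argv):
    """Hypothetical CLI entry point: main(["upload", "startup.mp4"])."""
    action, object_name = argv
    with open("config.ini") as handle:
        settings = parse_s3_settings(handle.read())
    {"upload": upload, "download": download}[action](object_name, settings)
```

Keeping the config parsing separate from the boto3 calls means the parsing can be checked without network access or credentials.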


**WORKFLOW:**

1) Specify the input source (a local file, a YouTube link, or a file uploaded to S3 if needed) and pass it as an input to the script.
2) Keep the agenda header topics in a local file named "agenda-headers.txt". This needs to be present where the script is run.
3) Run the script. The script automatically creates a scatter plot of words and topics in the form of an interactive
HTML file and a sample word cloud, and uploads them to the S3 bucket.
4) Additional artefacts pushed to S3:
    1) HTML visualization file
    2) pandas df in pickle format for others to collaborate and make their own visualizations
    3) Summary, transcript and transcript with timestamps file in txt format.

The script also creates 2 types of mappings.
1) Timestamp -> The top 2 matched agenda topics
2) Topic -> All matched timestamps in the transcription
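
The two mappings above are inverses of one another, which the sketch below illustrates. How the script actually scores a transcript segment against an agenda topic is not shown in this diff, so the similarity scores, timestamps, and function names here are hypothetical.

```python
from collections import defaultdict


def top_topics_per_timestamp(scores, k=2):
    """Map each timestamp to its k best-scoring agenda topics.

    `scores` has the shape {timestamp: {topic: similarity}}.
    """
    return {
        timestamp: [
            topic
            for topic, _ in sorted(
                topic_scores.items(), key=lambda item: item[1], reverse=True
            )[:k]
        ]
        for timestamp, topic_scores in scores.items()
    }


def timestamps_per_topic(timestamp_to_topics):
    """Invert the first mapping: topic -> all timestamps where it matched."""
    topic_to_timestamps = defaultdict(list)
    for timestamp, topics in timestamp_to_topics.items():
        for topic in topics:
            topic_to_timestamps[topic].append(timestamp)
    return dict(topic_to_timestamps)


# Hypothetical similarity scores for two transcript segments.
scores = {
    "00:00:05": {"TAM": 0.8, "Churn": 0.1, "Founders": 0.3},
    "00:01:10": {"TAM": 0.2, "Churn": 0.7, "Founders": 0.6},
}
timestamp_to_topics = top_topics_per_timestamp(scores)
topic_to_timestamps = timestamps_per_topic(timestamp_to_topics)
```

Here `timestamp_to_topics["00:00:05"]` is `["TAM", "Founders"]`, and "Founders" appears under both timestamps in `topic_to_timestamps`.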

Further visualizations can be planned based on available artefacts or new ones can be created.


NEXT STEPS:


@@ -2,9 +2,9 @@
# Set exception rule for OpenMP error to allow duplicate lib initialization
KMP_DUPLICATE_LIB_OK=TRUE
# Export OpenAI API Key
OPENAI_APIKEY=***REMOVED***
OPENAI_APIKEY=
# Export Whisper Model Size
WHISPER_MODEL_SIZE=tiny
AWS_ACCESS_KEY=
AWS_SECRET_KEY=
AWS_ACCESS_KEY=***REMOVED***
AWS_SECRET_KEY=***REMOVED***
BUCKET_NAME='reflector-bucket'