{ "cells": [ { "cell_type": "markdown", "id": "a5ace857", "metadata": {}, "source": [ "# Visualization Experiments" ] }, { "cell_type": "markdown", "id": "9bfc569d", "metadata": {}, "source": [ "Let's load the data artefacts into local memory. The pipeline automatically uploads these files to the pre-configured S3 bucket, so we download them from there." ] }, { "cell_type": "code", "execution_count": 18, "id": "edc584b2", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32m2023-06-23 15:01:36.558\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mfile_utilities\u001b[0m:\u001b[36mdownload_files\u001b[0m:\u001b[36m36\u001b[0m - \u001b[1mDownloading file df_06-23-2023_06:10:03.pkl\u001b[0m\n", "\u001b[32m2023-06-23 15:01:38.450\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mfile_utilities\u001b[0m:\u001b[36mdownload_files\u001b[0m:\u001b[36m36\u001b[0m - \u001b[1mDownloading file mappings_06-23-2023_06:10:03.pkl\u001b[0m\n", "\u001b[32m2023-06-23 15:01:39.179\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mfile_utilities\u001b[0m:\u001b[36mdownload_files\u001b[0m:\u001b[36m36\u001b[0m - \u001b[1mDownloading file transcript_with_timestamp_06-23-2023_06:10:03.txt\u001b[0m\n" ] } ], "source": [ "from file_utilities import download_files\n", "import pickle\n", "\n", "# Download files from S3 bucket. 
You can download multiple files at a time by passing a list of names\n", "# files_to_download = [\"df.pkl\",\n", "# \"mappings.pkl\",\n", "# \"transcript_timestamps.txt\"]\n", "\n", "# Timestamp suffix shared by the three artefact file names\n", "timestamp = \"06-23-2023_06:10:03\"\n", "\n", "# File names for df, mappings and the timestamped transcript\n", "df_file_name = \"df_\" + timestamp + \".pkl\"\n", "mappings_file_name = \"mappings_\" + timestamp + \".pkl\"\n", "transcript_file_name = \"transcript_with_timestamp_\" + timestamp + \".txt\"\n", "\n", "\n", "files_to_download = [df_file_name,\n", " mappings_file_name,\n", " transcript_file_name]\n", "download_files(files_to_download)" ] }, { "cell_type": "code", "execution_count": 19, "id": "5027fe25", "metadata": {}, "outputs": [], "source": [ "# Download the NLTK resources and load the spaCy model\n", "import nltk\n", "import spacy\n", "from nltk.corpus import stopwords\n", "\n", "nltk.download('punkt', quiet=True)\n", "nltk.download('stopwords', quiet=True)\n", "spaCy_model = \"en_core_web_md\"\n", "nlp = spacy.load(spaCy_model)\n", "spacy_stopwords = nlp.Defaults.stop_words\n", "STOPWORDS = set(spacy_stopwords).union(set(stopwords.words('english')))" ] }, { "cell_type": "markdown", "id": "8abc435d", "metadata": {}, "source": [ "## Example template 1" ] }, { "cell_type": "markdown", "id": "2b1a4834", "metadata": {}, "source": [ "## Scatter plot of transcription with Topic modelling" ] }, { "cell_type": "code", "execution_count": 23, "id": "55a75dcf", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | timestamp | text | ts_to_topic_mapping_top_1 | ts_to_topic_mapping_top_2 |
|---|---|---|---|---|
| 0 | (0.0, 12.36) | this . Okay , yeah , so it looks like I am re... | TAM | Founders |
| 1 | (12.36, 25.76) | because Goku needs that for the audio plus the... | Founders | TAM |
| 2 | (25.76, 30.32) | the rest of the team did . So I want to just ... | Founders | AGENDA |
| 3 | (30.32, 35.52) | then we can ask questions or how do you want t... | TAM | Founders |
| 4 | (35.52, 49.56) | introduction . So what I , it all started wit... | Founders | TAM |
| ... | ... | ... | ... | ... |
| 554 | (3323.0, 3326.56) | It 's crazy . But definitely with the | Founders | TAM |
| 555 | (3326.56, 3332.24) | local models , we have n't found a way to work... | Founders | TAM |
| 556 | (3332.24, 3337.2) | if you 'd have 90 minutes of audio to transfer... | TAM | Founders |
| 557 | (3338.32, 3344.4) | We actually have a preprocessor to resolve wha... | Founders | TAM |
| 558 | (3344.4, None) | there 's still some struggles on the local mod... | Founders | TAM |

559 rows × 4 columns