<!-- livebook:{"app_settings":{"auto_shutdown_ms":5000,"multi_session":true,"show_source":true,"slug":"transcriber"}} -->
# Tag Audio
```elixir
Mix.install(
  [
    # Local checkout of audio_tagger; adjust this path for your machine.
    {:audio_tagger, path: "/Users/noah/development/audio_tagger"},
    {:kino_bumblebee, "~> 0.4.0"},
    {:exla, ">= 0.0.0"},
    {:explorer, "~> 0.7.0"},
    {:kino_explorer, "~> 0.1.11"}
  ],
  config: [
    nx: [default_backend: EXLA.Backend]
    # exla: [
    #   clients: [
    #     cuda: [
    #       platform: :cuda,
    #       lazy_transfers: :never
    #     ]
    #   ]
    # ]
  ]
)
```
## Step 1: Create Vector Embeddings for ICD-10 Codes
```elixir
# Use sentence-transformers/all-MiniLM-L6-v2 to create vectors for each medical code description
tmpfile = Path.join(System.tmp_dir(), "icd10_vector_tensors.bin")

if File.exists?(tmpfile) do
  IO.puts("Found pre-calculated ICD-10 vector embeddings. Skipping embedding.")
else
  AudioTagger.SampleData.icd10_codes()
  |> AudioTagger.Classifier.SemanticSearch.precalculate_label_vectors(tmpfile)

  IO.inspect(tmpfile, label: "Wrote vector embeddings")
end
```
## Step 2: Transcribe Audio Recording
```elixir
# 1 - Prepare model and choose audio file
featurizer = AudioTagger.Transcriber.prepare_featurizer()
audio_input = Kino.Input.audio("Audio", sampling_rate: featurizer.sampling_rate)
```
```elixir
# 2 - Transcribe audio recording to text (using openai/whisper-tiny)
# Takes 5–6s for about a minute of audio
chosen_audio = Kino.Input.read(audio_input)
file = chosen_audio.file_ref |> Kino.Input.file_path() |> File.read!()

transcription_df =
  AudioTagger.Transcriber.transcribe_audio(featurizer, file, chosen_audio.num_channels)

# Show a sample of rows
transcription_df |> Explorer.DataFrame.head(3)
```
## Step 3: Tag Transcribed Audio
```elixir
labels_df = AudioTagger.SampleData.icd10_codes()

# Reuse the embeddings file written in Step 1 rather than a machine-specific path
tagged_audio =
  transcription_df
  |> AudioTagger.Classifier.SemanticSearch.tag(labels_df, tmpfile)
```
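
The internals of `AudioTagger.Classifier.SemanticSearch.tag/3` are not shown in this notebook. As a rough illustration of the general idea behind semantic search over embeddings, the sketch below ranks labels by cosine similarity to a query vector. It is written in plain Elixir with made-up three-dimensional vectors; the `CosineSketch` module, its function names, and the `label_*` entries are hypothetical and are not part of `audio_tagger`, which may well use a different similarity measure or an Nx-based implementation.

```elixir
defmodule CosineSketch do
  # Hypothetical helper for illustration only; not part of AudioTagger.

  # Dot product of two equal-length lists of numbers.
  def dot(a, b) do
    a
    |> Enum.zip(b)
    |> Enum.reduce(0.0, fn {x, y}, acc -> acc + x * y end)
  end

  # Euclidean norm of a vector.
  def norm(a), do: :math.sqrt(dot(a, a))

  # Cosine similarity in [-1, 1]; assumes neither vector is all zeros.
  def similarity(a, b), do: dot(a, b) / (norm(a) * norm(b))

  # Returns the `k` best {label, score} pairs, highest score first.
  def top_matches(query, labeled_vectors, k \\ 1) do
    labeled_vectors
    |> Enum.map(fn {label, vector} -> {label, similarity(query, vector)} end)
    |> Enum.sort_by(fn {_label, score} -> score end, :desc)
    |> Enum.take(k)
  end
end

# Toy data: real sentence embeddings have hundreds of dimensions.
labeled_vectors = [
  {"label_a", [1.0, 0.0, 0.0]},
  {"label_b", [0.0, 1.0, 0.0]},
  {"label_c", [0.7, 0.7, 0.0]}
]

# The query points mostly along the first axis, so label_a ranks first.
CosineSketch.top_matches([0.9, 0.1, 0.0], labeled_vectors, 2)
```

In the notebook's pipeline, the query vector would presumably be the embedding of a transcribed segment and the labeled vectors would be the precalculated ICD-10 embeddings from Step 1.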