Base model: facebook/opt-2.7b
Fine-tuned for causal language modeling of transcribed spoken dialogue from the TalkBank CABank collection. Training corpora include:
- CABNC - Spoken language segment of the British National Corpus
- CallFriend English (N) - Phone calls
- CallFriend English (S) - Phone calls
- CallHome English - Phone calls
- GCSAusE - Australian conversations
- ISL - Conversations recorded to test ASR methods for meeting
- MICASE - Michigan Corpus of Academic Spoken English
- SCoSE - The Saarbrücken Corpus of Spoken (American) English.
(Corpus descriptions are from TalkBank)
Data input format: The data format models a sequence of spoken dialogue between two or more participants:
- The sequence is prefixed with information about the participants including name (can be a proper noun, a title/role, or unknown), age (can be a number or unknown), and sex (can be male, female, other, unknown).
- It then proceeds to sequentially list all utterances in the conversation, each prefixed with their participant code (S1, S2, S3, etc.).
- Utterances support a limited set of transcription notations in the CHAT & CHAT-CA formats:
  - Pauses: (.) for a generic short pause, or (N.N) for a timed pause. For example, (3.4) is a pause of 3.4 seconds.
  - Non-verbal sounds: &=click, etc. Anything describing a speaker-produced non-verbal sound can come after the &= prefix.
  - Comments about speaker or setting: [% baby crying in background], [% phone clicking noise], [% imitating him], etc. Anything describing the state of the speaker or environment can go in this block. A comment block can also describe speaker-produced sounds, but the &= prefix is more common for that.
  - Unknown or unintelligible utterances: xxx
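The pause, non-verbal sound, and comment notations described above can be picked out of an utterance with a few regular expressions. The following is a minimal sketch, not part of the model's own tooling: the names `NOTATION_RE`, `find_notations`, and `strip_notations` are hypothetical, and the patterns cover only the notations listed here rather than the full CHAT specification.

```python
import re

# One pattern per notation described above (assumed syntax, not full CHAT):
#   (N.N)   timed pause          (.)      generic short pause
#   &=word  non-verbal sound     [% ...]  comment about speaker or setting
NOTATION_RE = re.compile(
    r"\((?P<timed>\d+\.\d+)\)"
    r"|(?P<short>\(\.\))"
    r"|&=(?P<sound>\S+)"
    r"|\[%\s*(?P<comment>[^\]]*)\]"
)


def find_notations(utterance):
    """Return (kind, value) pairs for every notation found, in order."""
    found = []
    for match in NOTATION_RE.finditer(utterance):
        if match.group("timed") is not None:
            found.append(("timed_pause", float(match.group("timed"))))
        elif match.group("short") is not None:
            found.append(("short_pause", None))
        elif match.group("sound") is not None:
            found.append(("sound", match.group("sound")))
        else:
            found.append(("comment", match.group("comment").strip()))
    return found


def strip_notations(utterance):
    """Remove all notations and collapse the remaining whitespace."""
    return " ".join(NOTATION_RE.sub(" ", utterance).split())


text = "uh yeah (0.8) I can hear you. (.) &=cough [% phone clicking noise]"
print(find_notations(text))
print(strip_notations(text))
```

Stripping the notations this way may be useful when generated text is passed to a component that expects plain words, such as a text-to-speech system, while the extracted pauses can be kept separately as timing information.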
Example:
<participant> S1 (name: Dave, age: 33, sex: male) <participant> S2 (name: unknown, age: unknown, sex: unknown) <dialog> S1: Hi! (2.3) are you there? S2: hhh hhh [% background noise] uh yeah (0.8) I can hear you. (1.2) &=cough can you hear me? S1: ...
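A prompt in this format can be assembled programmatically. The sketch below is illustrative only: `Participant`, `format_participant`, and `build_prompt` are hypothetical helpers, and joining every element with a single space is an assumption based on the example line above rather than a documented requirement.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    code: str               # participant code, e.g. "S1"
    name: str = "unknown"   # proper noun, title/role, or "unknown"
    age: str = "unknown"    # a number or "unknown"
    sex: str = "unknown"    # male, female, other, or unknown


def format_participant(p):
    """Render one participant header entry."""
    return f"<participant> {p.code} (name: {p.name}, age: {p.age}, sex: {p.sex})"


def build_prompt(participants, utterances, next_speaker=None):
    """Build a dialogue prompt from participants and (code, text) utterances.

    If next_speaker is given, the prompt ends with that participant code
    so the model continues with that speaker's turn.
    """
    parts = [format_participant(p) for p in participants]
    parts.append("<dialog>")
    parts.extend(f"{code}: {text}" for code, text in utterances)
    if next_speaker is not None:
        parts.append(f"{next_speaker}:")
    # Assumption: single spaces between all elements, as in the example.
    return " ".join(parts)


prompt = build_prompt(
    [Participant("S1", "Dave", "33", "male"), Participant("S2")],
    [
        ("S1", "Hi! (2.3) are you there?"),
        ("S2", "hhh hhh [% background noise] uh yeah (0.8) I can hear you. "
               "(1.2) &=cough can you hear me?"),
    ],
    next_speaker="S1",
)
print(prompt)
```

Ending the prompt with a participant code is one way to steer whose turn the model generates next. Leaving `next_speaker` unset lets the model choose the speaker itself.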
Per the OPT documentation, the model was trained with tokenizer setting use_fast=False.
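As a rough sketch of how the tokenizer setting fits into inference with the Hugging Face `transformers` library, the code below loads a slow tokenizer with `use_fast=False` (the setting the OPT documentation recommended, assumed here to be the one meant above) and generates a single turn. Everything else is an assumption for illustration: `MODEL_ID` points at the base model and should be replaced with the fine-tuned checkpoint, the sampling parameters are arbitrary, and `first_turn` and `generate_turn` are hypothetical helper names.

```python
import re

# Assumption: placeholder pointing at the base model named in this card.
# Replace with the id or local path of the fine-tuned checkpoint.
MODEL_ID = "facebook/opt-2.7b"


def first_turn(continuation):
    """Cut generated text at the next participant code such as ' S2:'."""
    return re.split(r"\sS\d+:", continuation, maxsplit=1)[0].strip()


def generate_turn(prompt, model_id=MODEL_ID, max_new_tokens=40):
    """Generate one utterance continuing a prompt in the dialogue format."""
    # Imported lazily so first_turn() is usable without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Slow tokenizer, matching the setting used during training.
    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,   # illustrative sampling settings, not tuned
            top_p=0.9,
        )
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return first_turn(tokenizer.decode(new_tokens, skip_special_tokens=True))


# Example call (downloads the model weights, so it is left commented out):
# print(generate_turn("<participant> S1 (name: Dave, age: 33, sex: male) "
#                     "<dialog> S1: Hi! (2.3) are you there? S2:"))
```

Truncating at the next participant code keeps the output to one turn. A real-time system would more likely stream tokens and stop generation early instead of trimming afterwards.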
To use this model for real-time inference in a continuous duplex dialogue system, see: https://github.com/AbrahamSanders/realtime-chatbot.