jemoka committed on
Commit
10785c0
1 Parent(s): 6e902cb

Update README.md

Files changed (1)
  1. README.md +20 -1
README.md CHANGED
@@ -1,3 +1,22 @@
  ---
- license: bsd-3-clause
+ language:
+ - en
  ---
+
+ # TalkBank Batchalign CHATUtterance
+ CHATUtterance is a series of BERT-derivative models for utterance segmentation, released by the TalkBank project. This is the Mandarin model, trained on the utterance diarization samples from the CHILDES Mandarin corpora: [ZhouAssessment](https://childes.talkbank.org/access/Chinese/Mandarin/ZhouAssessment.html), [Chang Personal Narrative](https://childes.talkbank.org/access/Chinese/Mandarin/ChangPN.html), [Li Shared Reading](https://childes.talkbank.org/access/Chinese/Mandarin/LiReading.html).
+
+ ## Usage
+ Each model can be used directly as a BERT-class token classification model following the [instructions from Hugging Face](https://huggingface.co/docs/transformers/tasks/token_classification). Feel free to inspect [this file](https://github.com/TalkBank/batchalign/blob/73ec04761ed3ee2eba04ba0cf14dc898f88b72f7/baln/utokengine.py#L85-L94) for a sense of what the classes mean. Alternatively, for the fullest analysis, combine the model with the TalkBank Batchalign suite of analysis software, [available here](https://github.com/talkbank/batchalign2), in `transcribe` mode.
+
+ Target labels:
+
+ - `0`: regular form
+ - `1`: start of utterance/capitalized word
+ - `2`: end of declarative utterance (end this utterance with a `.`)
+ - `3`: end of interrogative utterance (end this utterance with a `?`)
+ - `4`: end of exclamatory utterance (end this utterance with a `!`)
+ - `5`: break in the utterance; depending on orthography one can insert a `,`
+
+
+
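The target labels listed in the README can be turned back into punctuated utterances with a small post-processing step. The following is a minimal sketch, not the Batchalign implementation: the function name `segment` and its `sep` parameter are hypothetical, and it assumes the model's predictions have already been reduced to one integer label per word (or per character, for Mandarin).

```python
from typing import List, Sequence

# Mapping taken from the "Target labels" list in the README.
END_PUNCT = {2: ".", 3: "?", 4: "!"}
START, BREAK = 1, 5


def segment(tokens: Sequence[str], labels: Sequence[int], sep: str = " ") -> List[str]:
    """Group tokens into punctuated utterances (hypothetical helper).

    Assumes exactly one label per token; `sep` joins tokens inside an
    utterance (use "" for unspaced scripts such as Mandarin).
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    utterances: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label == START:
            # Label 1: utterance-initial / capitalized word.
            # Upper-casing is a no-op for scripts without letter case.
            token = token[:1].upper() + token[1:]
        elif label == BREAK:
            # Label 5: a break; rendering it as a comma is one option.
            token = token + ","
        current.append(token)
        if label in END_PUNCT:
            # Labels 2-4 close the utterance with the matching mark.
            utterances.append(sep.join(current) + END_PUNCT[label])
            current = []
    if current:
        # Tokens after the last end label are kept, unterminated.
        utterances.append(sep.join(current))
    return utterances


if __name__ == "__main__":
    print(segment(["hello", "there", "how", "are", "you"], [1, 2, 1, 0, 3]))
```

For Mandarin input, passing `sep=""` keeps the characters unspaced. Substituting full-width marks such as `。` or `，` is an orthographic choice this sketch leaves to the caller.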