talkbank
/

CHATUtterance-zh_CN

Token Classification


jemoka committed on Mar 8

Commit

10785c0

•

1 Parent(s): 6e902cb

Update README.md

Files changed (1)

README.md +20 -1

README.md CHANGED

@@ -1,3 +1,22 @@
 ---
-license: bsd-3-clause
 ---

 ---
+language:
+- en
 ---
+# TalkBank Batchalign CHATUtterance
+CHATUtterance is a series of BERT-derived models for utterance segmentation, released by the TalkBank project. This is the Mandarin model, trained on the utterance diarization samples from the following CHILDES Mandarin corpora: [ZhouAssessment](https://childes.talkbank.org/access/Chinese/Mandarin/ZhouAssessment.html), [Chang Personal Narrative](https://childes.talkbank.org/access/Chinese/Mandarin/ChangPN.html), [Li Shared Reading](https://childes.talkbank.org/access/Chinese/Mandarin/LiReading.html).
+## Usage
+The models can be used directly as BERT-class token classification models by following the [instructions from Hugging Face](https://huggingface.co/docs/transformers/tasks/token_classification). Feel free to inspect [this file](https://github.com/TalkBank/batchalign/blob/73ec04761ed3ee2eba04ba0cf14dc898f88b72f7/baln/utokengine.py#L85-L94) for a sense of what the classes mean. Alternatively, to get the full analysis the model supports, combine it with the TalkBank Batchalign suite of analysis software, [available here](https://github.com/talkbank/batchalign2), using `transcribe` mode.
+Target labels:
+- `0`: regular form
+- `1`: start of utterance/capitalized word
+- `2`: end of declarative utterance (end this utterance with a `.`)
+- `3`: end of interrogative utterance (end this utterance with a `?`)
+- `4`: end of exclamatory utterance (end this utterance with a `!`)
+- `5`: break in the utterance; depending on orthography one can insert a `,`
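
The label scheme above can be turned back into punctuated utterances with a small decoding step. The following is a minimal sketch, not the Batchalign implementation: the function name `segment`, its parameters, and the choice to attach the comma directly after a token labeled `5` are assumptions made for illustration. It expects one predicted label per token, so any subword-to-token alignment has to happen beforehand.

```python
from typing import Dict, List, Sequence

# Punctuation implied by each utterance-final label in the list above.
END_MARKS: Dict[int, str] = {2: ".", 3: "?", 4: "!"}
CAPITALIZE_LABEL = 1  # start of utterance / capitalized word
BREAK_LABEL = 5       # break inside the utterance


def segment(
    tokens: Sequence[str],
    labels: Sequence[int],
    joiner: str = "",   # "" suits Mandarin; use " " for space-delimited text
    comma: str = ",",   # assumption: swap in "，" if the orthography calls for it
) -> List[str]:
    """Rebuild punctuated utterances from per-token labels (hypothetical helper)."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    utterances: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label == CAPITALIZE_LABEL:
            # No effect on Chinese characters; relevant for cased scripts.
            token = token[:1].upper() + token[1:]
        elif label == BREAK_LABEL:
            token = token + comma
        current.append(token)
        if label in END_MARKS:
            # Labels 2-4 close the current utterance with their mark.
            utterances.append(joiner.join(current) + END_MARKS[label])
            current = []
    if current:
        # Trailing tokens without an end label are kept unpunctuated.
        utterances.append(joiner.join(current))
    return utterances


print(segment(["你", "好", "吗", "我", "很", "好"], [1, 0, 3, 1, 0, 2]))
# ['你好吗?', '我很好.']
```

In practice the label IDs would likely come from taking the highest-scoring class per token from the token classification model's output; that step, and mapping subword predictions back to whole tokens, is left out here.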