Introduction to the lexical analysis library functions

1. Word segmentation: segment
Values for method: Chinese: jieba_ac, jieba_all, hanlp, thulac, snownlp, ltp; English: spacy, nltk, split

2. Stemming: stem
Values for method: porter, lancester, snowball

3. Lemmatization: lemmatize_text
Values for method: spacy, nltk

4. Part-of-speech tagging: tagging
Values for method: Chinese: jieba, thulac, hanlp, npir, snownlp; English: nltk, spacy

5. Named entity recognition: named_entity_recognition
Values for method: Chinese: LTP (Nh: person name, Ni: organization name, Ns: place name), Hanlp, spacy_ch; English: spacy_en, nltk

6. Stopword removal: remove_stopword

7. Word frequency counting: count_word_frequency
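Items 6 and 7 can be pictured with a small standard-library sketch. It is not the code in divide_corpus.py: the helper names load_stopwords, remove_stopwords and count_frequency are hypothetical, the input is assumed to be already tokenized, and the stopword file is assumed to hold one word per line.

```python
from collections import Counter
from pathlib import Path


def load_stopwords(path):
    """Read a stopword file, assuming one stopword per line (an assumption)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Keep only the tokens whose lowercase form is not in the stopword set."""
    return [tok for tok in tokens if tok.lower() not in stopwords]


def count_frequency(tokens):
    """Return (token, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


tokens = "the cat sat on the mat with the other cat".split()
stopwords = {"the", "on", "with"}  # stand-in for a loaded stopword file
filtered = remove_stopwords(tokens, stopwords)
print(filtered)
print(count_frequency(filtered))
```

Counting after filtering keeps very common function words from dominating the frequency table, which is presumably why the two steps are offered side by side.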

Pass the name of the desired feature as function, and the matching method value from the lists above as method.
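Because both arguments are plain strings, a typo only surfaces when the call fails. One possible guard is a lookup table checked before calling; the sketch below copies the method names exactly as this card spells them, and the names SUPPORTED_METHODS and check_method are hypothetical. Whether the library itself compares names case-sensitively is not stated here, so the check is an assumption.

```python
# Method names copied from this card as written. Functions with no
# documented method list (remove_stopword, count_word_frequency) are omitted.
SUPPORTED_METHODS = {
    "segment": {"jieba_ac", "jieba_all", "hanlp", "thulac", "snownlp",
                "ltp", "spacy", "nltk", "split"},
    "stem": {"porter", "lancester", "snowball"},
    "lemmatize_text": {"spacy", "nltk"},
    "tagging": {"jieba", "thulac", "hanlp", "npir", "snownlp",
                "nltk", "spacy"},
    "named_entity_recognition": {"LTP", "Hanlp", "spacy_ch",
                                 "spacy_en", "nltk"},
}


def check_method(function, method):
    """Raise ValueError if method is not documented for function."""
    allowed = SUPPORTED_METHODS.get(function)
    if allowed is None:
        return  # no method list is documented for this function
    if method not in allowed:
        raise ValueError(
            f"{method!r} is not listed for {function!r}; "
            f"expected one of {sorted(allowed)}"
        )


check_method("stem", "porter")  # passes silently
try:
    check_method("stem", "spacy")
except ValueError as err:
    print(err)
```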

The required libraries must be installed beforehand; they are listed in the require file.

In addition, run python -m spacy download zh_core_web_sm and python -m spacy download en_core_web_sm to install zh_core_web_sm==3.7.0 and en_core_web_sm==3.7.1.
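To confirm that the two spaCy models ended up at the versions named above, the installed package metadata can be inspected. This is a minimal sketch using only the standard library; it assumes the models are installed as ordinary Python packages under those names, and the helper names installed_version and report are hypothetical.

```python
from importlib import metadata

# Versions named in this card.
EXPECTED = {"zh_core_web_sm": "3.7.0", "en_core_web_sm": "3.7.1"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(expected=EXPECTED):
    """Map each package name to (installed version, matches expected?)."""
    status = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        status[name] = (found, found == wanted)
    return status


for name, (found, ok) in report().items():
    print(f"{name}: installed={found} matches_expected={ok}")
```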

Quick Start

The library functions can be used as in the following example:

from huggingface_hub import hf_hub_download
import importlib.util

def nlp(content, function, method):
    # Replace with your own Hugging Face username and repository name
    repo_id = "epetery/my-new-model"
    filename = "divide_corpus.py"
    stopwords_filename = "stopwords-master/baidu_stopwords.txt"

    # Download the files; hf_hub_download stores them in the local
    # Hugging Face cache and returns the path to each cached file
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    stopwords_file_path = hf_hub_download(repo_id=repo_id, filename=stopwords_filename)

    # Import the downloaded file as a module
    spec = importlib.util.spec_from_file_location("divide_corpus", file_path)
    divide_corpus = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(divide_corpus)

    # Point the module at the downloaded stopword list
    divide_corpus.STOPWORDS_FILE_PATH = stopwords_file_path

    # Use the class and methods defined in the module
    text_divider = getattr(divide_corpus, "NLP_Class")(content)
    if function != 'count_word_frequency':
        divided_text = getattr(text_divider, function)(method=method)
    else:
        # Word frequency is counted on segmented text, so segment first
        seg_text = getattr(text_divider, 'segment')(method=method)
        freq_counter = getattr(divide_corpus, "NLP_Class")(seg_text)
        divided_text = freq_counter.count_word_frequency()
    return divided_text

# Call the function defined above

text = "This is a test text."
divided_text = nlp(text, 'remove_stopword', 'nltk')
print(divided_text)
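The example relies on loading a Python file from an arbitrary path with importlib. That step can be tried in isolation, without any download, using the sketch below. The temporary file demo_module.py and the helper load_module_from_path are hypothetical stand-ins for the downloaded divide_corpus.py and are not part of the repository.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_module_from_path(name, path):
    """Load a Python source file as a module object without touching sys.path."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


# Hypothetical stand-in for a file fetched from the Hub.
source = 'GREETING = "hello"\n\ndef shout(text):\n    return text.upper()\n'

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_module.py"
    path.write_text(source, encoding="utf-8")
    demo = load_module_from_path("demo_module", path)

print(demo.GREETING, demo.shout("ok"))
```

Checking that spec and spec.loader are not None gives a clearer error than the AttributeError that would otherwise appear if the path does not point to a loadable Python file.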