We are the Knowledge Engineering Group (KEG) & Data Mining team (THUDM) at Tsinghua University.
We build LLMs:
- ChatGLM: Open Bilingual Chat LLMs, of which the ChatGLM-6B series has attracted 10,000,000 downloads on Hugging Face (a loading sketch follows this list)
- CodeGeeX: A Multilingual Code Generation Model (KDD 2023)
- CogVLM (VisualGLM): An Open Visual Language Model
- WebGLM: An Efficient Web-Enhanced Question Answering System (KDD 2023)
- GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
- CogView: An Open Text-to-Image Generation Model (NeurIPS 2021)
- CogVideo: An Open Text-to-Video Generation Model (ICLR 2023)
- AgentTuning: Enabling Generalized Agent Abilities for LLMs
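As referenced above, the ChatGLM-6B weights are distributed on Hugging Face. Below is a minimal loading sketch, assuming the THUDM/chatglm-6b checkpoint and its chat() helper exposed through trust_remote_code; check the model card for the current API:

```python
from transformers import AutoTokenizer, AutoModel

# Pull the tokenizer and fp16 weights published under the THUDM organization.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()

# chat() returns the reply plus the updated dialogue history for multi-turn use.
response, history = model.chat(tokenizer, "What can ChatGLM do?", history=[])
print(response)
```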
We also work on LLM evaluations:
- AgentBench: A Benchmark to Evaluate LLMs as Agents
- LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
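LongBench is released as a Hugging Face dataset, so its tasks can be pulled directly with the datasets library. A minimal sketch; the THUDM/LongBench dataset ID and the hotpotqa subset name are assumptions based on the public release:

```python
from datasets import load_dataset

# Each LongBench task is a separate subset; "hotpotqa" is one multi-hop QA task.
data = load_dataset("THUDM/LongBench", "hotpotqa", split="test")

# Inspect the fields of one long-context example.
print(data[0].keys())
```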
We also pre-train graph neural networks:
- CogDL: A Library for Graph Deep Learning (WWW 2023); a quick-start sketch follows this list
- GraphMAE: (Generative) Masked Graph Neural Network Pre-Training (KDD 2022 & WWW 2023)
- GPT-GNN: Generative Graph Neural Network Pre-Training (KDD 2020, with MSR & UCLA)
- GCC: Contrastive Graph Neural Network Pre-Training (KDD 2020)
- SelfKG: Self-Supervised Learning for Knowledge Graphs (WWW 2022)
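CogDL (listed above) wraps common graph-learning pipelines behind a single entry point. A minimal quick-start sketch, assuming the experiment() helper and the built-in dataset/model names from the CogDL documentation:

```python
from cogdl import experiment

# Train CogDL's built-in GCN on the Cora citation graph with default hyperparameters.
experiment(dataset="cora", model="gcn")
```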
We also work on graph embedding theory, algorithms, and systems:
- SketchNE: Embedding Billion-Scale Networks Accurately in One Hour (TKDE 2023)
- ProNE: Embedding Networks of 100 Million Nodes with a 10-400× Speedup (IJCAI 2019)
- NetSMF: Embedding Networks of 100 Million Nodes (WWW 2019)
- NetMF: Understanding DeepWalk, LINE, PTE, and node2vec as Matrix Factorization (WSDM 2018)
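NetMF shows that DeepWalk implicitly factorizes a closed-form matrix built from the random-walk transition matrix. Below is a small NumPy sketch of that construction for a dense toy graph (window size T, b negative samples); the names are illustrative, not the released implementation:

```python
import numpy as np

def netmf_embed(A, dim=2, T=10, b=1.0):
    """Embed a small graph by factorizing the log-truncated DeepWalk matrix (NetMF)."""
    vol = A.sum()            # graph volume vol(G) = sum of degrees
    d = A.sum(axis=1)        # node degrees
    P = A / d[:, None]       # random-walk transition matrix D^{-1} A

    # Sum of the first T transition-matrix powers: P + P^2 + ... + P^T.
    S, Pk = np.zeros_like(A, dtype=float), np.eye(A.shape[0])
    for _ in range(T):
        Pk = Pk @ P
        S += Pk

    # Closed-form DeepWalk matrix vol(G)/(bT) * (sum_r P^r) D^{-1}, truncated at 1 before the log.
    M = (vol / (b * T)) * (S / d[None, :])
    logM = np.log(np.maximum(M, 1.0))

    # Rank-dim SVD; embeddings are U_d * sqrt(Sigma_d).
    U, s, _ = np.linalg.svd(logM)
    return U[:, :dim] * np.sqrt(s[:dim])

# Toy 4-node graph: a triangle (0, 1, 2) with node 3 hanging off node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(netmf_embed(A))
```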
We started with social networks and graphs, and we still love them:
- AMiner: An Academic Search and Mining System Since 2006 (KDD 2008, ACM SIGKDD Test of Time Award)