Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University

university

AI & ML interests

AGI, LLMs, ChatGLM

Organization Card

The Knowledge Engineering Group (KEG) & Data Mining (THUDM) at Tsinghua University.

We build LLMs and related training & inference techniques:

  • ChatGLM: Open Bilingual Chat LLMs, among which the ChatGLM-6B series has attracted 10,000,000 downloads on HF.
  • CodeGeeX: A Multilingual Code Generation Model (KDD 2023)
  • CogVLM (VisualGLM): An Open Visual Language Model
  • WebGLM: An Efficient Web-Enhanced Question Answering System (KDD 2023)
  • GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
  • CogView: An Open Text-to-Image Generation Model (NeurIPS 2021)
  • CogVideo: An Open Text-to-Video Generation Model (ICLR 2023)
  • CogAgent: A Visual Language Model for GUI Agents
  • AgentTuning: Enabling Generalized Agent Abilities for LLMs
  • APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding

We also work on LLM evaluations:

  • AgentBench: A Benchmark to Evaluate LLMs as Agents (ICLR 2024)
  • AlignBench: A Benchmark to Evaluate Chinese Alignment of LLMs
  • LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

We also pre-train graph neural networks:

  • CogDL: A Library for Graph Deep Learning (WWW 2023)
  • GraphMAE: (Generative) Masked Graph Neural Network Pre-Training (KDD 2022 & WWW 2023)
  • GPT-GNN: Generative Graph Neural Network Pre-Training (KDD 2020, MSR, UCLA)
  • GCC: Contrastive Graph Neural Network Pre-Training (KDD 2020)
  • SelfKG: Self-Supervised Learning for Knowledge Graphs (WWW 2022)

We also work on graph embedding theory, algorithms, and systems:

  • SketchNE: Embedding Billion-Scale Networks Accurately in One Hour (TKDE 2023)
  • ProNE: Embedding Networks of 100 Million Nodes with 10-400× Speedup (IJCAI 2019)
  • NetSMF: Embedding Networks of 100 Million Nodes (WWW 2019)
  • NetMF: Understanding DeepWalk, LINE, PTE, and node2vec as Matrix Factorization (WSDM 2018)

We started with social networks and graphs, and we have always loved them:

  • AMiner: An Academic Search and Mining System Since 2006 (KDD 2008, ACM SIGKDD Test of Time Award)