---
license: apache-2.0
language:
- en
pipeline_tag: fill-mask
widget:
- text: >-
    The Standard Model (SM) of [MASK] physics has been tested by many
    experiments over the last four decades and has been shown to successfully
    describe high energy particle interactions.
  example_title: particle physics
- text: >-
    Clear evidence for the production of a neutral boson with a measured mass of
    [MASK].0 ± 0.4 (stat) ± 0.4 (sys) GeV is presented.
  example_title: 126.0 ± 0.4 (stat) ± 0.4 (sys) GeV
- text: >-
    An excess of [MASK] is observed above the expected background, with a local
    significance of 5.0 standard deviations, at a mass near 125 GeV, signalling
    the production of a new particle.
  example_title: excess of events
- text: >-
    On September 14, 2015 at 09:50:45 UTC the two [MASK] of the Laser
    Interferometer Gravitational-Wave Observatory simultaneously observed a
    transient gravitational-wave signal.
  example_title: two detectors
- text: >-
    These first images from the EHT achieve the highest [MASK] resolution in the
    history of ground-based VLBI.
  example_title: angular resolution
- text: >-
    We propose a comprehensive theory of [MASK] matter that explains the recent
    proliferation of unexpected observations in high-energy astrophysics.
  example_title: dark matter
- text: >-
    Formation of galaxy clusters corresponds to the collapse of the largest
    gravitationally bound overdensities in the initial [MASK] field and is
    accompanied by the most energetic phenomena since the Big Bang and by the
    complex interplay between gravity-induced dynamics of collapse and baryonic
    processes associated with galaxy formation.
  example_title: initial density field
- text: >-
    The Event [MASK] Telescope (EHT) has led to the first images of a
    supermassive black hole, revealing the central compact objects in the
    elliptical galaxy M87 and the Milky Way.
  example_title: Event Horizon Telescope
datasets:
- wikipedia
- bookcorpus
- arnosimons/astro-hep-corpus
tags:
- arXiv
- astrophysics
- conceptual analysis
- epistemic change
- high-energy physics (HEP)
- history of science
- semantic shift detection
- sociology of science
- philosophy of science
- physics
- word embeddings
---

# Model Card for Astro-HEP-BERT

**Astro-HEP-BERT** is a bidirectional transformer designed primarily to generate contextualized word embeddings for computational conceptual analysis in astrophysics and high-energy physics (HEP). Built upon Google's `bert-base-uncased`, the model was trained for three additional epochs on the <a target="_blank" rel="noopener noreferrer" href="https://huggingface.co/datasets/arnosimons/astro-hep-corpus">Astro-HEP Corpus</a>, which contains 21.84 million paragraphs from more than 600,000 scholarly articles sourced from arXiv, all pertaining to astrophysics and/or HEP. The sole training objective was masked language modeling.
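
Since masked language modeling was the only training objective, the quickest way to probe the model is to let it fill a `[MASK]` token. The following is a minimal sketch, not an official usage recipe: it assumes the `transformers` and `torch` packages are installed and that the model is published on the Hub under the id `arnosimons/astro-hep-bert` (an assumption, since the id is not stated in this card). The helper names `load_fill_mask`, `top_predictions`, and `demo` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: the Hub repository id is not given in this card; adjust if it differs.
MODEL_ID = "arnosimons/astro-hep-bert"


def load_fill_mask(model_id: str = MODEL_ID):
    """Build a fill-mask pipeline (downloads the weights on first use)."""
    # Imported lazily so the helper below can be used without transformers installed.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)


def top_predictions(
    fill_mask: Callable[..., List[Dict]], text: str, k: int = 5
) -> List[Tuple[str, float]]:
    """Return the k most likely fillers for the single [MASK] in `text`."""
    if text.count("[MASK]") != 1:
        raise ValueError("text must contain exactly one [MASK] token")
    results = fill_mask(text, top_k=k)
    # Each result is a dict with (among others) the keys "token_str" and "score".
    return [(r["token_str"].strip(), round(float(r["score"]), 4)) for r in results]


def demo() -> None:
    """Print the top candidates for one of the widget sentences of this card."""
    unmasker = load_fill_mask()
    sentence = (
        "We propose a comprehensive theory of [MASK] matter that explains the "
        "recent proliferation of unexpected observations in high-energy astrophysics."
    )
    for token, score in top_predictions(unmasker, sentence):
        print(f"{token:>12}  {score:.4f}")
```

Calling `demo()` downloads the weights and prints the candidate tokens with their probabilities. Any other sentence from the widget examples can be substituted, as long as it contains exactly one `[MASK]`.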

The Astro-HEP-BERT project demonstrates that training a customized bidirectional transformer for computational conceptual analysis in the history, philosophy, and sociology of science is feasible as an open-source endeavor that does not require a substantial budget. Using only freely available code, weights, and text inputs, the entire training process was conducted on a single MacBook Pro laptop (M2/96GB).

For further insights into the model, the corpus, and the underlying research project (<a target="_blank" rel="noopener noreferrer" href="https://doi.org/10.3030/101044932">Network Epistemology in Practice</a>), please refer to the following two papers:

1) <a target="_blank" rel="noopener noreferrer" href="https://arxiv.org/abs/2411.14877">Simons, A. (2024). Astro-HEP-BERT: A bidirectional language model for studying the meanings of concepts in astrophysics and high energy physics. arXiv:2411.14877.</a>

2) <a target="_blank" rel="noopener noreferrer" href="https://arxiv.org/abs/2411.14073">Simons, A. (2024). Meaning at the Planck scale? Contextualized word embeddings for doing history, philosophy, and sociology of science. arXiv:2411.14073.</a>


## Model Details

- **Developer:** <a target="_blank" rel="noopener noreferrer" href="https://www.tu.berlin/en/hps-mod-sci/arno-simons">Arno Simons</a>
- **Funded by:** The European Union under grant agreement ID <a target="_blank" rel="noopener noreferrer" href="https://doi.org/10.3030/101044932">101044932</a>
- **Language (NLP):** English
- **License:** apache-2.0
- **Parent model:** Google's <a target="_blank" rel="noopener noreferrer" href="https://github.com/google-research/bert">`bert-base-uncased`</a>
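
Because the model inherits the `bert-base-uncased` architecture, contextualized word embeddings can presumably be extracted with the standard `transformers` API. The sketch below averages the last-layer vectors of a word's subword tokens. This is one common pooling choice and not necessarily the procedure used in the papers above. The Hub id `arnosimons/astro-hep-bert` is an assumption, and `find_subsequence` and `word_embedding` are hypothetical helper names.

```python
from typing import List, Sequence

# Assumption: the Hub repository id is not given in this card; adjust if it differs.
MODEL_ID = "arnosimons/astro-hep-bert"


def find_subsequence(haystack: Sequence[int], needle: Sequence[int]) -> int:
    """Return the index of the first occurrence of `needle` in `haystack`, or -1."""
    n = len(needle)
    if n == 0:
        return -1
    for i in range(len(haystack) - n + 1):
        if list(haystack[i : i + n]) == list(needle):
            return i
    return -1


def word_embedding(sentence: str, word: str, model_id: str = MODEL_ID) -> List[float]:
    """Mean-pool the last-layer vectors of the first occurrence of `word`."""
    # Imported lazily; requires the `torch` and `transformers` packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()

    encoded = tokenizer(sentence, return_tensors="pt", truncation=True)
    # Token ids of the target word alone, without [CLS]/[SEP].
    word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    start = find_subsequence(encoded["input_ids"][0].tolist(), word_ids)
    if start < 0:
        raise ValueError(f"{word!r} not found in the tokenized sentence")

    with torch.no_grad():
        hidden = model(**encoded).last_hidden_state[0]  # shape: (seq_len, hidden_size)
    # Average over the subword tokens that make up the word.
    return hidden[start : start + len(word_ids)].mean(dim=0).tolist()
```

For example, `word_embedding("The Event Horizon Telescope imaged the black hole in M87.", "horizon")` returns one vector for that occurrence of the word. Comparing such vectors across contexts, for instance by cosine similarity, is the kind of analysis the embeddings are intended for.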

<!---

## How to Get Started with the Model

Use the code below to get started with the model.

[Coming soon]


## Citation


**BibTeX:**

[Coming soon]

**APA:**

[Coming soon]

-->