import { Grid, ThemeProvider, Typography } from "@mui/material";
import { buildTheme } from "../../infrastructure/theme/theme";

export default function Definition() {
  return (
    <>
      <ThemeProvider theme={buildTheme()}>
        <Grid textAlign="justify" pl={2} pr={2}>
          <Typography variant="h3" color="#000" mb={1}>
            This is a simple web application that creates word embeddings using
            the gensim library. Word embeddings are numerical representations of
            words that capture their semantic meanings. Each of the 50
            dimensions in the GloVe model represents a latent semantic
            attribute. Even though these attributes are not directly
            interpretable, they can be inferred by reverse engineering the
            embedding space. Two main approaches can be used to achieve this:
          </Typography>
          {/* component="div" keeps the h3 styling without placing lists inside a heading element */}
          <Typography variant="h3" component="div" color="#000" mb={5}>
            <ol>
              <li>
                <Typography
                  variant="h3"
                  color="#000"
                  sx={{ fontWeight: "600" }}
                >
                  Investigating Common Category Attributes:
                </Typography>
                <Typography variant="h3" component="div" color="#000" mb={2}>
                  This approach analyzes the dimensions with the lowest
                  variance across a group of semantically similar words. The
                  premise is that dimensions with minimal variance may capture
                  attributes that are common across the set. For instance:
                  <ul>
                    <li>
                      <Typography variant="h3" color="#000">
                        <b>Words:</b> 'Spain', 'France', 'Germany', 'Japan' (all
                        countries)
                      </Typography>
                    </li>
                    <li>
                      <Typography variant="h3" color="#000">
                        <b>Low Variance Dimensions:</b> These would
                        theoretically indicate attributes common to all
                        countries, potentially abstract notions like
                        'sovereignty', 'nationhood', or just the general
                        category of being a 'country'.
                      </Typography>
                    </li>
                  </ul>
                </Typography>
              </li>
              <li>
                <Typography
                  variant="h3"
                  color="#000"
                  sx={{ fontWeight: "600" }}
                >
                  Investigating Specific Semantic Differences:
                </Typography>
                <Typography variant="h3" color="#000">
                  This approach analyzes the high-variance dimensions of the
                  vector obtained by subtracting one word vector from another.
                  The subtraction aims to capture the core semantic differences
                  between the two words. The analysis becomes more robust when
                  more than one pair of words is compared. For instance:
                </Typography>
                <ul>
                  <li>
                    <Typography variant="h3" color="#000">
                      <b>Word pairs:</b> 'Man' and 'Woman', 'Uncle' and 'Aunt',
                      'Father' and 'Mother'.
                    </Typography>
                  </li>
                  <li>
                    <Typography variant="h3" color="#000">
                      <b>Difference Vector (High Variance Dimensions):</b>{" "}
                      These dimensions likely highlight aspects related to
                      gender differences. The largest absolute values in this
                      vector mark the dimensions where the concepts of 'man'
                      and 'woman' differ most, potentially capturing
                      gender-specific traits or roles.
                    </Typography>
                  </li>
                </ul>
              </li>
            </ol>
          </Typography>
        </Grid>
      </ThemeProvider>
    </>
  );
}
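
// Illustrative sketch, not part of the original component: one way the two
// analyses described in the text above could be computed. The helper names
// (dimensionVariances, lowestVarianceDims, meanDifference,
// largestMagnitudeDims) and the toy 4-dimensional vectors are assumptions made
// for this example. Real GloVe vectors would have 50 dimensions and would come
// from the embedding backend, which this file does not show.

type Vector = number[];

// Population variance of each dimension across a group of word vectors.
function dimensionVariances(vectors: Vector[]): number[] {
  const dims = vectors[0].length;
  return Array.from({ length: dims }, (_, d) => {
    const values = vectors.map((v) => v[d]);
    const mean = values.reduce((a, b) => a + b, 0) / values.length;
    return (
      values.reduce((acc, x) => acc + (x - mean) * (x - mean), 0) /
      values.length
    );
  });
}

// Approach 1: indices of the k dimensions with the smallest variance.
function lowestVarianceDims(vectors: Vector[], k: number): number[] {
  return dimensionVariances(vectors)
    .map((variance, index) => ({ variance, index }))
    .sort((a, b) => a.variance - b.variance)
    .slice(0, k)
    .map((entry) => entry.index);
}

// Approach 2: average of the differences (a - b) over several word pairs.
function meanDifference(pairs: [Vector, Vector][]): Vector {
  const dims = pairs[0][0].length;
  return Array.from(
    { length: dims },
    (_, d) =>
      pairs.reduce((acc, [a, b]) => acc + (a[d] - b[d]), 0) / pairs.length
  );
}

// Indices of the k dimensions where a difference vector is largest in magnitude.
function largestMagnitudeDims(diff: Vector, k: number): number[] {
  return diff
    .map((value, index) => ({ magnitude: Math.abs(value), index }))
    .sort((a, b) => b.magnitude - a.magnitude)
    .slice(0, k)
    .map((entry) => entry.index);
}

// Toy data, made up for illustration only (hypothetical values).
const countryVectors: Vector[] = [
  [0.9, 0.1, 0.5, -0.3],
  [0.9, 0.4, 0.5, 0.2],
  [0.9, -0.2, 0.6, 0.7],
];
const genderPairs: [Vector, Vector][] = [
  [
    [0.2, 0.8, 0.1, 0.0],
    [0.2, -0.8, 0.1, 0.1],
  ],
  [
    [0.5, 0.7, 0.3, 0.2],
    [0.5, -0.9, 0.3, 0.1],
  ],
];

// Dimensions 0 and 2 vary least across the toy "country" vectors.
const sharedDims = lowestVarianceDims(countryVectors, 2);
// Dimension 1 carries the largest average difference across the toy pairs.
const genderDims = largestMagnitudeDims(meanDifference(genderPairs), 1);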