import { Grid, ThemeProvider, Typography } from "@mui/material";
import { buildTheme } from "../../infrastructure/theme/theme";
export default function Definition() {
return (
<>
<ThemeProvider theme={buildTheme()}>
<Grid textAlign={"justify"} pl={2} pr={2}>
<Typography variant="h3" color="#000" mb={1}>
This is a simple web application that creates word embeddings using
the gensim library. Word embeddings are numerical representations of
words that capture their semantic meanings. Each of the 50
dimensions of the GloVe model used here represents a latent semantic
attribute, and even though these attributes are not directly
interpretable, some of them can be inferred by reverse engineering
the embedding space. Two main approaches can be used to achieve
this:
</Typography>
<Typography variant="h3" color="#000" mb={5} component="div">
<ol>
<li>
<Typography
variant="h3"
color="#000"
sx={{ fontWeight: "600" }}
>
Investigating Common Category Attributes
</Typography>
<Typography variant="h3" color="#000" mb={2} component="div">
This approach analyzes the dimensions with the lowest variance
among a group of semantically similar words. The premise is that
dimensions with minimal variance may capture attributes that are
common across the set. For instance:
<ul>
<li>
<Typography variant="h3" color="#000">
<b>Words:</b> 'Spain', 'France', 'Germany', 'Japan' (all
countries)
</Typography>
</li>
<li>
<Typography variant="h3" color="#000">
<b>Low Variance Dimensions:</b> These would
theoretically indicate attributes common to all
countries, potentially abstract notions like
'sovereignty', 'nationhood', or just the general
category of being a 'country'.
</Typography>
</li>
</ul>
</Typography>
</li>
<li>
<Typography
variant="h3"
color="#000"
sx={{ fontWeight: "600" }}
>
Investigating Specific Semantic Differences
</Typography>
<Typography variant="h3" color="#000">
This approach subtracts one word vector from another and analyzes
the dimensions of the resulting difference vector with the largest
absolute values (the "high variance" dimensions). The subtraction
aims to capture the core semantic difference between two entities,
and the result becomes more robust when several word pairs that
differ in the same way are compared. For instance:
</Typography>
<ul>
<li>
<Typography variant="h3" color="#000">
<b>Word pairs:</b> 'Man' and 'Woman', 'Uncle' and 'Aunt',
'Father' and 'Mother'.
</Typography>
</li>
<li>
<Typography variant="h3" color="#000">
<b>Difference Vector (High Variance Dimensions):</b> These
dimensions likely highlight aspects related to gender. The
largest absolute values in this vector suggest dimensions where
the concepts of 'man' and 'woman' differ most significantly,
potentially capturing gender-specific traits or roles.
</Typography>
</li>
</ul>
</li>
</ol>
</Typography>
</Grid>
</ThemeProvider>
</>
);
}
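
// Illustrative sketch (not part of the original component) of the first
// approach described above, "Investigating Common Category Attributes":
// given the embeddings of a group of semantically similar words, rank the
// dimensions by their variance across the group and keep the k lowest.
// Assumptions: the embeddings are already available on the client as
// equal-length arrays of numbers (e.g. 50 values each for the GloVe model),
// the population variance is used, and the name lowestVarianceDimensions is
// hypothetical rather than an existing API of this project.
function lowestVarianceDimensions(vectors, k) {
  if (vectors.length === 0) return [];
  const n = vectors.length;
  const dims = vectors[0].length;
  const scored = [];
  for (let d = 0; d < dims; d++) {
    // Mean of dimension d across every word in the group.
    let sum = 0;
    for (const v of vectors) sum += v[d];
    const mean = sum / n;
    // Population variance of dimension d: small values mean the words agree.
    let squared = 0;
    for (const v of vectors) squared += (v[d] - mean) ** 2;
    scored.push({ dimension: d, variance: squared / n });
  }
  // Ascending order puts the most "shared" dimensions first.
  scored.sort((a, b) => a.variance - b.variance);
  return scored.slice(0, k);
}
// Usage (hypothetical vectors for 'spain', 'france', 'germany', 'japan'):
// lowestVarianceDimensions([spainVec, franceVec, germanyVec, japanVec], 5)
// returns the five { dimension, variance } entries with the smallest variance.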
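
// Illustrative sketch (not part of the original component) of the second
// approach described above, "Investigating Specific Semantic Differences":
// subtract the second vector of each word pair from the first, average the
// difference vectors over all pairs, and rank dimensions by the absolute
// value of that average. Averaging is one assumed way to make the result
// "more robust" with several pairs: a dimension that differs in the same
// direction for every pair keeps a large value, while inconsistent
// differences cancel out. The name largestDifferenceDimensions and the input
// shape (an array of [first, second] arrays of numbers) are hypothetical.
function largestDifferenceDimensions(pairs, k) {
  if (pairs.length === 0) return [];
  const dims = pairs[0][0].length;
  const meanDiff = new Array(dims).fill(0);
  for (const [first, second] of pairs) {
    for (let d = 0; d < dims; d++) {
      // Accumulate this pair's contribution to the average difference.
      meanDiff[d] += (first[d] - second[d]) / pairs.length;
    }
  }
  // Sort by magnitude so large negative differences rank as high as positive ones.
  return meanDiff
    .map((value, dimension) => ({ dimension, value }))
    .sort((a, b) => Math.abs(b.value) - Math.abs(a.value))
    .slice(0, k);
}
// Usage (hypothetical vectors): largestDifferenceDimensions(
//   [[manVec, womanVec], [uncleVec, auntVec], [fatherVec, motherVec]], 5)
// returns the five { dimension, value } entries with the largest average
// difference; the sign of value tells which side of each pair is higher.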