Sébastien De Greef committed
Commit e3bf489 · 1 Parent(s): c6ed4d4
chore: Update colorFrom in README.md and index.qmd
Files changed:
- README.md +1 -1
- _quarto.yml +0 -0
- src/_quarto.yml +24 -5
- src/about.qmd +30 -8
- src/index.qmd +8 -2
- src/llms/index.qmd +11 -1
- src/llms/llms.qmd +18 -0
- src/llms/prompting.qmd +67 -0
- src/theory/activations.qmd +152 -0
- src/theory/architectures.qmd +104 -0
- src/theory/chainoftoughts.qmd +32 -0
- src/theory/layers.qmd +105 -0
- src/theory/metrics.qmd +150 -0
README.md
CHANGED
@@ -1,7 +1,7 @@
 ---
 title: Quarto Template
 emoji: 🌖
-colorFrom:
+colorFrom: blue
 colorTo: pink
 sdk: docker
 pinned: false
_quarto.yml
ADDED
File without changes
src/_quarto.yml
CHANGED
@@ -1,7 +1,7 @@
 project:
   type: website
 website:
-  title: "
+  title: "My AI Cookbook"
   sidebar:
     style: "docked"
     search: true
@@ -9,13 +9,32 @@ website:
     contents:
       - section: "About"
         contents:
-          - href:
-            text: About
-
+          - href: about.qmd
+            text: About this Cookbook
+
+      - section: "Theory"
+        contents:
+          - href: theory/activations.qmd
+            text: "Activation Functions"
+          - href: theory/architectures.qmd
+            text: "Network Architectures"
+          - href: theory/layers.qmd
+            text: "Layer Types"
+          - href: theory/metrics.qmd
+            text: "Metric Types"
+
+
+
+      - section: "Large Language Models"
         contents:
+          - href: llms/prompting.qmd
+            text: "Prompting"
+          - href: theory/chainoftoughts.qmd
+            text: "Chain of toughts"
           - href: llms/index.qmd
             text: "LLM'xs"
+
+      - section: "Retrival Augmented Generation"
         contents:
         - section: "RAG Techniques"
           contents:
src/about.qmd
CHANGED
@@ -2,11 +2,33 @@
 title: "About"
 ---
 
-
-
-
-
-
-
-
-
+**Welcome to My AI Cookbook**
+
+This repository is my personal collection of recipes and notebooks, documenting my journey of learning and exploring various aspects of Artificial Intelligence (AI). As a self-taught AI enthusiast, I created this cookbook to serve as a knowledge base, a "how-to" guide, and a reference point for my own projects and experiments.
+
+**The Story Behind**
+
+Over the past year, I've been fascinated by the rapidly evolving field of AI and its endless possibilities. To deepen my understanding and skills, I embarked on a self-learning journey, diving into various AI-related projects and topics. As I progressed, I realized the importance of documenting my learnings, successes, and failures. This cookbook is the culmination of that effort, a centralized hub where I can quickly find and revisit previous projects, takeaways, and insights.
+
+**What You'll Find Here**
+
+This cookbook is a living repository of my AI-related projects, experiments, and learnings. You'll find a diverse range of topics, including:
+
+* **Recipes**: Step-by-step guides for implementing various AI concepts, models, and techniques using popular libraries and frameworks.
+* **Notebooks**: Interactive Jupyter notebooks containing code, explanations, and visualizations for AI-related projects and experiments.
+* **Project Write-ups**: Detailed descriptions of my projects, including goals, approaches, challenges, and outcomes.
+* **Takeaways and Insights**: Key learnings, best practices, and lessons learned from my AI journey.
+
+**Goals and Objectives**
+
+This cookbook serves several purposes:
+
+* **Personal Knowledge Base**: A centralized hub for my AI-related knowledge, allowing me to quickly recall and build upon previous projects and learnings.
+* **Self-Learning Platform**: A platform for continuous learning, experimentation, and improvement in AI.
+* **Community Sharing**: A resource for others to learn from, providing a glimpse into my AI journey and experiences.
+
+**Stay Tuned**
+
+As I continue to explore and learn, this cookbook will evolve, incorporating new projects, recipes, and insights. I hope you find this resource helpful, and I look forward to sharing my AI journey with you.
+
+Best regards,
src/index.qmd
CHANGED
@@ -1,10 +1,16 @@
----
+qu---
 title: "About Quarto"
 ---
 
 [Quarto](https://quarto.org/) is a Markdown-based documentation system that lets you write documents in Markdown or Jupyter Notebooks, and render them to a variety of formats including HTML, PDF, PowerPoint, and more.
 You can also use Quarto to write [books](https://quarto.org/docs/books/), create [dashboards](https://quarto.org/docs/dashboards/), and embed web applications with [Observable](https://quarto.org/docs/interactive/ojs/) and [Shinylive](https://quarto.org/docs/blog/posts/2022-10-25-shinylive-extension/).
-
+```{mermaid}
+flowchart LR
+  A[Hard edge] --> B(Round edge)
+  B --> C{Decision}
+  C --> D[Result one]
+  C --> E[Result two]
+```
 ## Getting started with Quarto
 
 Once you've created the space, click on the `Files` tab in the top right to take a look at the files which make up this Space.
src/llms/index.qmd
CHANGED
@@ -1,9 +1,19 @@
 ---
 title: "Habits"
 author: "John Doe"
-revealjs
+format: revealjs
 ---
 
+## Getting up
+
+- Turn off alarm
+- Get out of bed
+
+## Going to sleep
+
+- Get in bed
+- Count sheep
+
 # In the morning
 
 ## Getting up
src/llms/llms.qmd
ADDED
@@ -0,0 +1,18 @@
+---
+title: "Habits"
+author: "John Doe"
+format: revealjs
+---
+
+## Getting up
+
+- Turn off alarm
+- Get out of bed
+
+## Going to sleep
+
+- Get in bed
+- Count sheep
+## Quarto
+
+Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see <https://quarto.org>.
src/llms/prompting.qmd
ADDED
@@ -0,0 +1,67 @@
+# Prompting LLM's
+
+## **I. Clarity and Specificity**
+
+1. **Be clear and concise**: Use simple, straightforward language to convey your request.
+   * Good sample: "Write a short story about a character who discovers a hidden treasure."
+   * Bad sample: "Create a narrative that revolves around an individual who stumbles upon a concealed riches repository."
+2. **Define specific tasks**: Clearly outline what you want the model to do.
+   * Good sample: "Summarize the main points of the article in 50 words."
+   * Bad sample: "Do something with the article, maybe summarize it or something."
+3. **Avoid ambiguity**: Use specific terms and phrases to avoid confusion.
+   * Good sample: "Generate a recipe for vegan chocolate cake."
+   * Bad sample: "Make a dessert that's healthy and yummy."
+
+## **II. Context and Framing**
+
+1. **Provide context**: Give the model a clear understanding of the topic, tone, and style you're aiming for.
+   * Good sample: "Write a humorous article about the benefits of procrastination, in the style of The Onion."
+   * Bad sample: "Write something funny about procrastination."
+2. **Frame the task**: Use language that sets the tone and direction for the response.
+   * Good sample: "Imagine you're a travel blogger, write a review of a fictional restaurant in Paris."
+   * Bad sample: "Write a review of a restaurant."
+
+## **III. Tone and Style**
+
+1. **Specify tone and style**: Use adjectives to describe the tone and style you're aiming for.
+   * Good sample: "Write a formal, technical report on the benefits of AI in healthcare."
+   * Bad sample: "Write something about AI in healthcare."
+2. **Use emotional cues**: Incorporate emotional language to evoke a specific tone or atmosphere.
+   * Good sample: "Write a heartfelt letter to a friend who's going through a tough time."
+   * Bad sample: "Write a letter to a friend."
+
+## **IV. Constraints and Guidelines**
+
+1. **Set constraints**: Provide specific guidelines on format, length, or structure.
+   * Good sample: "Write a sonnet about the beauty of nature, with a specific rhyme scheme and 14 lines."
+   * Bad sample: "Write a poem about nature."
+2. **Specify formats and structures**: Use specific formats, such as lists or tables, to guide the response.
+   * Good sample: "Create a table comparing the features of three different smartphones."
+   * Bad sample: "Write something about smartphones."
+
+## **V. Avoiding Bias and Assumptions**
+
+1. **Avoid leading language**: Phrases that imply a specific answer or perspective can influence the model's response.
+   * Good sample: "What are the benefits and drawbacks of using AI in healthcare?"
+   * Bad sample: "Why is AI the best thing to happen to healthcare?"
+2. **Use neutral language**: Avoid language that implies a particular perspective or bias.
+   * Good sample: "Discuss the impact of climate change on global ecosystems."
+   * Bad sample: "Explain why climate change is a hoax."
+
+## **VI. Providing Examples and References**
+
+1. **Provide examples**: Offer concrete examples to illustrate the desired output.
+   * Good sample: "Write a product description in the style of this example: [insert example]."
+   * Bad sample: "Write a product description."
+2. **Reference external sources**: Include references to external sources, such as books or articles, to provide context and guidance.
+   * Good sample: "Summarize the main points of 'The Hitchhiker's Guide to the Galaxy' in 100 words."
+   * Bad sample: "Write a summary of a book."
+
+## **VII. Feedback and Iteration**
+
+1. **Provide feedback**: Give the model feedback on its responses to improve future output.
+   * Good sample: "The previous response was too formal, can you make it more conversational?"
+   * Bad sample: "That was bad, try again."
+2. **Iterate and refine**: Refine your prompts based on the model's responses to achieve the desired outcome.
+   * Good sample: "Let's try rewriting the prompt to focus on a specific aspect of the topic."
+   * Bad sample: "Just try again, maybe it'll work this time."
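Editor's aside (not part of the commit): the guidelines above can be folded into a tiny prompt-building helper. The `build_prompt` function and its field names below are hypothetical, purely to show how context, tone, task, constraints, and examples combine into one explicit prompt.

```python
# Minimal sketch: compose a prompt that applies the guidelines above.
# build_prompt and its parameters are illustrative, not a real API.

def build_prompt(task: str, context: str = "", tone: str = "",
                 constraints: str = "", example: str = "") -> str:
    """Assemble a prompt from explicit, clearly labelled parts."""
    parts = []
    if context:
        parts.append(f"Context: {context}")              # II. Context and Framing
    if tone:
        parts.append(f"Tone and style: {tone}")          # III. Tone and Style
    parts.append(f"Task: {task}")                        # I. Clarity and Specificity
    if constraints:
        parts.append(f"Constraints: {constraints}")      # IV. Constraints and Guidelines
    if example:
        parts.append(f"Follow this example: {example}")  # VI. Examples and References
    return "\n".join(parts)

prompt = build_prompt(
    task="Summarize the main points of the article in 50 words.",
    context="The article reviews recent progress in retrieval augmented generation.",
    tone="formal, technical",
    constraints="Return exactly one paragraph, no bullet points.",
)
print(prompt)
```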
src/theory/activations.qmd
ADDED
@@ -0,0 +1,152 @@
+
+## **1. Sigmoid (Logistic)**
+
+**Formula:** σ(x) = 1 / (1 + exp(-x))
+
+**Strengths:** Maps any real-valued number to a value between 0 and 1, making it suitable for binary classification problems.
+
+**Weaknesses:** Saturates (i.e., output values approach 0 or 1) for large inputs, leading to vanishing gradients during backpropagation.
+
+**Usage:** Binary classification, logistic regression.
+
+## **2. Hyperbolic Tangent (Tanh)**
+
+**Formula:** tanh(x) = 2 / (1 + exp(-2x)) - 1
+
+**Strengths:** Similar to sigmoid, but maps to (-1, 1), which can be beneficial for some models.
+
+**Weaknesses:** Also saturates, leading to vanishing gradients.
+
+**Usage:** Similar to sigmoid, but with a larger output range.
+
+## **3. Rectified Linear Unit (ReLU)**
+
+**Formula:** f(x) = max(0, x)
+
+**Strengths:** Computationally efficient, non-saturating, and easy to compute.
+
+**Weaknesses:** Not differentiable at x=0, which can cause issues during optimization.
+
+**Usage:** Default activation function in many deep learning frameworks, suitable for most neural networks.
+
+## **4. Leaky ReLU**
+
+**Formula:** f(x) = max(αx, x), where α is a small constant (e.g., 0.01)
+
+**Strengths:** Similar to ReLU, but allows a small fraction of the input to pass through, helping with dying neurons.
+
+**Weaknesses:** Still non-differentiable at x=0.
+
+**Usage:** Alternative to ReLU, especially when dealing with dying neurons.
+
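As a quick editorial reference (not part of the committed file), here is a minimal NumPy sketch of the four formulas above; the function names are only illustrative.

```python
import numpy as np

# Direct translations of the formulas listed above.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0   # equivalent to np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)

x = np.linspace(-5, 5, 11)
print(sigmoid(x).round(3))   # saturates toward 0 and 1 at the extremes
print(relu(x))               # zero for negative inputs, identity otherwise
```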
+## **5. Swish**
+
+**Formula:** f(x) = x \* g(x), where g(x) is a learned function (e.g., sigmoid or ReLU)
+
+**Strengths:** Self-gated, adaptive, and non-saturating.
+
+**Weaknesses:** Computationally expensive, requires additional learnable parameters.
+
+**Usage:** Can be used in place of ReLU or other activations, but may not always outperform them.
+
+## **6. Softmax**
+
+**Formula:** softmax(x) = exp(x) / Σ exp(x)
+
+**Strengths:** Normalizes output to ensure probabilities sum to 1, making it suitable for multi-class classification.
+
+**Weaknesses:** Only suitable for output layers with multiple classes.
+
+**Usage:** Output layer activation for multi-class classification problems.
+
+## **7. Softsign**
+
+**Formula:** f(x) = x / (1 + |x|)
+
+**Strengths:** Similar to sigmoid, but with a more gradual slope.
+
+**Weaknesses:** Not commonly used, may not provide significant benefits over sigmoid or tanh.
+
+**Usage:** Alternative to sigmoid or tanh in certain situations.
+
+## **8. ArcTan**
+
+**Formula:** f(x) = arctan(x)
+
+**Strengths:** Non-saturating, smooth, and continuous.
+
+**Weaknesses:** Not commonly used, may not outperform other activations.
+
+**Usage:** Experimental or niche applications.
+
+## **9. SoftPlus**
+
+**Formula:** f(x) = log(1 + exp(x))
+
+**Strengths:** Smooth, continuous, and non-saturating.
+
+**Weaknesses:** Not commonly used, may not outperform other activations.
+
+**Usage:** Experimental or niche applications.
+
+## **10. Gaussian Error Linear Unit (GELU)**
+
+**Formula:** f(x) = x \* Φ(x), where Φ is the cumulative distribution function of the standard normal distribution
+
+**Strengths:** Non-saturating, smooth, and computationally efficient.
+
+**Weaknesses:** Not as well-studied as ReLU or other activations.
+
+**Usage:** Alternative to ReLU, especially in Bayesian neural networks.
+
+## **11. Mish**
+
+**Formula:** f(x) = x \* tanh(softplus(x))
+
+**Strengths:** Non-saturating, smooth, and computationally efficient.
+
+**Weaknesses:** Not as well-studied as ReLU or other activations.
+
+**Usage:** Alternative to ReLU, especially in computer vision tasks.
+
+## **12. Silu (SiLU)**
+
+**Formula:** f(x) = x \* sigmoid(x)
+
+**Strengths:** Non-saturating, smooth, and computationally efficient.
+
+**Weaknesses:** Not as well-studied as ReLU or other activations.
+
+**Usage:** Alternative to ReLU, especially in computer vision tasks.
+
+## **13. GELU Approximation (GELU Approx.)**
+
+**Formula:** f(x) ≈ 0.5 \* x \* (1 + tanh(√(2/π) \* (x + 0.044715 \* x^3)))
+
+**Strengths:** Fast, non-saturating, and smooth.
+
+**Weaknesses:** Approximation, not exactly equal to GELU.
+
+**Usage:** Alternative to GELU, especially when computational efficiency is crucial.
+
+## **14. SELU (Scaled Exponential Linear Unit)**
+
+**Formula:** f(x) = λ { x if x > 0, α(e^x - 1) if x ≤ 0 }
+
+**Strengths:** Self-normalizing, non-saturating, and computationally efficient.
+
+**Weaknesses:** Requires careful initialization and α tuning.
+
+**Usage:** Alternative to ReLU, especially in deep neural networks.
+
+When choosing an activation function, consider the following:
+
+* **Non-saturation:** Avoid activations that saturate (e.g., sigmoid, tanh) to prevent vanishing gradients.
+
+* **Computational efficiency:** Choose activations that are computationally efficient (e.g., ReLU, Swish) for large models or real-time applications.
+
+* **Smoothness:** Smooth activations (e.g., GELU, Mish) can help with optimization and convergence.
+
+* **Domain knowledge:** Select activations based on the problem domain and desired output (e.g., softmax for multi-class classification).
+
+* **Experimentation:** Try different activations and evaluate their performance on your specific task.
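Again as an editorial sketch rather than committed content, a few of the remaining formulas (softmax, SoftPlus, SiLU, Mish, and the tanh-based GELU approximation) written out with NumPy:

```python
import numpy as np

def softmax(x):
    # exp(x) / Σ exp(x); subtract the max first for numerical stability
    e = np.exp(x - np.max(x))
    return e / e.sum()

def softplus(x):
    return np.log1p(np.exp(x))              # log(1 + exp(x))

def silu(x):
    return x / (1.0 + np.exp(-x))           # x * sigmoid(x)

def mish(x):
    return x * np.tanh(softplus(x))

def gelu_approx(x):
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits))                      # probabilities summing to 1
print(gelu_approx(np.array([-1.0, 0.0, 1.0])))
```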
src/theory/architectures.qmd
ADDED
@@ -0,0 +1,104 @@
+## **1. Feedforward Neural Networks (FNNs)**
+
+* Usage: Image classification, regression, function approximation
+* Description: A basic neural network architecture where data flows only in one direction, from input layer to output layer, without any feedback loops.
+* Strengths: Simple to implement, computationally efficient
+* Caveats: Limited capacity to model complex relationships, prone to overfitting
+
+## **2. Convolutional Neural Networks (CNNs)**
+
+* Usage: Image classification, object detection, image segmentation
+* Description: A neural network architecture that uses convolutional and pooling layers to extract features from images.
+* Strengths: Excellent performance on image-related tasks, robust to image transformations
+* Caveats: Computationally expensive, require large datasets
+
+## **3. Recurrent Neural Networks (RNNs)**
+
+* Usage: Natural Language Processing (NLP), sequence prediction, time series forecasting
+* Description: A neural network architecture that uses feedback connections to model sequential data.
+* Strengths: Excellent performance on sequential data, can model long-term dependencies
+* Caveats: Suffer from vanishing gradients, difficult to train
+
+## **4. Long Short-Term Memory (LSTM) Networks**
+
+* Usage: NLP, sequence prediction, time series forecasting
+* Description: A type of RNN that uses memory cells to learn long-term dependencies.
+* Strengths: Excellent performance on sequential data, can model long-term dependencies
+* Caveats: Computationally expensive, require large datasets
+
+## **5. Transformers**
+
+* Usage: NLP, machine translation, language modeling
+* Description: A neural network architecture that uses self-attention mechanisms to model relationships between input sequences.
+* Strengths: Excellent performance on sequential data, parallelizable, can handle long-range dependencies
+* Caveats: Computationally expensive, require large datasets
+
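As an illustrative aside (not part of the commit), the self-attention mechanism behind Transformers can be sketched in a few lines of NumPy; the weight matrices here are random stand-ins for learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over a sequence x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # pairwise similarity between positions
    weights = softmax(scores, axis=-1)        # each position attends to every position
    return weights @ v                        # weighted mix of the value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)   # (4, 8)
```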
+## **6. Autoencoders**
+
+* Usage: Dimensionality reduction, anomaly detection, generative modeling
+* Description: A neural network architecture that learns to compress and reconstruct input data.
+* Strengths: Excellent performance on dimensionality reduction, can learn robust representations
+* Caveats: May not perform well on complex data distributions
+
+## **7. Generative Adversarial Networks (GANs)**
+
+* Usage: Generative modeling, data augmentation, style transfer
+* Description: A neural network architecture that consists of a generator and discriminator, which compete to generate realistic data.
+* Strengths: Excellent performance on generative tasks, can generate realistic data
+* Caveats: Training can be unstable, require careful tuning of hyperparameters
+
+## **8. Residual Networks (ResNets)**
+
+* Usage: Image classification, object detection
+* Description: A neural network architecture that uses residual connections to ease training.
+* Strengths: Excellent performance on image-related tasks, ease of training
+* Caveats: May not perform well on sequential data
+
+## **9. U-Net**
+
+* Usage: Image segmentation, object detection
+* Description: A neural network architecture that uses an encoder-decoder structure with skip connections.
+* Strengths: Excellent performance on image segmentation tasks, fast training
+* Caveats: May not perform well on sequential data
+
+## **10. Attention-based Models**
+
+* Usage: NLP, machine translation, question answering
+* Description: A neural network architecture that uses attention mechanisms to focus on relevant input regions.
+* Strengths: Excellent performance on sequential data, can model long-range dependencies
+* Caveats: Require careful tuning of hyperparameters
+
+## **11. Graph Neural Networks (GNNs)**
+
+* Usage: Graph-based data, social network analysis, recommendation systems
+* Description: A neural network architecture that uses graph structures to model relationships between nodes.
+* Strengths: Excellent performance on graph-based data, can model complex relationships
+* Caveats: Computationally expensive, require large datasets
+
+## **12. Reinforcement Learning (RL) Architectures**
+
+* Usage: Game playing, robotics, autonomous systems
+* Description: A neural network architecture that uses reinforcement learning to learn from interactions with an environment.
+* Strengths: Excellent performance on sequential decision-making tasks, can learn complex policies
+* Caveats: Require large datasets, can be slow to train
+
+## **13. Evolutionary Neural Networks**
+
+* Usage: Neuroevolution, optimization problems
+* Description: A neural network architecture that uses evolutionary principles to evolve neural networks.
+* Strengths: Excellent performance on optimization problems, can learn complex policies
+* Caveats: Computationally expensive, require large datasets
+
+## **14. Spiking Neural Networks (SNNs)**
+
+* Usage: Neuromorphic computing, edge AI
+* Description: A neural network architecture that uses spiking neurons to process data.
+* Strengths: Excellent performance on edge AI applications, energy-efficient
+* Caveats: Limited software support, require specialized hardware
+
+## **15. Conditional Random Fields (CRFs)**
+
+* Usage: NLP, sequence labeling, information extraction
+* Description: A probabilistic model that uses graphical models to model sequential data.
+* Strengths: Excellent performance on sequential data, can model complex relationships
+* Caveats: Computationally expensive, require large datasets
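Another editorial sketch, assuming PyTorch is available: a minimal fully connected autoencoder of the kind described in section 6, with an encoder that compresses the input and a decoder that reconstructs it.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Small fully connected autoencoder: compress to a code, then reconstruct."""
    def __init__(self, in_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
x = torch.rand(16, 784)                      # a batch of flattened 28x28 images
loss = nn.functional.mse_loss(model(x), x)   # reconstruction error
loss.backward()                              # gradients for an optimizer step
```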
src/theory/chainoftoughts.qmd
ADDED
@@ -0,0 +1,32 @@
+# **Chain of Thoughts**
+
+The Chain of Thoughts is a powerful technique used in artificial intelligence and cognitive architectures to model human-like reasoning and decision-making. It's a method for generating a sequence of thoughts, ideas, or concepts that are linked together to form a coherent narrative or argument.
+
+## **How it works:**
+
+1. **Seed thought**: The process starts with a seed thought, which is an initial idea or concept.
+2. **Associative thinking**: The AI system uses associative thinking to generate a new thought that is related to the seed thought.
+3. **Contextualization**: The system contextualizes the new thought within the existing knowledge graph or semantic network.
+4. **Inference**: The system draws inferences from the new thought, generating a new set of related ideas or concepts.
+5. **Iteration**: Steps 2-4 are repeated, creating a chain of thoughts that are linked together through associations, inferences, and contextualization.
+
+## **Benefits:**
+
+* **Human-like reasoning**: The Chain of Thoughts technique enables AI systems to reason and think in a way that's similar to humans, making them more relatable and interactive.
+* **Creative problem-solving**: By generating a chain of thoughts, AI systems can explore different solutions to complex problems, fostering creative problem-solving.
+* **Natural language understanding**: The Chain of Thoughts technique can be used to improve natural language understanding by generating coherent narratives or arguments.
+
+## **Other similar techniques:**
+
+1. **Mind Mapping**: A visual technique used to organize and connect ideas, concepts, and information.
+2. **Concept Mapping**: A method for visually representing relationships between concepts, ideas, and information.
+3. **Causal Chain Analysis**: A technique used to identify cause-and-effect relationships between events or variables.
+4. **Influence Diagrams**: A graphical representation of uncertain relationships between variables, used in decision analysis and Bayesian networks.
+5. **Cognitive Maps**: A visual representation of an individual's thought processes, beliefs, and attitudes.
+
+These techniques share similarities with the Chain of Thoughts in that they:
+
+* Use associations and relationships to connect ideas and concepts
+* Foster creative problem-solving and critical thinking
+* Can be used to model human-like reasoning and decision-making
+* Enable AI systems to generate coherent narratives or arguments
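As a hedged illustration (not part of the committed file), the seed → associate → infer → iterate loop described above can be sketched as follows; `generate_next_thought` is a hypothetical placeholder for whatever model call produces the next step in the chain.

```python
# Illustrative only: the structure of the loop, not a real reasoning engine.

def generate_next_thought(previous_thought: str, step: int) -> str:
    # Placeholder for an LLM or knowledge-graph query that associates and infers.
    return f"(thought {step}) follows from: {previous_thought!r}"

def chain_of_thoughts(seed: str, steps: int = 4) -> list[str]:
    chain = [seed]                                             # 1. seed thought
    for step in range(1, steps + 1):
        chain.append(generate_next_thought(chain[-1], step))   # 2-4. associate, contextualize, infer
    return chain                                               # 5. the iterated chain

for thought in chain_of_thoughts("Why did sales drop in March?"):
    print(thought)
```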
src/theory/layers.qmd
ADDED
@@ -0,0 +1,105 @@
+
+## **1. Input Layers**
+
+* Usage: Receive input data, propagate it to subsequent layers
+* Description: The first layer in a neural network that receives input data
+* Strengths: Essential for processing input data, easy to implement
+* Weaknesses: Limited functionality, no learning occurs in this layer
+
+## **2. Dense Layers (Fully Connected Layers)**
+
+* Usage: Feature extraction, classification, regression
+* Description: A layer where every input is connected to every output, using a weighted sum
+* Strengths: Excellent for feature extraction, easy to implement, fast computation
+* Weaknesses: Can be prone to overfitting, computationally expensive for large inputs
+
+## **3. Convolutional Layers (Conv Layers)**
+
+* Usage: Image classification, object detection, image segmentation
+* Description: A layer that applies filters to small regions of the input data, scanning the input data horizontally and vertically
+* Strengths: Excellent for image processing, reduces spatial dimensions, retains spatial hierarchy
+* Weaknesses: Computationally expensive, require large datasets
+
+## **4. Pooling Layers (Downsampling Layers)**
+
+* Usage: Image classification, object detection, image segmentation
+* Description: A layer that reduces spatial dimensions by taking the maximum or average value across a region
+* Strengths: Reduces spatial dimensions, reduces number of parameters, retains important features
+* Weaknesses: Loses some information, can be sensitive to hyperparameters
+
+## **5. Recurrent Layers (RNNs)**
+
+* Usage: Natural Language Processing (NLP), sequence prediction, time series forecasting
+* Description: A layer that processes sequential data, using hidden state to capture temporal dependencies
+* Strengths: Excellent for sequential data, can model long-term dependencies
+* Weaknesses: Suffers from vanishing gradients, difficult to train, computationally expensive
+
+## **6. Long Short-Term Memory (LSTM) Layers**
+
+* Usage: NLP, sequence prediction, time series forecasting
+* Description: A type of RNN that uses memory cells to learn long-term dependencies
+* Strengths: Excellent for sequential data, can model long-term dependencies, mitigates vanishing gradients
+* Weaknesses: Computationally expensive, require large datasets
+
+## **7. Gated Recurrent Unit (GRU) Layers**
+
+* Usage: NLP, sequence prediction, time series forecasting
+* Description: A simpler alternative to LSTM, using gates to control the flow of information
+* Strengths: Faster computation, simpler than LSTM, easier to train
+* Weaknesses: May not perform as well as LSTM, limited capacity to model long-term dependencies
+
+
## **8. Batch Normalization Layers**
|
52 |
+
|
53 |
+
* Usage: Normalizing inputs, stabilizing training, improving performance
|
54 |
+
* Description: A layer that normalizes inputs, reducing internal covariate shift
|
55 |
+
* Strengths: Improves training stability, accelerates training, improves performance
|
56 |
+
* Weaknesses: Requires careful tuning of hyperparameters, can be computationally expensive
|
57 |
+
|
58 |
+
## **9. Dropout Layers**
|
59 |
+
|
60 |
+
* Usage: Regularization, preventing overfitting
|
61 |
+
* Description: A layer that randomly drops out neurons during training, reducing overfitting
|
62 |
+
* Strengths: Effective regularization technique, reduces overfitting, improves generalization
|
63 |
+
* Weaknesses: Can slow down training, requires careful tuning of hyperparameters
|
64 |
+
|
65 |
+
## **10. Flatten Layers**
|
66 |
+
|
67 |
+
* Usage: Reshaping data, preparing data for dense layers
|
68 |
+
* Description: A layer that flattens input data into a one-dimensional array
|
69 |
+
* Strengths: Essential for preparing data for dense layers, easy to implement
|
70 |
+
* Weaknesses: Limited functionality, no learning occurs in this layer
|
71 |
+
|
72 |
+
## **11. Embedding Layers**
|
73 |
+
|
74 |
+
* Usage: NLP, word embeddings, language modeling
|
75 |
+
* Description: A layer that converts categorical data into dense vectors
|
76 |
+
* Strengths: Excellent for NLP tasks, reduces dimensionality, captures semantic relationships
|
77 |
+
* Weaknesses: Require large datasets, can be computationally expensive
|
78 |
+
|
79 |
+
## **12. Attention Layers**
|
80 |
+
|
81 |
+
* Usage: NLP, machine translation, question answering
|
82 |
+
* Description: A layer that computes weighted sums of input data, focusing on relevant regions
|
83 |
+
* Strengths: Excellent for sequential data, can model long-range dependencies, improves performance
|
84 |
+
* Weaknesses: Computationally expensive, require careful tuning of hyperparameters
|
85 |
+
|
86 |
+
## **13. Upsampling Layers**
|
87 |
+
|
88 |
+
* Usage: Image segmentation, object detection, image generation
|
89 |
+
* Description: A layer that increases spatial dimensions, using interpolation or learned upsampling filters
|
90 |
+
* Strengths: Excellent for image processing, improves spatial resolution, enables image generation
|
91 |
+
* Weaknesses: Computationally expensive, require careful tuning of hyperparameters
|
92 |
+
|
93 |
+
## **14. Normalization Layers**
|
94 |
+
|
95 |
+
* Usage: Normalizing inputs, stabilizing training, improving performance
|
96 |
+
* Description: A layer that normalizes inputs, reducing internal covariate shift
|
97 |
+
* Strengths: Improves training stability, accelerates training, improves performance
|
98 |
+
* Weaknesses: Requires careful tuning of hyperparameters, can be computationally expensive
|
99 |
+
|
100 |
+
## **15. Activation Functions**
|
101 |
+
|
102 |
+
* Usage: Introducing non-linearity, enhancing model capacity
|
103 |
+
* Description: A function that introduces non-linearity into the model, enabling complex representations
|
104 |
+
* Strengths: Enables complex representations, improves model capacity, enhances performance
|
105 |
+
* Weaknesses: Requires careful tuning of hyperparameters, can be computationally expensive
|
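One more editorial sketch, assuming PyTorch: several of the layer types listed above (convolution, batch normalization, pooling, dropout, flatten, dense) stacked into a small image classifier.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional layer (3)
    nn.BatchNorm2d(16),                          # batch normalization (8)
    nn.ReLU(),                                   # activation function (15)
    nn.MaxPool2d(2),                             # pooling layer (4)
    nn.Dropout(0.25),                            # dropout regularization (9)
    nn.Flatten(),                                # flatten before the dense layer (10)
    nn.Linear(16 * 14 * 14, 10),                 # dense / fully connected layer (2)
)

x = torch.rand(4, 1, 28, 28)   # batch of 4 grayscale 28x28 images
print(model(x).shape)          # torch.Size([4, 10])
```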
src/theory/metrics.qmd
ADDED
@@ -0,0 +1,150 @@
+# **Metrics for Model Performance Monitoring and Validation**
+
+In machine learning, it's essential to evaluate the performance of a model to ensure it's accurate, reliable, and effective. There are various metrics to measure model performance, each with its strengths and limitations. Here's an overview of popular metrics, their pros and cons, and examples of tasks that apply to each.
+
+## **1. Mean Squared Error (MSE)**
+
+MSE measures the average squared difference between predicted and actual values.
+
+Pros:
+
+* Easy to calculate
+* Sensitive to outliers
+
+Cons:
+
+* Can be heavily influenced by extreme values
+
+Example tasks:
+
+* Regression tasks, such as predicting house prices or stock prices
+* Time series forecasting
+
+## **2. Mean Absolute Error (MAE)**
+
+MAE measures the average absolute difference between predicted and actual values.
+
+Pros:
+
+* Robust to outliers
+* Easy to interpret
+
+Cons:
+
+* Can be sensitive to skewness in the data
+
+Example tasks:
+
+* Regression tasks, such as predicting house prices or stock prices
+* Time series forecasting
+
+## **3. Mean Absolute Percentage Error (MAPE)**
+
+MAPE measures the average absolute percentage difference between predicted and actual values.
+
+Pros:
+
+* Easy to interpret
+* Sensitive to relative errors
+
+Cons:
+
+* Can be sensitive to outliers
+
+Example tasks:
+
+* Regression tasks, such as predicting house prices or stock prices
+* Time series forecasting
+
+## **4. R-Squared (R²)**
+
+R² measures the proportion of variance in the dependent variable that's explained by the independent variables.
+
+Pros:
+
+* Easy to interpret
+* Sensitive to the strength of the relationship
+
+Cons:
+
+* Can be sensitive to outliers
+* Can be misleading for non-linear relationships
+
+Example tasks:
+
+* Regression tasks, such as predicting house prices or stock prices
+* Feature selection
+
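As an aside from the editor (not committed content), the four regression metrics above computed directly from their definitions with NumPy:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.9, 6.1])

mse  = np.mean((y_true - y_pred) ** 2)                    # mean squared error
mae  = np.mean(np.abs(y_true - y_pred))                   # mean absolute error
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100  # mean absolute percentage error
r2   = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)

print(f"MSE={mse:.3f} MAE={mae:.3f} MAPE={mape:.1f}% R2={r2:.3f}")
```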
+## **5. Brier Score**
+
+The Brier Score measures the average squared difference between predicted and actual probabilities.
+
+Pros:
+
+* Sensitive to the quality of the predictions
+* Can handle multi-class classification tasks
+
+Cons:
+
+* Can be sensitive to the choice of threshold
+
+Example tasks:
+
+* Multi-class classification tasks, such as image classification
+* Multi-label classification tasks
+
+## **6. F1 Score**
+
+The F1 Score measures the harmonic mean of precision and recall.
+
+Pros:
+
+* Sensitive to the balance between precision and recall
+* Can handle imbalanced datasets
+
+Cons:
+
+* Can be sensitive to the choice of threshold
+
+Example tasks:
+
+* Binary classification tasks, such as spam detection
+* Multi-class classification tasks
+
+## **7. Matthews Correlation Coefficient (MCC)**
+
+MCC measures the correlation between predicted and actual labels.
+
+Pros:
+
+* Sensitive to the quality of the predictions
+* Can handle imbalanced datasets
+
+Cons:
+
+* Can be sensitive to the choice of threshold
+
+Example tasks:
+
+* Binary classification tasks, such as spam detection
+* Multi-class classification tasks
+
+## **8. Log Loss**
+
+Log Loss measures the average log loss between predicted and actual probabilities.
+
+Pros:
+
+* Sensitive to the quality of the predictions
+* Can handle multi-class classification tasks
+
+Cons:
+
+* Can be sensitive to the choice of threshold
+
+Example tasks:
+
+* Multi-class classification tasks, such as image classification
+* Multi-label classification tasks
+
+When choosing a metric, consider the specific task, data characteristics, and desired outcome. It's essential to understand the strengths and limitations of each metric to ensure accurate model evaluation.
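And a final editorial sketch, assuming scikit-learn is available: the classification metrics above evaluated on a small binary example.

```python
import numpy as np
from sklearn.metrics import brier_score_loss, f1_score, log_loss, matthews_corrcoef

y_true = np.array([1, 0, 1, 1, 0, 1])
y_prob = np.array([0.9, 0.2, 0.6, 0.4, 0.1, 0.8])   # predicted probability of class 1
y_pred = (y_prob >= 0.5).astype(int)                 # hard labels after a 0.5 threshold

print("Brier score:", brier_score_loss(y_true, y_prob))   # probability-based
print("F1 score:   ", f1_score(y_true, y_pred))           # threshold-based
print("MCC:        ", matthews_corrcoef(y_true, y_pred))  # threshold-based
print("Log loss:   ", log_loss(y_true, y_prob))           # probability-based
```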