Commit a19cb61 by martivarga (verified) · 1 Parent(s): ba7acea
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ rag_docs/TOSSJ-11-3.pdf filter=lfs diff=lfs merge=lfs -text
37
+ rag_docs/TOSSJ-12-1.pdf filter=lfs diff=lfs merge=lfs -text
LICENSE.txt ADDED
@@ -0,0 +1,395 @@
1
+ Attribution 4.0 International
2
+
3
+ =======================================================================
4
+
5
+ Creative Commons Corporation ("Creative Commons") is not a law firm and
6
+ does not provide legal services or legal advice. Distribution of
7
+ Creative Commons public licenses does not create a lawyer-client or
8
+ other relationship. Creative Commons makes its licenses and related
9
+ information available on an "as-is" basis. Creative Commons gives no
10
+ warranties regarding its licenses, any material licensed under their
11
+ terms and conditions, or any related information. Creative Commons
12
+ disclaims all liability for damages resulting from their use to the
13
+ fullest extent possible.
14
+
15
+ Using Creative Commons Public Licenses
16
+
17
+ Creative Commons public licenses provide a standard set of terms and
18
+ conditions that creators and other rights holders may use to share
19
+ original works of authorship and other material subject to copyright
20
+ and certain other rights specified in the public license below. The
21
+ following considerations are for informational purposes only, are not
22
+ exhaustive, and do not form part of our licenses.
23
+
24
+ Considerations for licensors: Our public licenses are
25
+ intended for use by those authorized to give the public
26
+ permission to use material in ways otherwise restricted by
27
+ copyright and certain other rights. Our licenses are
28
+ irrevocable. Licensors should read and understand the terms
29
+ and conditions of the license they choose before applying it.
30
+ Licensors should also secure all rights necessary before
31
+ applying our licenses so that the public can reuse the
32
+ material as expected. Licensors should clearly mark any
33
+ material not subject to the license. This includes other CC-
34
+ licensed material, or material used under an exception or
35
+ limitation to copyright. More considerations for licensors:
36
+ wiki.creativecommons.org/Considerations_for_licensors
37
+
38
+ Considerations for the public: By using one of our public
39
+ licenses, a licensor grants the public permission to use the
40
+ licensed material under specified terms and conditions. If
41
+ the licensor's permission is not necessary for any reason--for
42
+ example, because of any applicable exception or limitation to
43
+ copyright--then that use is not regulated by the license. Our
44
+ licenses grant only permissions under copyright and certain
45
+ other rights that a licensor has authority to grant. Use of
46
+ the licensed material may still be restricted for other
47
+ reasons, including because others have copyright or other
48
+ rights in the material. A licensor may make special requests,
49
+ such as asking that all changes be marked or described.
50
+ Although not required by our licenses, you are encouraged to
51
+ respect those requests where reasonable. More considerations
52
+ for the public:
53
+ wiki.creativecommons.org/Considerations_for_licensees
54
+
55
+ =======================================================================
56
+
57
+ Creative Commons Attribution 4.0 International Public License
58
+
59
+ By exercising the Licensed Rights (defined below), You accept and agree
60
+ to be bound by the terms and conditions of this Creative Commons
61
+ Attribution 4.0 International Public License ("Public License"). To the
62
+ extent this Public License may be interpreted as a contract, You are
63
+ granted the Licensed Rights in consideration of Your acceptance of
64
+ these terms and conditions, and the Licensor grants You such rights in
65
+ consideration of benefits the Licensor receives from making the
66
+ Licensed Material available under these terms and conditions.
67
+
68
+
69
+ Section 1 -- Definitions.
70
+
71
+ a. Adapted Material means material subject to Copyright and Similar
72
+ Rights that is derived from or based upon the Licensed Material
73
+ and in which the Licensed Material is translated, altered,
74
+ arranged, transformed, or otherwise modified in a manner requiring
75
+ permission under the Copyright and Similar Rights held by the
76
+ Licensor. For purposes of this Public License, where the Licensed
77
+ Material is a musical work, performance, or sound recording,
78
+ Adapted Material is always produced where the Licensed Material is
79
+ synched in timed relation with a moving image.
80
+
81
+ b. Adapter's License means the license You apply to Your Copyright
82
+ and Similar Rights in Your contributions to Adapted Material in
83
+ accordance with the terms and conditions of this Public License.
84
+
85
+ c. Copyright and Similar Rights means copyright and/or similar rights
86
+ closely related to copyright including, without limitation,
87
+ performance, broadcast, sound recording, and Sui Generis Database
88
+ Rights, without regard to how the rights are labeled or
89
+ categorized. For purposes of this Public License, the rights
90
+ specified in Section 2(b)(1)-(2) are not Copyright and Similar
91
+ Rights.
92
+
93
+ d. Effective Technological Measures means those measures that, in the
94
+ absence of proper authority, may not be circumvented under laws
95
+ fulfilling obligations under Article 11 of the WIPO Copyright
96
+ Treaty adopted on December 20, 1996, and/or similar international
97
+ agreements.
98
+
99
+ e. Exceptions and Limitations means fair use, fair dealing, and/or
100
+ any other exception or limitation to Copyright and Similar Rights
101
+ that applies to Your use of the Licensed Material.
102
+
103
+ f. Licensed Material means the artistic or literary work, database,
104
+ or other material to which the Licensor applied this Public
105
+ License.
106
+
107
+ g. Licensed Rights means the rights granted to You subject to the
108
+ terms and conditions of this Public License, which are limited to
109
+ all Copyright and Similar Rights that apply to Your use of the
110
+ Licensed Material and that the Licensor has authority to license.
111
+
112
+ h. Licensor means the individual(s) or entity(ies) granting rights
113
+ under this Public License.
114
+
115
+ i. Share means to provide material to the public by any means or
116
+ process that requires permission under the Licensed Rights, such
117
+ as reproduction, public display, public performance, distribution,
118
+ dissemination, communication, or importation, and to make material
119
+ available to the public including in ways that members of the
120
+ public may access the material from a place and at a time
121
+ individually chosen by them.
122
+
123
+ j. Sui Generis Database Rights means rights other than copyright
124
+ resulting from Directive 96/9/EC of the European Parliament and of
125
+ the Council of 11 March 1996 on the legal protection of databases,
126
+ as amended and/or succeeded, as well as other essentially
127
+ equivalent rights anywhere in the world.
128
+
129
+ k. You means the individual or entity exercising the Licensed Rights
130
+ under this Public License. Your has a corresponding meaning.
131
+
132
+
133
+ Section 2 -- Scope.
134
+
135
+ a. License grant.
136
+
137
+ 1. Subject to the terms and conditions of this Public License,
138
+ the Licensor hereby grants You a worldwide, royalty-free,
139
+ non-sublicensable, non-exclusive, irrevocable license to
140
+ exercise the Licensed Rights in the Licensed Material to:
141
+
142
+ a. reproduce and Share the Licensed Material, in whole or
143
+ in part; and
144
+
145
+ b. produce, reproduce, and Share Adapted Material.
146
+
147
+ 2. Exceptions and Limitations. For the avoidance of doubt, where
148
+ Exceptions and Limitations apply to Your use, this Public
149
+ License does not apply, and You do not need to comply with
150
+ its terms and conditions.
151
+
152
+ 3. Term. The term of this Public License is specified in Section
153
+ 6(a).
154
+
155
+ 4. Media and formats; technical modifications allowed. The
156
+ Licensor authorizes You to exercise the Licensed Rights in
157
+ all media and formats whether now known or hereafter created,
158
+ and to make technical modifications necessary to do so. The
159
+ Licensor waives and/or agrees not to assert any right or
160
+ authority to forbid You from making technical modifications
161
+ necessary to exercise the Licensed Rights, including
162
+ technical modifications necessary to circumvent Effective
163
+ Technological Measures. For purposes of this Public License,
164
+ simply making modifications authorized by this Section 2(a)
165
+ (4) never produces Adapted Material.
166
+
167
+ 5. Downstream recipients.
168
+
169
+ a. Offer from the Licensor -- Licensed Material. Every
170
+ recipient of the Licensed Material automatically
171
+ receives an offer from the Licensor to exercise the
172
+ Licensed Rights under the terms and conditions of this
173
+ Public License.
174
+
175
+ b. No downstream restrictions. You may not offer or impose
176
+ any additional or different terms or conditions on, or
177
+ apply any Effective Technological Measures to, the
178
+ Licensed Material if doing so restricts exercise of the
179
+ Licensed Rights by any recipient of the Licensed
180
+ Material.
181
+
182
+ 6. No endorsement. Nothing in this Public License constitutes or
183
+ may be construed as permission to assert or imply that You
184
+ are, or that Your use of the Licensed Material is, connected
185
+ with, or sponsored, endorsed, or granted official status by,
186
+ the Licensor or others designated to receive attribution as
187
+ provided in Section 3(a)(1)(A)(i).
188
+
189
+ b. Other rights.
190
+
191
+ 1. Moral rights, such as the right of integrity, are not
192
+ licensed under this Public License, nor are publicity,
193
+ privacy, and/or other similar personality rights; however, to
194
+ the extent possible, the Licensor waives and/or agrees not to
195
+ assert any such rights held by the Licensor to the limited
196
+ extent necessary to allow You to exercise the Licensed
197
+ Rights, but not otherwise.
198
+
199
+ 2. Patent and trademark rights are not licensed under this
200
+ Public License.
201
+
202
+ 3. To the extent possible, the Licensor waives any right to
203
+ collect royalties from You for the exercise of the Licensed
204
+ Rights, whether directly or through a collecting society
205
+ under any voluntary or waivable statutory or compulsory
206
+ licensing scheme. In all other cases the Licensor expressly
207
+ reserves any right to collect such royalties.
208
+
209
+
210
+ Section 3 -- License Conditions.
211
+
212
+ Your exercise of the Licensed Rights is expressly made subject to the
213
+ following conditions.
214
+
215
+ a. Attribution.
216
+
217
+ 1. If You Share the Licensed Material (including in modified
218
+ form), You must:
219
+
220
+ a. retain the following if it is supplied by the Licensor
221
+ with the Licensed Material:
222
+
223
+ i. identification of the creator(s) of the Licensed
224
+ Material and any others designated to receive
225
+ attribution, in any reasonable manner requested by
226
+ the Licensor (including by pseudonym if
227
+ designated);
228
+
229
+ ii. a copyright notice;
230
+
231
+ iii. a notice that refers to this Public License;
232
+
233
+ iv. a notice that refers to the disclaimer of
234
+ warranties;
235
+
236
+ v. a URI or hyperlink to the Licensed Material to the
237
+ extent reasonably practicable;
238
+
239
+ b. indicate if You modified the Licensed Material and
240
+ retain an indication of any previous modifications; and
241
+
242
+ c. indicate the Licensed Material is licensed under this
243
+ Public License, and include the text of, or the URI or
244
+ hyperlink to, this Public License.
245
+
246
+ 2. You may satisfy the conditions in Section 3(a)(1) in any
247
+ reasonable manner based on the medium, means, and context in
248
+ which You Share the Licensed Material. For example, it may be
249
+ reasonable to satisfy the conditions by providing a URI or
250
+ hyperlink to a resource that includes the required
251
+ information.
252
+
253
+ 3. If requested by the Licensor, You must remove any of the
254
+ information required by Section 3(a)(1)(A) to the extent
255
+ reasonably practicable.
256
+
257
+ 4. If You Share Adapted Material You produce, the Adapter's
258
+ License You apply must not prevent recipients of the Adapted
259
+ Material from complying with this Public License.
260
+
261
+
262
+ Section 4 -- Sui Generis Database Rights.
263
+
264
+ Where the Licensed Rights include Sui Generis Database Rights that
265
+ apply to Your use of the Licensed Material:
266
+
267
+ a. for the avoidance of doubt, Section 2(a)(1) grants You the right
268
+ to extract, reuse, reproduce, and Share all or a substantial
269
+ portion of the contents of the database;
270
+
271
+ b. if You include all or a substantial portion of the database
272
+ contents in a database in which You have Sui Generis Database
273
+ Rights, then the database in which You have Sui Generis Database
274
+ Rights (but not its individual contents) is Adapted Material; and
275
+
276
+ c. You must comply with the conditions in Section 3(a) if You Share
277
+ all or a substantial portion of the contents of the database.
278
+
279
+ For the avoidance of doubt, this Section 4 supplements and does not
280
+ replace Your obligations under this Public License where the Licensed
281
+ Rights include other Copyright and Similar Rights.
282
+
283
+
284
+ Section 5 -- Disclaimer of Warranties and Limitation of Liability.
285
+
286
+ a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
287
+ EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
288
+ AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
289
+ ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
290
+ IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
291
+ WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
292
+ PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
293
+ ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
294
+ KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
295
+ ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
296
+
297
+ b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
298
+ TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
299
+ NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
300
+ INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
301
+ COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
302
+ USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
303
+ ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
304
+ DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
305
+ IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
306
+
307
+ c. The disclaimer of warranties and limitation of liability provided
308
+ above shall be interpreted in a manner that, to the extent
309
+ possible, most closely approximates an absolute disclaimer and
310
+ waiver of all liability.
311
+
312
+
313
+ Section 6 -- Term and Termination.
314
+
315
+ a. This Public License applies for the term of the Copyright and
316
+ Similar Rights licensed here. However, if You fail to comply with
317
+ this Public License, then Your rights under this Public License
318
+ terminate automatically.
319
+
320
+ b. Where Your right to use the Licensed Material has terminated under
321
+ Section 6(a), it reinstates:
322
+
323
+ 1. automatically as of the date the violation is cured, provided
324
+ it is cured within 30 days of Your discovery of the
325
+ violation; or
326
+
327
+ 2. upon express reinstatement by the Licensor.
328
+
329
+ For the avoidance of doubt, this Section 6(b) does not affect any
330
+ right the Licensor may have to seek remedies for Your violations
331
+ of this Public License.
332
+
333
+ c. For the avoidance of doubt, the Licensor may also offer the
334
+ Licensed Material under separate terms or conditions or stop
335
+ distributing the Licensed Material at any time; however, doing so
336
+ will not terminate this Public License.
337
+
338
+ d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
339
+ License.
340
+
341
+
342
+ Section 7 -- Other Terms and Conditions.
343
+
344
+ a. The Licensor shall not be bound by any additional or different
345
+ terms or conditions communicated by You unless expressly agreed.
346
+
347
+ b. Any arrangements, understandings, or agreements regarding the
348
+ Licensed Material not stated herein are separate from and
349
+ independent of the terms and conditions of this Public License.
350
+
351
+
352
+ Section 8 -- Interpretation.
353
+
354
+ a. For the avoidance of doubt, this Public License does not, and
355
+ shall not be interpreted to, reduce, limit, restrict, or impose
356
+ conditions on any use of the Licensed Material that could lawfully
357
+ be made without permission under this Public License.
358
+
359
+ b. To the extent possible, if any provision of this Public License is
360
+ deemed unenforceable, it shall be automatically reformed to the
361
+ minimum extent necessary to make it enforceable. If the provision
362
+ cannot be reformed, it shall be severed from this Public License
363
+ without affecting the enforceability of the remaining terms and
364
+ conditions.
365
+
366
+ c. No term or condition of this Public License will be waived and no
367
+ failure to comply consented to unless expressly agreed to by the
368
+ Licensor.
369
+
370
+ d. Nothing in this Public License constitutes or may be interpreted
371
+ as a limitation upon, or waiver of, any privileges and immunities
372
+ that apply to the Licensor or You, including from the legal
373
+ processes of any jurisdiction or authority.
374
+
375
+
376
+ =======================================================================
377
+
378
+ Creative Commons is not a party to its public
379
+ licenses. Notwithstanding, Creative Commons may elect to apply one of
380
+ its public licenses to material it publishes and in those instances
381
+ will be considered the “Licensor.” The text of the Creative Commons
382
+ public licenses is dedicated to the public domain under the CC0 Public
383
+ Domain Dedication. Except for the limited purpose of indicating that
384
+ material is shared under a Creative Commons public license or as
385
+ otherwise permitted by the Creative Commons policies published at
386
+ creativecommons.org/policies, Creative Commons does not authorize the
387
+ use of the trademark "Creative Commons" or any other trademark or logo
388
+ of Creative Commons without its prior written consent including,
389
+ without limitation, in connection with any unauthorized modifications
390
+ to any of its public licenses or any other arrangements,
391
+ understandings, or agreements concerning use of licensed material. For
392
+ the avoidance of doubt, this paragraph does not form part of the
393
+ public licenses.
394
+
395
+ Creative Commons may be contacted at creativecommons.org.
README.md CHANGED
@@ -1,13 +1,152 @@
1
- ---
2
- title: Chatbot
3
- emoji: 📈
4
- colorFrom: indigo
5
- colorTo: indigo
6
- sdk: gradio
7
- sdk_version: 5.44.1
8
- app_file: app.py
9
- pinned: false
10
- license: cc-by-4.0
11
- ---
12
-
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
+ <<<<<<< HEAD
2
+ # Agentic RAG Sport Advisor Chatbot
3
+
4
+ This project is a prototype of an **Agentic RAG (Retrieval-Augmented Generation) chatbot**. The application demonstrates core agentic principles by autonomously deciding which tools to use to answer user queries, integrating a RAG pipeline to access document knowledge sources, and utilizing a public knowledge base via Google Search.
5
+
6
+ ---
7
+
8
+ ## Core Concepts
9
+
10
+ ### Agentic Behavior
11
+ The chatbot acts as an autonomous agent. Instead of simply generating a response, it first analyzes the user's intent and decides on the most appropriate action. This includes choosing from a set of specialized tools or a general conversation flow.
12
+
13
+ ### Retrieval-Augmented Generation (RAG)
14
+ For questions requiring specific, factual information, the agent can retrieve relevant documents from a local vector database and use that information to formulate a grounded response.
15
+
16
+ ### Tool Use
17
+ The system is equipped with several tools that allow it to perform specific, pre-defined tasks, such as looking up structured data or performing an external search.
18
+
19
+ ---
20
+
21
+ ## Architecture
22
+
23
+ The application is built using the **LangGraph framework**, which provides a structured way to define the agent's behavior as a state machine. The graph's state is maintained by a `SportAdvicerState` object that stores the conversation history.
24
+
25
+ **Key components of the LangGraph pipeline:**
26
+
27
+ - **chatbot Node:**
28
+ The entry point and main logic node. It receives a user's query, determines the user's intent (equipment, skills, document, sports_list, or general), and routes the request to the appropriate tool or invokes the LLM directly for general questions.
29
+
30
+ - **tools Node:**
31
+ Executes the specific tool chosen by the chatbot node.
32
+
33
+ - **Conditional Edges:**
34
+ Routes the flow from the chatbot node to either the tools node (if a specific tool is needed) or terminates the graph (`__end__`) after the response has been generated.
35
+
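The routing described above can be sketched in plain Python. This is only an illustration under stated assumptions: the keyword-based `detect_intent`, the `DemoState` type, the `TOOLS` stubs, and `run_turn` are hypothetical stand-ins, whereas the project itself is described as using an LLM for intent detection and LangGraph for the graph.

```python
from typing import Callable, Dict, List, TypedDict


class DemoState(TypedDict):
    """Hypothetical state: just the conversation history."""
    messages: List[str]


def detect_intent(query: str) -> str:
    """Keyword stand-in for the LLM-based intent classifier (assumption)."""
    q = query.lower()
    if "equipment" in q or "gear" in q:
        return "equipment"
    if "skill" in q:
        return "skills"
    if "document" in q:
        return "document"
    if "recommend" in q or "list" in q:
        return "sports_list"
    return "general"


# Hypothetical tool stubs; the real tools read a CSV, a vector store, or the web.
TOOLS: Dict[str, Callable[[str], str]] = {
    "equipment": lambda q: "equipment lookup for: " + q,
    "skills": lambda q: "skills lookup for: " + q,
    "document": lambda q: "document answer for: " + q,
    "sports_list": lambda q: "list of sports",
}


def route(intent: str) -> str:
    """Conditional edge: go to the tools node, or end the graph."""
    return "tools" if intent in TOOLS else "__end__"


def run_turn(state: DemoState, query: str) -> DemoState:
    """One pass through the chatbot node and, if routed there, the tools node."""
    state["messages"].append(query)
    intent = detect_intent(query)
    if route(intent) == "tools":
        reply = TOOLS[intent](query)
    else:
        reply = "general answer for: " + query
    state["messages"].append(reply)
    return state
```

Keeping the routing decision in a small pure function such as `route` makes the branch easy to test without calling a model.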
36
+ ---
37
+
38
+ ## System Components
39
+
40
+ ### RAG Pipeline (`rag_pipeline.py`)
41
+ Responsible for setting up the document retrieval system:
42
+
43
+ - Loads PDF documents from `rag_docs/` using `DirectoryLoader`.
44
+ - Splits documents into manageable chunks via `RecursiveCharacterTextSplitter`.
45
+ - Uses **ChromaDB** as the local vector store for document chunks and embeddings.
46
+ - Generates vector representations with `GoogleGenerativeAIEmbeddings`.
47
+ - Checks for an existing Chroma database to avoid re-processing documents on every run.
48
+
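To show what chunking with an overlap means, here is a minimal sketch of a fixed-size character chunker. It is a simplified stand-in, not the behavior of `RecursiveCharacterTextSplitter`, which additionally tries to split on separators such as paragraph and line breaks. The function name `chunk_text` is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters, so a sentence cut
    at a window boundary still appears intact in one of the two chunks
    when it is shorter than the overlap.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

A larger overlap gives the retriever more context around each boundary, at the cost of more chunks to embed and store.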
49
+ ### Chatbot Logic and Tools (`chatbot_nodes.py`)
50
+ Defines the agent's behavior and tools:
51
+
52
+ - **LLM:** Uses `gemini-2.5-flash-lite`, a lightweight free-to-use model from Google's Gemini family.
53
+
54
+ **Intent and Parameter Extraction:**
55
+ - `detect_intent(query)`: Classifies user queries into a specific category.
56
+ - `extract_sport_name(query)`: Extracts the relevant sport name for tools.
57
+
58
+ **Tools:**
59
+ - `get_sports()`: Returns a list of sports from a local CSV.
60
+ - `get_skills_by_sport(sport: str)`: Retrieves the top 3 highest-rated skills for a sport.
61
+ - `get_document_answer(query: str)`: Core RAG tool using a retriever to find relevant document chunks and augment LLM prompts.
62
+ - `get_equipment_by_sport(sport: str)`: Performs a Google Search to find sport equipment information.
63
+
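As an illustration of the top-3 skill lookup, the sketch below ranks the skill columns of one CSV row. It is an assumption-laden example rather than the project's implementation: `top_skills` and `SAMPLE_CSV` are hypothetical names, the standard `csv` module is used to keep the example self-contained, and the three sample rows are copied from `sport_tool_docs/toughestsport.csv`.

```python
import csv
import io
from typing import List, Tuple

# Three rows copied from sport_tool_docs/toughestsport.csv for illustration.
SAMPLE_CSV = """SPORT,Endurance,Strength,Power,Speed,Agility,Flexibility,Nerve,Durability,Hand-eye coordination,Analytical Aptitude,Total,Rank
Boxing,8.63,8.13,8.63,6.38,6.25,4.38,8.88,8.5,7,5.63,72.375,1
Football,5.38,8.63,8.13,7.13,6.38,4.38,7.25,8.5,5.5,7.13,68.375,3
Gymnastics,5.38,6.13,6.63,5,6.38,10,7.5,6.88,4.5,4.13,62.5,8
"""

# Columns that are not individual skill ratings.
NON_SKILL_COLUMNS = {"SPORT", "Total", "Rank"}


def top_skills(csv_text: str, sport: str, n: int = 3) -> List[Tuple[str, float]]:
    """Return the n highest-rated skills for a sport, or [] if it is unknown."""
    wanted = sport.strip().lower()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["SPORT"].strip().lower() == wanted:
            scores = [
                (name, float(value))
                for name, value in row.items()
                if name not in NON_SKILL_COLUMNS
            ]
            # Highest rating first; ties keep their column order.
            scores.sort(key=lambda item: item[1], reverse=True)
            return scores[:n]
    return []
```

For the Football row this returns Strength (8.63), Durability (8.5), and Power (8.13).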
64
+ ---
65
+
66
+ ## How to Run the Application
67
+
68
+ ### Prerequisites
69
+ - Python 3.9+
70
+ - pip
71
+ - Google API Key
72
+
73
+ ### Setup
74
+ 1. Clone the repository and navigate to the project directory.
75
+ 2. Install dependencies:
76
+
77
+ ```bash
78
+ pip install -r requirements.txt
79
+ ```
80
+ 3. Place your PDF documents in the `rag_docs/` directory.
81
+ 4. Create a `.env` file in the root directory and add your Google API key:
82
+
83
+ ```bash
84
+ GOOGLE_API_KEY="your_api_key_here"
85
+ ```
86
+ ### Execution
87
+ Run the main application:
88
+ ```bash
89
+ python chatbot_nodes.py
90
+ ```
91
+ The application will start a Gradio web interface, accessible via your browser to interact with the chatbot.
92
+
93
+ ---
94
+
95
+ ## Evaluation and Future Work
96
+
97
+ ### Bottlenecks and Performance
98
+ - Multiple LLM calls for intent detection and sport name extraction may introduce latency.
99
+ - RAG pipeline performance depends on document quality and the `chunk_size` and `chunk_overlap` parameters.
100
+
101
+ ### Suggested Improvements
102
+ - Advanced Tool Calling: Use a single LLM call to decide which tool to call with what parameters.
103
+ - Performance Benchmarking: Test queries to measure end-to-end latency and response accuracy.
104
+ - New Tools: Extend capabilities with weather updates, sports news, or game schedules.
105
+
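The benchmarking idea above could start from a small timing harness like the one below. This is a sketch under assumptions: `benchmark` is a hypothetical helper, and `answer_fn` stands for whatever callable wraps the chatbot. It measures latency only, not response accuracy.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(
    answer_fn: Callable[[str], str],
    queries: List[str],
    repeats: int = 3,
) -> Dict[str, float]:
    """Return the median wall-clock latency in seconds for each query."""
    timings: Dict[str, float] = {}
    for query in queries:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            answer_fn(query)
            samples.append(time.perf_counter() - start)
        # The median is less sensitive to a single slow call than the mean.
        timings[query] = statistics.median(samples)
    return timings
```

Running it once per intent category would show which tool path contributes most to end-to-end latency.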
106
+ ---
107
+
108
+ ## Example Human Questions to Tools Mapping
109
+ This section provides examples of user questions that would trigger the various tools within the chatbot.
110
+
111
+ - `get_sports()`: "Recommend me sports."
112
+
113
+ - `get_skills_by_sport(sport: str)`: "What skills are needed for football?"
114
+
115
+ - `get_document_answer(query: str)`: "How can I be successful in football based on the documentation?"
116
+
117
+ - `get_equipment_by_sport(sport: str)`: "What gear is needed for football?"
118
+
119
+ ---
120
+
121
+ ## References and Citations
122
+
123
+ **RAG Input References:**
124
+ - **Dataset license**: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
125
+ - `rag_docs/TOSSJ-11-3.pdf`:
126
+
127
+ Lepschy H, Wäsche H, Woll A. How to be Successful in Football: A Systematic Review. Open Sports Sci J, 2018; 11. http://dx.doi.org/10.2174/1875399X01811010003
128
+
129
+ - `rag_docs/TOSSJ-12-1.pdf`:
130
+
131
+ Mack M, Bryan M, Heyer G, Heinen T. Modeling Judges’ Scores in Artistic Gymnastics. Open Sports Sci J, 2019; 12. http://dx.doi.org/10.2174/1875399X01912010001
132
+
133
+ **Sport tool input CSV file references:**
134
+ - **Dataset license**: CC0: Public Domain
135
+ - `sport_tool_docs/toughestsport.csv`:
136
+
137
+ [Ranking sports by skill requirement](https://www.kaggle.com/datasets/jainaru/ranking-sports-by-skill-requirement)
138
+ =======
139
+ ---
140
+ title: Chatbot
141
+ emoji: 📈
142
+ colorFrom: indigo
143
+ colorTo: indigo
144
+ sdk: gradio
145
+ sdk_version: 5.44.1
146
+ app_file: app.py
147
+ pinned: false
148
+ license: cc-by-4.0
149
+ ---
150
+
151
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
152
+ >>>>>>> ba7aceaa754482aade1de886dd9e017c9b5997d3
rag_docs/.gitattributes ADDED
@@ -0,0 +1 @@
1
+ *.pdf filter=lfs diff=lfs merge=lfs -text
rag_docs/TOSSJ-11-3.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:71cb3ce8b4a047bcb8890e7bbc5728bf3d107e739d436811544453482b23e460
3
+ size 563702
rag_docs/TOSSJ-12-1.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:818e2b9a4c0203a244c96507019ffbb549e07062ae6f35d63f6ffd6ad352df78
3
+ size 479367
requirements.txt ADDED
@@ -0,0 +1,15 @@
1
+ python-dotenv
2
+ langgraph
3
+ langchain
4
+ langchain_core
5
+ langchain_community
6
+ langchain_google_genai
7
+ langchain_text_splitters
8
+ chromadb
9
+ google-genai
10
+ pandas
11
+ gradio
12
+ typing-extensions
13
+ protobuf
14
+ unstructured
15
+ unstructured[pdf]
sport_tool_docs/toughestsport.csv ADDED
@@ -0,0 +1,61 @@
1
+ SPORT,Endurance,Strength,Power,Speed,Agility,Flexibility,Nerve,Durability,Hand-eye coordination,Analytical Aptitude,Total,Rank
2
+ Boxing,8.63,8.13,8.63,6.38,6.25,4.38,8.88,8.5,7,5.63,72.375,1
3
+ Ice Hockey,7.25,7.13,7.88,7.75,7.63,4.88,6,8.25,7.5,7.5,71.75,2
4
+ Football,5.38,8.63,8.13,7.13,6.38,4.38,7.25,8.5,5.5,7.13,68.375,3
5
+ Basketball,7.38,6.25,6.5,7.25,8.13,5.63,4.13,7.75,7.5,7.38,67.875,4
6
+ Wrestling,6.63,8.38,7.13,5.13,6.38,7.5,5,6.75,4.25,6.38,63.5,5
7
+ Martial Arts,5,5.88,7.75,6.38,6,7,6.63,5.88,6,6.88,63.375,6
8
+ Tennis,7.25,5.13,7.13,6.75,7.75,5.63,3,5,8.38,6.75,62.75,7
9
+ Gymnastics,5.38,6.13,6.63,5,6.38,10,7.5,6.88,4.5,4.13,62.5,8
10
+ Baseball/Softball,4.63,5.75,7.63,6.5,6.75,4.75,5.13,5.63,9.25,6.25,62.25,9
11
+ Soccer,7.75,4.5,5.13,7.25,8.25,4.75,3.63,6.25,6.5,7.5,61.5,10
12
+ Skiing: Alpine,5.13,5.25,6,7.38,6.13,5.63,8.38,6,5.13,5.63,60.625,11
13
+ Water Polo,7.88,6.63,6.88,5.38,6.38,5,4.25,6.38,6.25,5.63,60.625,11
14
+ Rugby,6.75,7,6.38,5.88,6,4.13,6.5,7.88,4.38,5.63,60.5,13
15
+ Lacrosse,6.63,5.13,5.75,7,6.63,4.75,4.38,6.13,7.13,6.88,60.375,14
16
+ Rodeo: Steer Wrestling,4,7,7.88,3.88,4.88,5,7.88,6.88,5.13,4,56.5,15
17
+ Track and Field: Pole Vault,3.38,6.88,7.25,6.13,5.38,7,6.63,4.25,5.25,3.75,55.875,16
+ Field Hockey,6.75,4.5,5.38,6,5.75,4.63,3.75,5,6.63,6.5,54.875,17
+ Speed Skating,7.63,7.25,7.38,8.88,4,4.25,4.5,4.63,2.88,3.5,54.875,17
+ Figure Skating,6.38,5.25,6.63,5.13,6.88,8.25,4.88,4,3.13,4.25,54.75,19
+ Cycling: Distance,9.63,6.38,6.25,5.13,3.75,2.63,5.88,6.88,3,4.88,54.375,20
+ Volleyball,5.13,4.88,6.63,5,7,5.13,2.88,4.63,7.25,5.88,54.375,20
+ Racquetball/Squash,6.13,3.75,5,5.5,7.25,5.88,2.38,2.88,8.38,6.5,53.625,22
+ Surfing,4.63,5,4.13,4.25,6.63,5.5,8.25,5.5,4.38,4.88,53.125,23
+ Fencing,4.63,3.75,4.25,5.13,6.13,5.63,4.88,4.25,7.25,6.88,52.75,24
+ Skiing: Freestyle,4.13,5.13,4.88,5.13,6.63,6.88,6.63,5.13,4.13,3.88,52.5,25
+ Team Handball,4.88,3.88,5.38,5.5,6,4.5,3,3.88,7.88,5.88,50.75,26
+ Cycling: Sprints,4.25,6.13,7.88,7.5,4,2.88,4.75,4.5,3.63,4.5,50,27
+ Bobsledding/Luge,3.5,5.5,6.5,6.75,4.13,3.25,7.75,3.5,4.13,4.25,49.25,28
+ Ski Jumping,3.5,4.5,5.75,4.63,4,5,9,4.63,4.38,3.5,48.875,29
+ Badminton,5.25,3.25,4,5.63,7.38,5.25,1.25,2.63,7.25,6.13,48,30
+ Skiing: Nordic,9,5.75,4.38,5.13,4,4,2.75,5.5,3.63,3.88,48,30
+ Auto Racing,5.88,3.5,2.63,1.63,2.75,1.75,9.88,4.38,8,7.5,47.875,32
+ Track and Field: High Jump,3,6,7,6.13,5.63,6.63,3.5,3.5,3.5,2.88,47.75,33
+ "Track and Field: Long, Triple jumps",4,5.63,7.13,6.75,5,5.75,2.75,3.25,4,3.13,47.375,34
+ Diving,2.88,5.13,4.63,3,3.5,8.5,8.38,5,3,3,47,35
+ Swimming (all strokes): Distance,9.25,5.25,4.63,5.5,3.63,5.5,2.63,4.63,2.88,3,46.875,36
+ Skateboarding,4.13,3.75,3.75,4.13,6.13,5.13,6.5,5.25,4.88,3.13,46.75,37
+ Track and Field: Sprints,3.5,5.13,7.25,9.88,4.63,5.13,2,4.13,2.63,2.38,46.625,38
+ Rowing,8.13,7.75,7.13,4,2.5,4,1.75,4.38,2.88,3.63,46.125,39
+ Rodeo: Calf Roping,3.13,5.38,5,4.25,5.63,3.88,4.88,3.75,6.38,3.75,46,40
+ Track and Field: Distance,9.63,5.25,3.75,6,3.25,4.38,2,5.75,1.88,4.13,46,40
+ Rodeo: Bull/Bareback/Bronc Riding,3.25,5.38,4,1.75,3.63,4.25,9.5,7.38,3.63,3.13,45.875,42
+ Track and Field: Middle Distance,6,5.13,5.13,7.75,4,4.88,2,4.75,2.13,3.75,45.5,43
+ Weight-Lifting,4.13,9.25,9.75,2.63,2.5,3.38,4,4.75,2.25,2.38,45,44
+ Swimming (all strokes): Sprints,4.13,5.25,6.25,7.88,3.63,5.5,2.5,3.25,2.75,3,44.125,45
+ Water Skiing,4.63,5,4.5,3,4.25,4.75,5.88,4.63,4.13,3.25,44,46
+ Table Tennis,3.5,2.5,4.63,4.13,5.88,4.25,1.38,1.88,8.88,6,43,47
+ Track and Field: Weights,3.25,7.88,9.13,3,3.13,3,2.25,3.63,4,2.88,42.125,48
+ Canoe/Kayak,6.75,5.25,5.63,3.5,2.75,3.88,3.63,3.25,3.13,4.25,42,49
+ Horse Racing,4,3.88,2.88,1.38,2.88,3.75,8,4.5,3.88,6.5,41.625,50
+ Golf,3.25,3.88,6.13,1.63,1.75,4,2.5,2.38,6,6.38,37.875,51
+ Cheerleading,3.63,3.63,3.38,2.25,4.13,7.5,3.63,3.38,2.5,2.25,36.25,52
+ Roller Skating,4.75,3.38,4,5.13,4,3.5,2.63,3.38,2.88,2.63,36.25,52
+ Equestrian,3.38,3.25,1.75,1.25,2.5,2.88,6,2.75,2.88,5.13,31.75,54
+ Archery,2.88,4.5,3.13,1.13,1.63,2.63,2.75,2.13,6.63,3.25,30.625,55
+ Curling,2.25,2.63,2.5,1.5,2.25,2.63,1.75,1.5,4.88,5.63,27.5,56
+ Bowling,2.25,2.75,3.38,1,1.88,2.38,1.63,1.25,4.75,4.13,25.375,57
+ Shooting,2.25,2.5,1.38,0.88,1.13,1.75,2.38,1.88,6.75,4,24.875,58
+ Billiards,1,1,1.75,0.75,1,2.63,1.63,0.75,5.25,5.75,21.5,59
+ Fishing,1.38,1.63,1.25,0.63,1.5,1.13,0.88,0.88,2.38,2.88,14.5,60
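The rows above can be queried per sport to find its highest-rated skills. The following is a minimal sketch using only the standard library, not the code from this commit. The ten column names in `SKILLS` are an assumption (the CSV header is not shown in this hunk), and `load_ratings` and `top_skills` are hypothetical helper names.

```python
import csv
import heapq
import io

# Assumed column names: the CSV header is not visible in this hunk.
SKILLS = ["endurance", "strength", "power", "speed", "agility",
          "flexibility", "nerve", "durability", "hand_eye", "analytic"]

# Three rows copied from the data above (sport, ten ratings, total, rank).
RAW = """\
Weight-Lifting,4.13,9.25,9.75,2.63,2.5,3.38,4,4.75,2.25,2.38,45,44
"Track and Field: Long, Triple jumps",4,5.63,7.13,6.75,5,5.75,2.75,3.25,4,3.13,47.375,34
Fishing,1.38,1.63,1.25,0.63,1.5,1.13,0.88,0.88,2.38,2.88,14.5,60
"""


def load_ratings(text):
    """Map each lower-cased sport name to a {skill: rating} dict."""
    table = {}
    for row in csv.reader(io.StringIO(text)):
        name, values = row[0], row[1:1 + len(SKILLS)]
        table[name.lower()] = dict(zip(SKILLS, map(float, values)))
    return table


def top_skills(table, sport, n=3):
    """Return the n highest-rated skill names for a sport, best first."""
    ratings = table[sport.lower().strip()]
    return heapq.nlargest(n, ratings, key=ratings.get)


ratings = load_ratings(RAW)
print(top_skills(ratings, "Weight-Lifting"))
```

The `csv` module handles the quoted sport name that contains a comma, which a plain `str.split(",")` would break apart.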
src/chatbot_nodes.py ADDED
@@ -0,0 +1,331 @@
+
+ import os
+ from pathlib import Path
+ import traceback
+ from typing import Annotated, List, Union
+ from typing_extensions import TypedDict
+
+ import pandas as pd
+ import gradio as gr
+
+ from langchain_core.messages import AIMessage, HumanMessage, ToolMessage
+ from langchain_core.tools import tool
+ from langchain_google_genai import ChatGoogleGenerativeAI
+ from langgraph.graph import StateGraph
+ from langgraph.prebuilt import ToolNode
+
+ from rag_pipeline import load_or_create_vector_store
+
+ from google import genai
+ from google.genai import types
+ from google.api_core import retry
+
+ from dotenv import load_dotenv
+
+ # ---------------------------
+ # --- Setup Google API ---
+ # ---------------------------
+ # Modify this load_dotenv in the future
+ load_dotenv(dotenv_path=os.path.join(os.path.dirname(os.path.dirname(__file__)), ".env"))
+ GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
+
+ if not GOOGLE_API_KEY:
+     raise ValueError("GOOGLE_API_KEY not found in environment variables.")
+
+ client = genai.Client(api_key=GOOGLE_API_KEY)
+
+ # Retry policy: retry only on HTTP 429 and 503 API errors
+ is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})
+ if not hasattr(genai.models.Models.generate_content, '__wrapped__'):
+     genai.models.Models.generate_content = retry.Retry(predicate=is_retriable)(genai.models.Models.generate_content)
+
+ # ---------------------------
+ # --- Config ---
+ # ---------------------------
+ model_name = "gemini-2.5-flash-lite"  # alternatives: "gemini-2.0-flash-lite", "gemini-2.0-flash"
+ base_dir = Path(__file__).resolve().parent.parent
+ doc2_path = str(base_dir / "sport_tool_docs/toughestsport.csv")
+ search_kwargs_k = 5
+ search_kwargs_fetch_k = 10
+
+ # ---------------------------
+ # --- RAG Setup ---
+ # ---------------------------
+ vector_store = load_or_create_vector_store()
+
+ retriever = vector_store.as_retriever(
+     search_type="mmr",
+     search_kwargs={
+         "k": search_kwargs_k,
+         "fetch_k": search_kwargs_fetch_k
+     }
+ )
+
+ # ---------------------------
+ # --- Sports Data Setup ---
+ # ---------------------------
+ sports_by_skills = pd.read_csv(doc2_path)
+ sports_by_skills.columns = sports_by_skills.columns.str.lower()
+ sports_by_skills['sport'] = sports_by_skills['sport'].str.lower()
+
+ # ---------------------------
+ # --- LangGraph Tools ---
+ # ---------------------------
+ @tool
+ def get_sports() -> str:
+     """Return a structured list of sports from the dataset."""
+
+     prompt = """Parse the provided sports into a structured list where each line has indentation and starts with a category, followed by a colon,
+     and then a comma-separated list of sports within that category.
+     If a sport has no obvious category, group it under "General".
+
+     EXAMPLE:
+     Provide me sport options
+
+     Answer:
+     - Ball games:
+         - Football, Basketball
+     - Skiing:
+         - Alpine, Nordic
+     - General:
+         - Boxing, Water polo
+
+     """
+     sports = sports_by_skills["sport"].tolist()
+     response = client.models.generate_content(model=model_name, contents=[prompt, sports])
+     return response.candidates[0].content.parts[0].text.strip()
+
+ @tool
+ def get_document_answer(query: str) -> str:
+     """Retrieve an answer from documents with a grounded paraphrase."""
+
+     try:
+         results = retriever.invoke(query)
+
+         if not results:
+             return "I could not find any relevant information in the documents."
+
+         # Combine the retrieved chunks for context
+         combined_text = "\n---\n".join([r.page_content for r in results])
+
+         prompt = f"""
+         Answer the question based on the following documents.
+         If the information is not available, state that you cannot find the answer in the provided documents.
+
+         Chunks:
+         {combined_text}
+
+         Question: {query}
+         Answer:
+         """
+
+         # Call the LLM
+         response = client.models.generate_content(
+             model=model_name,
+             contents=[prompt]
+         )
+
+         if response.candidates:
+             answer_text = response.candidates[0].content.parts[0].text.strip()
+             return answer_text
+         else:
+             return "I could not generate an answer."
+
+     except Exception as e:
+         return f"RAG error: {e}"
+
+ @tool
+ def get_skills_by_sport(sport: str) -> str:
+     """Return the three highest-rated skills for the given sport name."""
+     sport = sport.lower().strip()
+     skill_rates = sports_by_skills.loc[sports_by_skills['sport'] == sport]
+
+     if skill_rates.empty:
+         return f"No data found for sport '{sport}'. Please check the spelling or try another sport."
+
+     # Keep only the skill columns, then rank them for the single matching row
+     skills_only = skill_rates.drop(columns=['sport', 'total', 'rank'])
+     transposed = skills_only.T
+     col = transposed.columns[0]
+     top_3_skills = transposed.nlargest(3, col)
+     top_3_skill_names = "\n".join(str(skill) for skill in top_3_skills.index)
+
+     return f"Top 3 skills for {sport.capitalize()}:\n{top_3_skill_names}"
+
+ @tool
+ def get_equipment_by_sport(sport: str) -> str:
+     """Get the equipment list for a sport using a Google Search grounded prompt."""
+
+     sport = sport.lower()
+     prompt = """Parse a customer's sport equipment question into a list:
+     EXAMPLE: What equipment is necessary for boxing?
+     Response:
+     - Mandatory: 1 gloves, 3 socks
+     - Recommended: 1 towel
+     - Fun: resistance bands
+     """
+
+     config_with_search = types.GenerateContentConfig(
+         tools=[types.Tool(google_search=types.GoogleSearch())],
+         temperature=0.0,
+     )
+
+     contents_text = "What equipment is necessary for " + sport + "?"
+     response = client.models.generate_content(
+         model=model_name,
+         contents=[prompt, contents_text],
+         config=config_with_search,
+     )
+     return response.candidates[0].content.parts[0].text if response.candidates else "No information found."
+
+ # ---------------------------
+ # --- Tool Node ---
+ # ---------------------------
+ tools_list = [get_sports, get_document_answer, get_skills_by_sport, get_equipment_by_sport]
+ tool_node = ToolNode(tools_list)
+
+ # ---------------------------
+ # --- LangGraph LLM ---
+ # ---------------------------
+ llm = ChatGoogleGenerativeAI(model=model_name)
+ llm_with_tools = llm.bind_tools(tools_list, return_direct=True)
+
+ # --- Graph State ---
+ class SportAdvicerState(TypedDict):
+     messages: Annotated[List[Union[AIMessage, HumanMessage, ToolMessage]], list.__add__]
+
+ def detect_intent(query: str) -> str:
+     """Classify user query into one of: equipment, skills, document, sports_list, general."""
+     classification_prompt = f"""
+     You are a classifier.
+     Categorize the following user query into exactly ONE of these categories:
+     - equipment → if asking about gear, equipment, things needed for a sport
+     - skills → if asking about skills, abilities, rankings, requirements for a sport
+     - document → if asking about information that may be inside books, PDFs, or retrieved documents
+     - sports_list → if asking for a list of sports, categories of sports, or groupings of sports
+     - general → if it's a general sports question not fitting the above
+
+     Query: "{query}"
+
+     Answer with one word: equipment, skills, document, sports_list, or general.
+     """
+
+     response = client.models.generate_content(
+         model=model_name,
+         contents=[classification_prompt],
+         config=types.GenerateContentConfig(
+             temperature=0.0  # deterministic
+         )
+     )
+
+     if response.candidates:
+         return response.candidates[0].content.parts[0].text.strip().lower()
+     else:
+         return "general"
+
+ def extract_sport_name(query: str) -> str:
+     """Extract the sport name from a user query."""
+     extraction_prompt = f"""
+     Extract the single sport name from the following query.
+     If multiple sports are mentioned, return the first one.
+     If no sport is mentioned, return an empty string.
+
+     Query: "{query}"
+
+     Extracted sport name:
+     """
+     response = client.models.generate_content(
+         model=model_name,
+         contents=[extraction_prompt],
+         config=types.GenerateContentConfig(
+             temperature=0.0
+         )
+     )
+     if response.candidates:
+         return response.candidates[0].content.parts[0].text.strip()
+     return ""
+
+ def chatbot_node(state: SportAdvicerState) -> SportAdvicerState:
+     user_message = state["messages"][-1]
+     query = user_message.content
+     intent = detect_intent(query)
+     sport_name = extract_sport_name(query)
+
+     if intent == "equipment":
+         response_text = get_equipment_by_sport.invoke({"sport": sport_name})
+         return {"messages": [AIMessage(content=response_text)]}
+
+     elif intent == "skills":
+         response_text = get_skills_by_sport.invoke({"sport": sport_name})
+         return {"messages": [AIMessage(content=response_text)]}
+
+     elif intent == "document":
+         response_text = get_document_answer.invoke({"query": query})
+         return {"messages": [AIMessage(content=response_text)]}
+
+     elif intent == "sports_list":
+         response_text = get_sports.invoke({})
+         return {"messages": [AIMessage(content=response_text)]}
+
+     else:  # general
+         messages_with_instruction = [
+             HumanMessage(content="""You are a sports advisor chatbot.
+             You can answer general sports questions.
+             For equipment, skills, document, or sports list queries, tools are used automatically.""")
+         ] + state["messages"]
+
+         response = llm_with_tools.invoke(messages_with_instruction)
+         return {"messages": [response]}
+
+ # Routing: always go to the single tool node if any tool call exists
+ def should_route_to_tools(state: SportAdvicerState):
+     last_msg = state["messages"][-1]
+     if hasattr(last_msg, "tool_calls") and last_msg.tool_calls:
+         return "tools"
+     return "__end__"
+
+ # ---------------------------
+ # --- Graph Definition ---
+ # ---------------------------
+ graph_builder = StateGraph(SportAdvicerState)
+ graph_builder.add_node("chatbot", chatbot_node)
+ graph_builder.add_node("tools", tool_node)
+ graph_builder.add_conditional_edges("chatbot", should_route_to_tools)
+ graph_builder.add_edge("tools", "chatbot")
+ graph_builder.set_entry_point("chatbot")
+ graph_with_rag = graph_builder.compile()
+
+ # ---------------------------
+ # --- Gradio Interface ---
+ # ---------------------------
+ def chatbot_interface(message, history):
+     langchain_messages = []
+     for chat_entry in history:
+         if isinstance(chat_entry, list) and len(chat_entry) == 2:
+             if chat_entry[0]: langchain_messages.append(HumanMessage(content=chat_entry[0]))
+             if chat_entry[1]: langchain_messages.append(AIMessage(content=chat_entry[1]))
+         elif isinstance(chat_entry, dict):
+             if chat_entry["role"] == "user": langchain_messages.append(HumanMessage(content=chat_entry["content"]))
+             elif chat_entry["role"] == "assistant": langchain_messages.append(AIMessage(content=chat_entry["content"]))
+
+     langchain_messages.append(HumanMessage(content=message))
+     current_state = {"messages": langchain_messages}
+
+     try:
+         response_state = graph_with_rag.invoke(current_state)
+         bot_response = response_state["messages"][-1].content
+         return bot_response
+     except Exception as e:
+         traceback.print_exc()
+         return f"Internal error: {e}"
+
+ iface = gr.ChatInterface(
+     fn=chatbot_interface,
+     chatbot=gr.Chatbot(height=500, type="messages",
+                        value=[{"role": "assistant", "content": "Hello! I am your AI Sport Advisor. Ask me anything."}]),
+     title="Agentic RAG Sport Advisor Chatbot",
+     description="LangGraph chatbot integrated with RAG document retrieval and sports tools.",
+     type="messages",
+ )
+
+ if __name__ == "__main__":
+     iface.launch(share=True)
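`detect_intent` returns the model's reply lower-cased and stripped, and `chatbot_node` compares it with `==`. A reply such as "Equipment." or "Category: skills" would therefore fall through to the general branch. One possible guard is to map the raw reply onto the known labels first. This is a minimal sketch, not part of this commit; `VALID_INTENTS` and `normalize_intent` are hypothetical names.

```python
import re

# The five labels named in the classification prompt.
VALID_INTENTS = ("equipment", "skills", "document", "sports_list", "general")


def normalize_intent(raw: str, default: str = "general") -> str:
    """Map a free-form classifier reply onto one of VALID_INTENTS."""
    text = raw.strip().lower()
    # Accept "sports list" or "sports-list" as the sports_list label.
    text = re.sub(r"sports[\s-]+list", "sports_list", text)
    # Return the first word in the reply that is a known label.
    for token in re.findall(r"[a-z_]+", text):
        if token in VALID_INTENTS:
            return token
    return default


print(normalize_intent("Equipment."))
```

Under this assumption, `chatbot_node` would call `normalize_intent(detect_intent(query))` before branching, so unexpected punctuation or extra words do not change the route.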
src/rag_pipeline.py ADDED
@@ -0,0 +1,38 @@
+ from pathlib import Path
+ from langchain_community.vectorstores import Chroma
+ from langchain_google_genai import GoogleGenerativeAIEmbeddings
+ from langchain_community.document_loaders import DirectoryLoader, UnstructuredPDFLoader
+ from langchain_text_splitters import RecursiveCharacterTextSplitter
+
+ base_dir = Path(__file__).resolve().parent.parent
+ chroma_db_path = "/chroma_db"
+ docs_dir = str(base_dir / "rag_docs/")
+ chunk_size_ = 1000
+ chunk_overlap_ = 200
+
+ def load_or_create_vector_store():
+     vector_store_path = Path(chroma_db_path)
+
+     embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
+
+     # Load existing store if present
+     if vector_store_path.exists() and any(vector_store_path.iterdir()):
+         print("Loading existing vector store...")
+         return Chroma(persist_directory=chroma_db_path, embedding_function=embeddings)
+
+     # Otherwise, create new vector store
+     print("Creating new vector store from documents...")
+     loader = DirectoryLoader(docs_dir, glob="**/*.pdf", loader_cls=UnstructuredPDFLoader)
+     documents = loader.load()
+
+     text_splitter = RecursiveCharacterTextSplitter(
+         chunk_size=chunk_size_,
+         chunk_overlap=chunk_overlap_,
+         length_function=len,
+         is_separator_regex=False
+     )
+     chunks = text_splitter.split_documents(documents)
+
+     vector_store = Chroma(persist_directory=chroma_db_path, embedding_function=embeddings)
+     vector_store.add_documents(chunks)
+     return vector_store
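The splitter above is configured with `chunk_size_ = 1000` and `chunk_overlap_ = 200`, so neighbouring chunks can share up to 200 characters. The sketch below illustrates only the overlap idea with a fixed sliding window. It is a simplification and not the recursive, separator-aware algorithm that `RecursiveCharacterTextSplitter` uses; `sliding_chunks` is a hypothetical name.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Cut text into fixed-size windows whose neighbours share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))
parts = sliding_chunks(sample)
print(len(parts), [len(p) for p in parts])
```

With these settings each new window begins 800 characters after the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.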