import streamlit as st

st.set_page_config(page_title="LLM Evaluation Lab - by Fikran", layout="wide")
st.title("LLM Evaluation Lab - Real-world Testing by Fikran")
st.markdown(
    """
Welcome to the interactive showcase for evaluating **Large Language Models** (LLMs) using real-world user interactions!

**What this is**: A companion demo to [Fikran](https://fikran.com), a multilingual platform where people interact with AI agents in a natural, dynamic environment.

We offer model testing, benchmarking, and prompt refinement in live contexts, not just synthetic benchmarks.

---
"""
)
st.header("🎯 What You Can Do Here")
st.markdown(
    """
- Understand the **real-world performance** of your LLMs.
- Track how they behave across different user queries and scenarios.
- Apply **prompt engineering**, **LoRA-based customization**, and **dialogue tuning**.
- Export insights as PDF reports or structured logs.

This system is perfect for:

- Researchers evaluating fine-tuned models
- Product teams testing chatbot behavior before deployment
- Prompt engineers experimenting with multi-agent setups
"""
)
st.header("🛠️ Try It or Order a Full Evaluation")
col1, col2 = st.columns(2)
with col1:
    if st.button("Try Fikran Now"):
        st.markdown("[Click to explore Fikran](https://www.fikran.com/terms/about-us?lang=english)", unsafe_allow_html=True)
with col2:
    if st.button("📦 Order a Full Evaluation"):
        st.markdown("[See the Service on Upwork](https://www.upwork.com/services/product/development-it-a-real-world-evaluation-of-your-llm-in-a-dynamic-interactive-environment-1909379987479305454)", unsafe_allow_html=True)
st.markdown("---")
st.info("This app is maintained by [Abdennacer Elbasri](https://huggingface.co/elbasri), founder of Fikran.")