import streamlit as st
st.set_page_config(page_title="LLM Evaluation Lab – by Fikran", layout="wide")
st.title("🔬 LLM Evaluation Lab – Real-world Testing by Fikran")
st.markdown(
"""
Welcome to the interactive showcase for evaluating **Large Language Models** (LLMs) using real-world user interactions!

**What this is**: A companion demo to [Fikran](https://fikran.com), a multilingual platform where people interact with AI agents in a natural, dynamic environment.

✅ We offer model testing, benchmarking, and prompt refinement in live contexts, not just synthetic benchmarks.

---
"""
)
st.header("🎯 What You Can Do Here")
st.markdown(
"""
- ✅ Understand the **real-world performance** of your LLMs.
- Track how they behave across different user queries and scenarios.
- Apply **prompt engineering**, **LoRA-based customization**, and **dialogue tuning**.
- Export insights as PDF reports or structured logs.

This system is perfect for:

- Researchers evaluating fine-tuned models
- Product teams testing chatbot behavior before deployment
- Prompt engineers experimenting with multi-agent setups
"""
)
st.header("🛠️ Try It or Order a Full Evaluation")
col1, col2 = st.columns(2)
# Each button reveals its link only after it is clicked.
with col1:
    if st.button("Try Fikran Now"):
        st.markdown("[Click to explore Fikran](https://www.fikran.com/terms/about-us?lang=english)")
with col2:
    if st.button("📦 Order a Full Evaluation"):
        st.markdown("[See the Service on Upwork](https://www.upwork.com/services/product/development-it-a-real-world-evaluation-of-your-llm-in-a-dynamic-interactive-environment-1909379987479305454)")
st.markdown("---")
st.info("This app is maintained by [Abdennacer Elbasri](https://huggingface.co/elbasri), founder of Fikran.")