Sentence Similarity
sentence-transformers
Safetensors
bert
feature-extraction
Generated from Trainer
dataset_size:46
loss:MultipleNegativesRankingLoss
text-embeddings-inference
Instructions to use ronit01/rag_tuned_minilm_mnr_10epoch with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use ronit01/rag_tuned_minilm_mnr_10epoch with sentence-transformers:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ronit01/rag_tuned_minilm_mnr_10epoch")

sentences = [
    "How does RapidFire AI's shard-based adaptive execution engine enable online aggregation of eval metrics with confidence intervals, and what specific mathematical strategies are available for computing those intervals?",
    "RapidFire AI is a new AI experiment execution framework that transforms your LLM pipeline customization \nfrom slow, sequential processes into rapid, intelligent workflows with hyperparallelized execution, \ndynamic real-time experiment control, and automatic backend optimization.\n\nFor *RAG and context engineering evals*, start here: :doc:`Install and Get Started: RAG and Context Engineering</walkthroughrag>`.\n\nFor *SFT and RFT/post-training workflows*, start here: :doc:`Install and Get Started: SFT/RFT</walkthroughft>`.\n\n\nRapidFire AI is the first system of its kind to establish live three-way communication between the IDE\nwhere the experiment is launched, a metrics display/control dashboard, and a multi-core/multi-GPU execution backend.\n\n.. image:: /images/rf-usage.png\n :width: 800px\n\nJust pip install the :code:`rapidfireai` OSS package. It works on a CPU-only machine, a single-GPU machine, \nor a multi-GPU machine. Note that for RAG/context engineering with only closed model APIs, GPUs are not needed. ",
    "\nRapidFire AI transforms the status quo by adapting the powerful idea of **online aggregation** \nfrom database systems research to LLM evals. \nOur adaptive execution engine, :doc:`as described on this page</difference>`, automatically \nshards the data and processes multiple configs in parallel, one shard at a time, with \nefficient swapping techniques.\n\nThis means you get **running metric estimates with confidence intervals** in real time. \nSo, you can confidently stop poor configs earlier, clone better configs on the fly, and \nperform more informed exploration to reach much better eval metrics in much less time.\n\n\nExample: Traditional Batch Evals vs. RapidFire AI\n-------\n\nFor instance, suppose you have an evals set with 400 queries. You decide to compare, say, \n4 RAG configs in one go with RapidFire AI with number of shards set to 8. The illustration\nbelow contrasts traditional batch evals vs. RapidFire AI's approach for a simple eval metric.\n\n.. list-table::\n :widths: 50 50\n :class: side-by-side\n\n * - .. figure:: /images/rag-eval-online1.png\n :width: 100%\n :alt: Online aggregation for evals with RapidFire AI and IC Ops.\n\n - .. figure:: /images/rag-eval-online2.png\n :width: 100%\n :alt: Online aggregation for evals with RapidFire AI and IC Ops.\n\n\nAll configs are executed on the first 1/8th of the data (50 examples), with \ntheir **incrementally computed** eval metrics shown in real time with confidence intervals. \nIn the figure, the 3 worst configs are stopped, while the best is cloned to add 2 new variants. \nThe 3 running configs now continue on the second 1/8th of the data (cumulatively, \n100 examples), and so on.\nOne clone is then stopped halfway through the aggregation, while the other two run to completion. \nUltimately, the other clone ends up being the best config overall.\n\nNote that the confidence intervals shown will keep narrowing as configs see more shards, converging \nto zero when 100% of the data is seen, i.e., the metrics become exact point estimates.\nOverall, compared to sequential batch evals in which the original 4 configs all run to completion, \nRapidFire AI enables you to explore more configs in less time, while reaching better eval metrics.\n\n\n\nTypes of Metrics\n-----------------------\n\nWe support 2 types of metrics based on their aggregation semantics: \n\n* **Distributive Metrics:** These are purely additive over a given set of data points. \n \n Examples: *count* of number of correct predictions; *sum* of output token lengths across queries.\n\n* **Algebraic Metrics:** These are averages or proportions over a given set of data points. They can be decomposed into components that are individually distributive. \n \n Examples: *precision*, which counts number of correct predictions and total number of data points separately and then divides them; *mean rouge-1*, which averages per-example rouge-1 values that assesses overlap of tokens between generated text and ground truth text.\n\nWhen you define an eval metric via :func:`evals.compute_metrics_fn()` and :func:`evals.accumulate_metrics_fn()`, \nyou must specify their type (algebraic or distributive) and value range as illustrated below. \nFor metrics without a type defined, they will be displayed *as is*, i.e., without projected \nestimates or confidence intervals.\n\n.. code-block:: python\n\n # Based on GSM8K tutorial use case\n metrics = {\n \"Total\": {\"value\": total},\n \"Correct\": {\n \"value\": correct,\n \"is_distributive\": True,\n \"value_range\": (0, 1),\n },\n \"Accuracy\": {\n \"value\": accuracy,\n \"is_algebraic\": True,\n \"value_range\": (0, 1),\n },\n }\n\nConfidence Intervals\n--------------------\n\nThe data points in the evals dataset are **assigned to shards uniformly randomly**, i.e., \nRapidFire AI performs sampling without replacement. \nBased on that, it supports 3 strategies to calculate confidence intervals for projected estimates of metrics. \nYou can indicate the confidence level (we recommend 95%) and whether to perform \"finite population correction\" (FPC) or not. \nThese values can be specified under the key :code:`\"online_strategy_kwargs\"` in your config dictionary as illustrated below.\n\n.. code-block:: python\n\n # Based on FiQA RAG tutorial use case\n \"online_strategy_kwargs\": {\n \"strategy_name\": \"normal\",\n \"confidence_level\": 0.95,\n \"use_fpc\": True,\n },\n\nNotation \n^^^^^^^\n\n* :math:`N` = Total population size (total number of queries in eval set)\n* :math:`n` = Sample size (number of queries processed so far)\n* :math:`\\hat{p}` = Observed sample proportion or average for an algebraic metric\n* :math:`\\bar{X}` = Sample mean for a distributive metric\n* :math:`\\widehat{T}` = Estimated population total for a distributive metric\n* :math:`\\text{Var}(\\widehat{T})` = Variance of the above estimated population total\n* :math:`\\text{SE}` = Standard error (measure of estimate uncertainty)\n* :math:`\\text{CI}` = Confidence interval\n* :math:`z` = Z-score for confidence level (1.96 for 95% confidence; used in Normal and Wilson)\n* :math:`\\alpha` = Significance level (0.05 for 95% confidence)\n* :math:`n_{\\text{eff}}` = Effective sample size (adjusted for FPC in Wilson)\n* :math:`a, b` = Lower and upper bounds of metric value range\n* :math:`R` = Range width, :math:`R = b - a`\n* :math:`\\varepsilon` = Margin of error (half-width of confidence interval for Hoeffding)\n* :math:`\\varepsilon_{\\bar{X}}` = Margin of error for sample mean (Hoeffding distributive)\n* :math:`\\text{FPC}` = Finite population correction factor\n\n\nFinite Population Correction (FPC)\n^^^^^^^^^^^^^^^^^^^^^^\n\nWhen sampling without replacement from finite populations, enabling FPC \nmultiplies the standard error (SE) by :math:`\\text{FPC} = \\sqrt{(N-n)/(N-1)}` \nwhere :math:`N` is population size and :math:`n` is sample size.\n\n\nNormal Approximation\n^^^^^^^^^^^^^^^^^^^\n\nThis is the default strategy, and it uses the Central Limit Theorem. \nIt is suitable for most cases with non-trivial sample sizes (n > 30). \nIt provides tight intervals when the statistical assumptions hold.\n\n* For algebraic metrics:\n\n.. math::\n\n \\text{SE}_{\\hat{p}} = \\sqrt{\\frac{\\hat{p}(1-\\hat{p})}{n}} \\times \\text{FPC}\n\n \\text{CI} = \\hat{p} \\pm 1.96 \\cdot \\text{SE}_{\\hat{p}}\n\n\n* For distributive metrics: \n\nEstimate population total :math:`\\widehat{T} = N\\bar{X}` with \nvariance :math:`\\text{Var}(\\widehat{T}) = N^2 \\cdot \\bar{X}(1-\\bar{X})/n` (FPC-adjusted).\n\n\nWilson Score\n^^^^^^^^^^^\n\nThis strategy is better for small sample sizes or metrics near 0/1 boundaries. \nIt is more robust than Normal Approximation for extreme proportions. \n\n* For algebraic metrics:\n\n.. math::\n\n \\text{center} = \\frac{\\hat{p} + z^2/(2n_{\\text{eff}})}{1 + z^2/n_{\\text{eff}}}\n\n \\text{margin} = \\frac{z\\sqrt{\\hat{p}(1-\\hat{p})/n_{\\text{eff}} + z^2/(4n_{\\text{eff}}^2)}}{1 + z^2/n_{\\text{eff}}}\n\nwhere :math:`n_{\\text{eff}} = n/\\text{FPC}^2` when using FPC. \nThe Wilson confidence interval is then :math:`[\\text{center} - \\text{margin}, \\text{center} + \\text{margin}]`,\nclamped to [0, 1].\n\n* For distributive metrics, this falls back to Normal Approximation. \n\n\n\nHoeffding Bounds\n^^^^^^^^^^^\n\nThis strategy is best for maximum safety (guaranteed coverage). It makes no distributional assumptions, \nbut that also means its intervals are typically quite loose.\n\n.. math::\n\n \\varepsilon = (b-a)\\sqrt{\\frac{\\ln(2/\\alpha)}{2n}} \\times \\text{FPC}\n\n \\text{CI} = [\\hat{p} - \\varepsilon, \\hat{p} + \\varepsilon]\n\nFor distributive metrics with range :math:`R=b-a`, it computes :math:`\\varepsilon_{\\bar{X}} = R\\sqrt{\\ln(2/\\alpha)/(2n)}` \nand then scales to population total.",
    "This class wraps around some LangChain APIs to manage dynamic few-shot example selection. It provides semantic \nsimilarity-based example selection to construct prompts with the most relevant examples for each input query.\n\nThe individual arguments (knobs) can be :class:`List` valued or :class:`Range` valued in an :class:`RFPromptManager`. \nThat is how you can specify a base set of knob combinations from which a config group can be produced. \nAlso read :doc:`the Multi-Config Specification page</configs>`.\n\n.. py:class:: RFPromptManager\n\n :param instructions: The main instructions for the prompt that guide the generator's behavior. This sets the overall task description and role for the assistant. Either this or :code:`instructions_file_path` must be provided.\n :type instructions: str, optional\n\n :param instructions_file_path: Path to a file containing the instructions. Use this as an alternative to the :code:`instructions` parameter for loading instructions from a file, say, if they are very long.\n :type instructions_file_path: str, optional\n\n :param examples: A list of example dictionaries for few-shot learning. Each example should be a dictionary with keys matching the expected input-output format (e.g., \"question\" and \"answer\").\n :type examples: list[dict[str, str]], optional\n\n\n :param embedding_cfg: The embedding class and its kwargs to use for computing semantic similarity between examples and queries, provided as a single dictionary. Must include a key :code:`\"class\"` with the class itself as value, not an instance. Options for the class include :class:`HuggingFaceEmbeddings` and :class:`OpenAIEmbeddings`. The kwargs that follow must contain all parameters needed to initialize the embedding class; required parameters vary by embedding class. For example, :class:`HuggingFaceEmbeddings` needs :code:`model_name`, :code:`model_kwargs` and :code:`device`, while :class:`OpenAIEmbeddings` needs :code:`\"model\"` and :code:`\"api_key\"`.\n :type embedding_cfg: dict[str, Any], optional\n\n\n :param example_selector_cls: The example selector class that determines how to choose relevant examples based on the input query. Must be either :code:`SemanticSimilarityExampleSelector` or :code:`MaxMarginalRelevanceExampleSelector` (for diversity) from LangChain.\n :type example_selector_cls: type[MaxMarginalRelevanceExampleSelector | SemanticSimilarityExampleSelector], optional\n\n :param example_prompt_template: A LangChain :code:`PromptTemplate` that defines how to format each example. Should specify :code:`input_variables` and a :code:`template` string with placeholders matching the keys in the examples dictionaries.\n :type example_prompt_template: PromptTemplate, optional\n\n :param k: Number of most similar or diverse examples to retrieve and include in the prompt for each query. Default is 3.\n :type k: int, optional"
]

embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [4, 4]
- Notebooks
- Google Colab
- Kaggle
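The sentence-transformers snippet above prints only the shape of the similarity matrix. Since the first sentence is a question and the other three are passages, a natural follow-up is to rank the passages against the question. The sketch below illustrates that ranking step with plain cosine similarity, on the assumption that this is the score `model.similarity` computes by default. The helper names `cosine_similarity` and `rank_passages` are hypothetical, and the 3-dimensional vectors are toy stand-ins for real embeddings so that the example runs without downloading the model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(query_vec, passage_vecs):
    """Return (passage_index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(passage_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (assumption for illustration only).
# With the real model, the query would be embeddings[0] and the passages embeddings[1:].
query_vec = [1.0, 0.0, 1.0]
passage_vecs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 0.1, 0.9],  # nearly parallel to the query
    [0.5, 0.5, 0.5],  # partially aligned with the query
]

ranking = rank_passages(query_vec, passage_vecs)
for index, score in ranking:
    print(f"passage {index}: {score:.4f}")
```

With real embeddings, the same ordering could be read from the first row of `similarities`, skipping the first entry, which compares the query with itself.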