---
license: apache-2.0
datasets:
- briefai/LongShort-Dataset
language:
- en
pipeline_tag: text-generation
tags:
- pytorch
- dolly
- Gen-AI
- Finance
- KPI Extraction
---

# LongShort-Dolly-2-7B

### Model Description

LongShort-Dolly-2-7B is a large language model fine-tuned on earnings call documents to extract financial KPIs from them. It is based on the Dolly-2-7B architecture.

- Model creator: [Brief AI](https://huggingface.co/briefai)
- Original model: [Dolly-2-7B](https://huggingface.co/databricks/dolly-v2-7b)

### Dataset Description

- Data Source: Factiva
- Data Description: 28K+ Earnings Call Documents
- Data Scope: 1K+ public companies
- Fine Tuning Data: Collection of 60K+ samples.

## Prompt template: LongShort-Dolly-2-7B

```
[INST]Given the context, answer the question.

### Question:
Extract all the finance-based performance indicators and evaluation metrics.

### Context:
{context}

### Answer:
[/INST]
```

## Basics

*This section provides information about the model type, version, license, funders, release date, developers, and contact information.*
*It is useful for anyone who wants to reference the model.*

**Developed by:** [Brief AI Team](https://huggingface.co/briefai)

**Model Type:** Transformer-based Large Language Model

**Version:** 1.0.0

**Languages:** English

**License:** Apache 2.0

**Release Date Estimate:** Wednesday, 29 November 2023

**Send Questions to:** vishalparameswaran96@gmail.com

**Cite as:** Brief AI LongShort Language Model

**Funded by:** UChicago Data Science Institute

**Mentored by:** Nick Kadochnikov

## Technical Specifications

*This section includes details about the model objective and architecture, and the compute infrastructure.*
*It is useful for people interested in model development.*

Please see [the LongShort training README](https://github.com/brief-ai-uchicago/LongShort-Dataset) for full details on replicating training.
### Model Architecture and Objective

* Modified from Dolly-2-7B

**Objective:** Financial KPI extraction from earnings call documents.

### Hardware and Software - Compute Infrastructure

* 4 NVIDIA L4 GPUs & 48 vCPUs
* Environment: PyTorch (pytorch-2.0 w/ CUDA-11.8; see [GitHub link](https://github.com/pytorch/pytorch))
* CPU: GCP G2 Standard 48 (Platform: Intel Cascade Lake) (Accelerator Optimized)
* CPU memory: 192GB RAM
* GPU memory: 30GB per GPU

## Training

*This section provides information about the training.*
*It is useful for people who want to learn more about the model inputs and training footprint.*

The following `bitsandbytes` quantization config was used during training:

* quant_method: bitsandbytes
* load_in_8bit: False
* load_in_4bit: True
* llm_int8_threshold: 6.0
* llm_int8_skip_modules: None
* llm_int8_enable_fp32_cpu_offload: False
* llm_int8_has_fp16_weight: False
* bnb_4bit_quant_type: nf4
* bnb_4bit_use_double_quant: True
* bnb_4bit_compute_dtype: float16

Framework versions:

* PEFT 0.4.0

### Training Data

*This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*

Details for the dataset can be found in [LongShort Dataset](https://github.com/brief-ai-uchicago/LongShort-Dataset).

Training data includes:

- 5000 Earnings Call Documents

## How to use

This model can be used and deployed with the Hugging Face ecosystem. It requires `transformers` and `accelerate` to be installed. The model can be downloaded from [LongShort-Dolly-2-7B](https://huggingface.co/briefai/LongShort-Dolly-2-7B).

## Intended Use

This model was created to enable public research on large language models (LLMs). LLMs are intended to be used for language generation or as a pre-trained base model that can be further fine-tuned for specific tasks. The use cases below are not exhaustive.
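As a concrete illustration of how the model might be called, the sketch below fills in the prompt template from this card and runs generation with the 4-bit NF4 settings listed under Training. It is a hedged example, not the authors' code. The helper names `build_prompt` and `extract_kpis` are hypothetical, the exact line breaks in the template are an assumption, and it assumes the repository can be loaded directly with `AutoModelForCausalLM` (if it only hosts a PEFT adapter, loading would need to go through `peft` instead). Running `extract_kpis` also assumes `torch`, `bitsandbytes`, and a CUDA GPU are available in addition to `transformers` and `accelerate`.

```python
"""Hypothetical usage sketch for LongShort-Dolly-2-7B (not the authors' code)."""

MODEL_ID = "briefai/LongShort-Dolly-2-7B"  # repository linked in this card

# Prompt template from this card; the exact line breaks are an assumption.
PROMPT_TEMPLATE = (
    "[INST]Given the context, answer the question.\n\n"
    "### Question:\n"
    "Extract all the finance-based performance indicators and evaluation metrics.\n\n"
    "### Context:\n"
    "{context}\n\n"
    "### Answer:\n"
    "[/INST]"
)


def build_prompt(context: str) -> str:
    """Insert an earnings-call excerpt into the prompt template."""
    # str.replace avoids str.format tripping over braces inside the transcript.
    return PROMPT_TEMPLATE.replace("{context}", context.strip())


def extract_kpis(context: str, max_new_tokens: int = 256) -> str:
    """Load the model in 4-bit NF4 and generate an answer for one excerpt."""
    # Heavy dependencies are imported lazily so build_prompt works without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # Mirrors the 4-bit fields of the quantization config listed under Training.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.float16,
    )
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, quantization_config=quant_config, device_map="auto"
    )
    inputs = tokenizer(build_prompt(context), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `extract_kpis(transcript_text)` would download the weights on first use. Because the output can look factual while being wrong, any extracted KPI should be checked against the source transcript.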
### Direct Use

- Text generation
- Exploring characteristics of language generated by a language model
  - Examples: Cloze tests, counterfactuals, generations with reframings

### Downstream Use

- Tasks that leverage language models include: Information Extraction, Question Answering, Summarization

#### Out-of-scope Uses

Using the model in [high-stakes](#high-stakes) settings is out of scope for this model. The model is not designed for [critical decisions](#critical-decisions) nor uses with any material consequences on an individual's livelihood or wellbeing. The model outputs content that appears factual but may not be correct.

Out-of-scope Uses Include:

- Usage for evaluating or scoring individuals, such as for employment, education, or credit
- Applying the model for critical automatic decisions, generating factual content, creating reliable summaries, or generating predictions that must be correct

#### Misuse

Intentionally using the model for harm, violating [human rights](#human-rights), or other kinds of malicious activities, is a misuse of this model.
This includes:

- Spam generation
- Disinformation and influence operations
- Disparagement and defamation
- Harassment and abuse
- [Deception](#deception)
- Unconsented impersonation and imitation
- Unconsented surveillance
- Generating content without attribution to the model, as specified in the [RAIL License, Use Restrictions](https://huggingface.co/spaces/bigscience/license)

## Intended Users

### Direct Users

- General Public
- Researchers
- Students
- Educators
- Engineers/developers
- Non-commercial entities
- Financial Industry

# Risks and Limitations

*This section identifies foreseeable harms and misunderstandings.*

Model may:

- Overrepresent some viewpoints and underrepresent others
- Contain stereotypes
- Contain [personal information](#personal-data-and-information)
- Generate:
  - Hateful, abusive, or violent language
  - Discriminatory or prejudicial language
  - Content that may not be appropriate for all settings, including sexual content
- Make errors, including producing incorrect information as if it were factual
- Generate irrelevant or repetitive outputs
- Induce users into attributing human traits to it, such as sentience or consciousness

# Evaluation

*This section describes the evaluation protocols and provides the results.*

Result: LongShort-Falcon-7B gives 45.4% accuracy on a validation set of 10% of the original training dataset.

**Train-time Evaluation:**

Final checkpoint after 700 epochs:

- Training Loss: 1.645

# Recommendations

*This section provides information on warnings and potential mitigations.*

- Indirect users should be made aware when the content they're working with is created by the LLM.
- Users should be aware of [Risks and Limitations](#risks-and-limitations), and include an appropriate age disclaimer or blocking interface as necessary.
- Users of the model should provide mechanisms for those affected to provide feedback, such as an email address for comments.
# Model Card Authors

Vishal Parameshwaran, Garima Sohi, Jose Gerala, Sanchit Narayan Kumar