---
license: apache-2.0
datasets:
- briefai/LongShort-Dataset
language:
- en
pipeline_tag: text-generation
tags:
- pytorch
- dolly
- Gen-AI
- Finance
- KPI Extraction
---
# LongShort-Dolly-2-7B

### Model Description

LongShort-Dolly-2-7B is a large language model fine-tuned on earnings call documents to extract financial key performance indicators (KPIs) from them. It is based on the Dolly-2-7B architecture.
- Model creator: [Brief AI](https://huggingface.co/briefai)
- Original model: [Dolly-2-7B](https://huggingface.co/databricks/dolly-v2-7b)
  
### Dataset Description
- Data Source: Factiva
- Data Description: 28K+ Earnings Call Documents
- Data Scope: 1K+ public companies
- Fine-Tuning Data: A collection of 60K+ samples

## Prompt template: LongShort-Dolly-2-7B

```
[INST]Given the context, answer the question.

### Question:
Extract all the finance-based performance indicators and evaluation metrics.

### Context:
{context}

### Answer:
[/INST]

```
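A minimal sketch of filling this template programmatically, assuming the `{context}` placeholder is the only field to substitute. The names `PROMPT_TEMPLATE` and `build_prompt` are hypothetical helpers for illustration, not part of any released code. The card does not say whether a trailing newline follows `[/INST]`, so this sketch omits it.

```python
# Hypothetical helper reproducing the LongShort-Dolly-2-7B prompt template
# shown in this card. Only {context} varies between calls.
PROMPT_TEMPLATE = (
    "[INST]Given the context, answer the question.\n"
    "\n"
    "### Question:\n"
    "Extract all the finance-based performance indicators and evaluation metrics.\n"
    "\n"
    "### Context:\n"
    "{context}\n"
    "\n"
    "### Answer:\n"
    "[/INST]"  # assumption: no trailing newline after [/INST]
)


def build_prompt(context: str) -> str:
    """Insert an earnings-call excerpt into the prompt template."""
    if not context.strip():
        raise ValueError("context must be a non-empty string")
    # str.format only parses braces in the template, so braces inside
    # the transcript text are passed through unchanged.
    return PROMPT_TEMPLATE.format(context=context)


if __name__ == "__main__":
    print(build_prompt("Revenue grew 12% year over year to $4.1 billion."))
```

Keeping the template as a single constant makes it easy to check that inference-time prompts match the fine-tuning format exactly.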
  
## Basics
*This section provides information about the model type, version, license, funders, release date, developers, and contact information.*
*It is useful for anyone who wants to reference the model.*

  
**Developed by:**  [Brief AI Team](https://huggingface.co/briefai)
    
**Model Type:** Transformer-based Large Language Model

**Version:** 1.0.0

**Languages:** English

**License:** Apache 2.0

**Release Date Estimate:** Wednesday, November 29, 2023

**Send Questions to:** vishalparameswaran96@gmail.com

**Cite as:** Brief AI LongShort Language Model

**Funded by:**  UChicago Data Science Institute

**Mentored by:**  Nick Kadochnikov

## Technical Specifications
*This section includes details about the model objective and architecture, and the compute infrastructure.*
*It is useful for people interested in model development.*

Please see [the LongShort training README](https://github.com/brief-ai-uchicago/LongShort-Dataset) for full details on replicating training.

### Model Architecture and Objective

* Fine-tuned from Dolly-2-7B

**Objective:** Financial KPI extraction from earnings call documents.
    
### Hardware and Software - Compute Infrastructure

* 4 NVIDIA L4 GPUs & 48 vCPUs

* Environment: PyTorch 2.0 with CUDA 11.8 (see [GitHub link](https://github.com/pytorch/pytorch))

* CPU: GCP G2 Standard 48 (Platform: Intel Cascade Lake) (Accelerator Optimized)

* CPU memory: 192GB RAM

* GPU memory: 24GB per GPU (NVIDIA L4)

## Training
*This section provides information about the training.*
*It is useful for people who want to learn more about the model inputs and training footprint.*

The following bitsandbytes quantization config was used during training:

* quant_method: bitsandbytes
* load_in_8bit: False
* load_in_4bit: True
* llm_int8_threshold: 6.0
* llm_int8_skip_modules: None
* llm_int8_enable_fp32_cpu_offload: False
* llm_int8_has_fp16_weight: False
* bnb_4bit_quant_type: nf4
* bnb_4bit_use_double_quant: True
* bnb_4bit_compute_dtype: float16
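The settings above map onto the keyword arguments of `transformers.BitsAndBytesConfig`. A minimal sketch, assuming a recent `transformers` with `bitsandbytes` installed; `BNB_KWARGS` and `make_bnb_config` are hypothetical names. `quant_method: bitsandbytes` is not passed explicitly, since it is implied by using `BitsAndBytesConfig` itself.

```python
# Quantization settings copied from the list in this card (quant_method is
# implied by BitsAndBytesConfig and therefore omitted).
BNB_KWARGS = {
    "load_in_8bit": False,
    "load_in_4bit": True,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "float16",
}


def make_bnb_config():
    """Build a BitsAndBytesConfig from BNB_KWARGS (hypothetical helper).

    Imports are local so the settings dict can be inspected without
    torch, transformers, or bitsandbytes installed.
    """
    import torch
    from transformers import BitsAndBytesConfig

    kwargs = dict(BNB_KWARGS)
    # Convert the dtype name into the corresponding torch.dtype object.
    kwargs["bnb_4bit_compute_dtype"] = getattr(torch, kwargs["bnb_4bit_compute_dtype"])
    return BitsAndBytesConfig(**kwargs)
```

Such a config would typically be passed as `quantization_config=make_bnb_config()` to `AutoModelForCausalLM.from_pretrained` when loading the base `databricks/dolly-v2-7b` weights in 4-bit NF4 with double quantization; this is an illustration, not the authors' training script.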
  
Framework versions:
* PEFT 0.4.0
    

### Training Data
*This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*

Details on the dataset can be found in the [LongShort Dataset](https://github.com/brief-ai-uchicago/LongShort-Dataset) repository.

Training data includes:

-   5,000 earnings call documents
    
## How to use

This model can be used and deployed with the Hugging Face ecosystem. It requires `transformers` and `accelerate` to be installed. The model is available on the Hugging Face Hub:

[LongShort-Dolly-2-7B](https://huggingface.co/briefai/LongShort-Dolly-2-7B)
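A minimal loading and inference sketch, assuming the Hub repository hosts full model weights that `AutoModelForCausalLM` can load (if it hosts only a PEFT adapter, it would instead need to be loaded through `peft`). The generation settings (`max_new_tokens=256`, greedy decoding) and the helper names `strip_prompt_tokens` and `generate_answer` are illustrative choices, not values or code from this card. The `prompt` argument should already be formatted with the prompt template given earlier in this card.

```python
MODEL_ID = "briefai/LongShort-Dolly-2-7B"


def strip_prompt_tokens(output_ids, prompt_length):
    """Drop the echoed prompt so only newly generated token ids remain."""
    return output_ids[prompt_length:]


def generate_answer(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and return its completion for one formatted prompt."""
    # Imported lazily so this module can be imported without the heavy deps.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,
        device_map="auto",  # requires the accelerate package
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding for repeatable extraction
            pad_token_id=tokenizer.eos_token_id,
        )
    # generate() returns prompt + completion; keep only the completion.
    new_ids = strip_prompt_tokens(output_ids[0], inputs["input_ids"].shape[1])
    return tokenizer.decode(new_ids, skip_special_tokens=True).strip()
```

Reloading the model on every call keeps the sketch short; for batch extraction over many transcripts, load the tokenizer and model once and reuse them.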

## Intended Use

This model was created to enable public research on large language models (LLMs). LLMs can be used for language generation or as a pre-trained base model that can be further fine-tuned for specific tasks. The use cases below are not exhaustive.

### Direct Use

-   Text generation

-   Exploring characteristics of language generated by a language model

    -   Examples: Cloze tests, counterfactuals, generations with reframings

### Downstream Use

-   Tasks that leverage language models include: Information Extraction, Question Answering, Summarization


#### Out-of-scope Uses

Using the model in high-stakes settings is out of scope. The model is not designed for critical decisions or for uses with any material consequences on an individual's livelihood or wellbeing. The model outputs content that appears factual but may not be correct.

Out-of-scope Uses Include:

-   Usage for evaluating or scoring individuals, such as for employment, education, or credit

-   Applying the model for critical automatic decisions, generating factual content, creating reliable summaries, or generating predictions that must be correct

#### Misuse

Intentionally using the model for harm, to violate human rights, or for other malicious activities is a misuse of this model. This includes:

-   Spam generation

-   Disinformation and influence operations

-   Disparagement and defamation

-   Harassment and abuse
  
-   Deception

-   Unconsented impersonation and imitation

-   Unconsented surveillance 

-   Generating content without attribution to the model, as specified in the [RAIL License, Use Restrictions](https://huggingface.co/spaces/bigscience/license)

## Intended Users

### Direct Users

-   General Public

-   Researchers

-   Students

-   Educators

-   Engineers/developers

-   Non-commercial entities

-   Financial Industry

# Risks and Limitations
*This section identifies foreseeable harms and misunderstandings.*

The model may:

-   Overrepresent some viewpoints and underrepresent others

-   Contain stereotypes
  
-   Contain personal information

-   Generate:

    -   Hateful, abusive, or violent language

    -   Discriminatory or prejudicial language

    -   Content that may not be appropriate for all settings, including sexual content

-   Make errors, including producing incorrect information as if it were factual

-   Generate irrelevant or repetitive outputs

-   Induce users into attributing human traits to it, such as sentience or consciousness


# Evaluation
*This section describes the evaluation protocols and provides the results.*

Result: LongShort-Dolly-2-7B achieves 45.4% accuracy on a validation set consisting of 10% of the original training dataset.
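The card does not say how the 10% validation set was drawn or how accuracy was scored. A minimal sketch of one common approach, assuming a random shuffle with a fixed seed and that the 10% was held out from fine-tuning; the function name `train_validation_split` and the seed value `42` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_validation_split(
    samples: Sequence[T], val_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the samples and hold out val_fraction of them."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    # A seeded local RNG makes the split reproducible without touching
    # the global random state.
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    train, val = train_validation_split(range(100))
    print(len(train), len(val))
```

Fixing the seed lets the same validation samples be reused when comparing checkpoints or model variants.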


**Train-time Evaluation:**

Final checkpoint after 700 epochs:

- Training Loss: 1.645


# Recommendations
*This section provides information on warnings and potential mitigations.*

-   Indirect users should be made aware when the content they're working with is created by the LLM.

-   Users should be aware of [Risks and Limitations](#risks-and-limitations), and include an appropriate age disclaimer or blocking interface as necessary.

-   Users of the model should provide mechanisms for those affected to provide feedback, such as an email address for comments.

# Model Card Authors
Vishal Parameshwaran, Garima Sohi, Jose Gerala, Sanchit Narayan Kumar