---
language:
- en
tags:
- finance
---
# Introduction
This is an implementation of the BERT model using the LongNet architecture (paper: https://arxiv.org/pdf/2307.02486.pdf).
The model is pre-trained with 10-K/Q filings of US firms from 1994 to 2008. Filings from 2009 to 2013 are used for model validation, and filings from 2013 to 2018 are used for model testing.
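To give an intuition for why the LongNet structure scales to long filings, here is a minimal sketch of the dilated-attention sparsification described in the LongNet paper: the sequence is split into segments of a fixed length, and within each segment only every r-th position is kept. The function name and the toy sizes are illustrative assumptions. The sketch does not show how the linked repository implements attention, and it covers a single (segment length, dilation) pair only.

```python
def dilated_segment_indices(seq_len, segment_len, dilation):
    """Token positions kept in each segment for one (segment_len, dilation) pair."""
    if segment_len <= 0 or dilation <= 0:
        raise ValueError("segment_len and dilation must be positive")
    segments = []
    for start in range(0, seq_len, segment_len):
        end = min(start + segment_len, seq_len)
        # Keep every `dilation`-th position inside the segment.
        segments.append(list(range(start, end, dilation)))
    return segments


# Toy sizes, chosen only for illustration.
segments = dilated_segment_indices(seq_len=16, segment_len=8, dilation=2)
print(segments)

# Attention is computed inside each sparsified segment, so the number of
# query-key pairs is the sum of len(segment) ** 2 rather than seq_len ** 2.
pairs = sum(len(s) ** 2 for s in segments)
print(pairs, "pairs instead of", 16 ** 2)
```

In the paper, several (segment length, dilation) pairs are combined so that both local and distant tokens are covered; that mixing step is left out here.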
# Disclaimer
~The current model is trained from randomly initialized weights due to some computational and data obstacles. Therefore, the context captured by the models as well as the word semantics are not really good. The tokenizer in this version is also trained from scratch.~
The model weights have been updated. The details of the training are described below:
We retrained the model with more care and some tricks to enhance the semantics of words. To this end, we initialize the embedding layers (i.e., `word_embeddings`, `position_embeddings`, `token_type_embeddings`, and `LayerNorm`) with the pre-trained embeddings from FinBERT (https://huggingface.co/yiyanghkust/finbert-tone). Accordingly, we use the same tokenizer as FinBERT.
Furthermore, the model is trained longer (~10 epochs~ 8 epochs). ~The new pre-trained model weights will be updated as soon as the training and validation are completed.~
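The embedding initialization described above can be pictured as selecting the embedding-layer entries from the source model's state dict and loading only those into the new model. The following is a minimal sketch, not the training code itself: the helper name is hypothetical, the `embeddings.` key prefix follows the Hugging Face BERT naming convention, and whether `LongBERTModel` uses the same key names is an assumption. Placeholder strings stand in for real tensors so the example runs without any dependencies.

```python
# Layers named in the text above as being initialized from FinBERT.
EMBEDDING_LAYERS = (
    "word_embeddings",
    "position_embeddings",
    "token_type_embeddings",
    "LayerNorm",
)


def select_embedding_weights(state_dict, prefix="embeddings."):
    """Return only the entries that belong to the embedding layers.

    Hypothetical helper; the key layout is assumed, not taken from the repo.
    """
    selected = {}
    for name, tensor in state_dict.items():
        if not name.startswith(prefix):
            continue
        # "embeddings.word_embeddings.weight" -> "word_embeddings"
        layer = name[len(prefix):].split(".")[0]
        if layer in EMBEDDING_LAYERS:
            selected[name] = tensor
    return selected


# Toy state dict with placeholder values instead of real tensors.
source = {
    "embeddings.word_embeddings.weight": "W",
    "embeddings.LayerNorm.bias": "b",
    "encoder.layer.0.attention.self.query.weight": "Q",
}
picked = select_embedding_weights(source)
print(sorted(picked))
```

With PyTorch, a dictionary like `picked` could then be passed to `load_state_dict(..., strict=False)` on the target model, provided the key names and tensor shapes match.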
# Time and space efficiency
We compare the time and space efficiency of this model and some competitors. For these competitors, we clone the positional embedding layers so that they can accept input sequences with the maximum length of 65536 tokens.
The experiments are run on an NVIDIA A100-SXM4-40GB GPU with a batch size of 1. The figures show the time and memory needed to process one batch. In training mode, both the forward pass and backpropagation are included. In inferring mode, only the forward pass is included.
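The text does not spell out how the positional embedding layers of the competitors are cloned. One plausible reading is that the original table is tiled until it covers the longer sequence, so that position `p` reuses the vector of position `p mod base_len`. The sketch below illustrates that reading with plain Python lists; the function name and the toy sizes are assumptions, and the actual benchmark code may differ.

```python
def tile_position_embeddings(table, target_len):
    """Repeat the rows of a position-embedding table until it has target_len rows."""
    if not table:
        raise ValueError("position-embedding table is empty")
    base_len = len(table)
    # Position p in the extended table reuses the vector of position p mod base_len.
    return [list(table[p % base_len]) for p in range(target_len)]


# Toy table: 4 positions with 2-dimensional vectors.
base = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
extended = tile_position_embeddings(base, target_len=10)
print(len(extended), extended[5])
```

For a standard BERT table with 512 positions, reaching 65536 positions this way would take 128 copies of the original table.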
## Training mode
![image/png](https://cdn-uploads.huggingface.co/production/uploads/61d2d2993c2083e1c08af221/LrvSu7SuZy_KobgBSvn82.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/61d2d2993c2083e1c08af221/S9OavetyQNiBJm2nsQJI-.png)
## Inferring mode
![image/png](https://cdn-uploads.huggingface.co/production/uploads/61d2d2993c2083e1c08af221/A7blekLx36Fg_kogPUOQi.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/61d2d2993c2083e1c08af221/lfd1X1piq8a78xYNYmv3m.png)
# Training code
https://github.com/minhtriphan/LongFinBERT-base/tree/main
# Training configuration
* The models are trained for 8 epochs using the Masked Language Modeling (MLM) task;
* The masking probability is 15%;
* Details about the training configuration are given in the log files named `train_v1_0803_1144_seed_1.log` and `train_v2_0831_1829_seed_1.log`.
# Versions
There are 2 versions of the pre-trained model:
* v1 - Random Masking: We randomly choose tokens to mask in the MLM task;
* v2 - Selective Masking: As we want the model to learn more about the financial context, we selectively choose tokens to mask in the MLM task. We rely on the Loughran-McDonald dictionary to choose the important tokens to mask.
The argument `version` in the method `from_pretrained` of the `LongBERTModel` allows one to choose which version is loaded.
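The difference between the two masking strategies can be sketched as follows. This is a simplified illustration, not the training code: the function names are hypothetical, the word set is a tiny stand-in for the Loughran-McDonald dictionary, and the exact selection rule used for v2 is an assumption. The usual BERT refinement of sometimes replacing a chosen token with a random token, or leaving it unchanged, is omitted.

```python
import random

MASK_TOKEN = "[MASK]"


def random_masking(tokens, mask_prob=0.15, rng=None):
    """v1-style: each token is masked independently with probability mask_prob."""
    rng = rng or random.Random()
    return [MASK_TOKEN if rng.random() < mask_prob else tok for tok in tokens]


def selective_masking(tokens, important_words, mask_prob=0.15):
    """v2-style (assumed rule): spend the masking budget on dictionary words."""
    budget = max(1, round(mask_prob * len(tokens)))
    candidates = [i for i, tok in enumerate(tokens) if tok.lower() in important_words]
    chosen = set(candidates[:budget])
    return [MASK_TOKEN if i in chosen else tok for i, tok in enumerate(tokens)]


tokens = "the company reported a loss and faces litigation risk".split()
# Hypothetical stand-in for entries of the Loughran-McDonald dictionary.
important = {"loss", "litigation"}

print(random_masking(tokens, rng=random.Random(0)))
# A larger mask_prob than the 15% used in training, so that the short toy
# sentence has a budget of two masked tokens.
print(selective_masking(tokens, important, mask_prob=0.25))
```

With random masking, financially uninformative tokens such as "the" are as likely to be masked as "loss"; the selective variant concentrates the prediction task on the dictionary words.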
# Instruction to load the pre-trained model
* Clone the git repo
```
git clone https://github.com/minhtriphan/LongFinBERT-base.git
cd LongFinBERT-base
```
or, from a notebook (e.g., Jupyter or Colab):
```
!git clone https://github.com/minhtriphan/LongFinBERT-base.git
import sys
# Make the cloned repo importable; the path is relative to the current working directory
sys.path.append('LongFinBERT-base')
```
* Load the pre-trained tokenizer, model configuration, and model weights
```
from model import LongBERTModel
from custom_config import LongBERTConfig
from tokenizer import LongBERTTokenizer
backbone = 'minhtriphan/LongFinBERT-base'
tokenizer = LongBERTTokenizer.from_pretrained(backbone)
config = LongBERTConfig.from_pretrained(backbone)
model = LongBERTModel.from_pretrained(backbone, version = 'v1')
```
# Model usage
```
txt = '\n0000912057-94-000263.hdr.sgml : 19950608\nACCESSION NUMBER:\t\t0000912057-94-000263\nCONFORMED SUBMISSION TYPE:\t10-K\nPUBLIC DOCUMENT COUNT:\t\t3\nCONFORMED PERIOD OF REPORT:\t19930831\nFILED AS OF DATE:\t\t19931129\nDATE AS OF CHANGE:\t\t19931129\nSROS:\t\t\tNONE\n\nFILER:\n\n\tCOMPANY DATA:\t\n\t\tCOMPANY CONFORMED NAME:\t\t\tAMERICAN MEDICAL HOLDINGS INC\n\t\tCENTRAL INDEX KEY:\t\t\t0000861439\n\t\tSTANDARD INDUSTRIAL CLASSIFICATION:\t8060\n\t\tIRS NUMBER:\t\t\t\t133527632\n\t\tSTATE OF INCORPORATION:\t\t\tDE\n\t\tFISCAL YEAR END:\t\t\t0831\n\n\tFILING VALUES:\n\t\tFORM TYPE:\t\t10-K\n\t\tSEC ACT:\t\t1934 Act\n\t\tSEC FILE NUMBER:\t001-10511\n\t\tFILM NUMBER:\t\t94505453\n\n\tBUSINESS ADDRESS:\t\n\t\tSTREET 1:\t\t8201 PRESTON RD, SUITE 300\n\t\tCITY:\t\t\tDALLAS\n\t\tSTATE:\t\t\tTX\n\t\tZIP:\t\t\t75255\n\t\tBUSINESS PHONE:\t\t2143606300\n\n</SEC-Header>\n</Header>\n\n \nProc-Type: 2001,MIC-CLEAR\nOriginator-Name: keymaster@town.hall.org\nOriginator-Key-Asymmetric:\n MFkwCgYEVQgBAQICAgADSwAwSAJBALeWW4xDV4i7+b6+UyPn5RtObb1cJ7VkACDq\n pKb9/DClgTKIm08lCfoilvi9Wl4SODbR1+1waHhiGmeZO8OdgLUCAwEAAQ==\nMIC-Info: RSA-MD5,RSA,\n jSme4OE5puXgBpdHHyga1WdDJ0E3trqOOdfp13QPWNizEt4YLMTbUPjitjQi47a9\n tBwulFatOU1F7uc/UNiQZQ==\n\n 0000912057-94-000263.txt : 19950608\n\n10-K\n 1\n 10-K\n\n- - - - --------------------------------------------------------------------------------\n- - - - --------------------------------------------------------------------------------\n\n SECURITIES AND EXCHANGE COMMISSION\n WASHINGTON, D.C. 
20549\n\n ------------------------\n\n Form 10-K\n(Mark One)\n /X/ ANNUAL REPORT PURSUANT TO SECTION 13 OR 15 (D)\n OF THE SECURITIES EXCHANGE ACT OF 1934 (FEE REQUIRED)\n FOR THE FISCAL YEAR ENDED AUGUST 31, 1993\n OR\n\n / / TRANSITION REPORT PURSUANT TO SECTION 13 OR 15 (D)\n OF THE SECURITIES EXCHANGE ACT OF 1934 (NO FEE REQUIRED)\n FOR THE TRANSITION PERIOD FROM TO\n\n COMMISSION FILE NUMBER)\n 1-10511\n ---------------------\n\n AMERICAN MEDICAL HOLDINGS, INC.\n (Exact name of registrant as specified in its charter)\n\n DELAWARE 13-3527632\n (State or other jurisdiction of (I.R.S. Employer\n incorporation or organization) Identification No.)\n\n Commission file number\n 1-7612\n ------------------------\n\n AMERICAN MEDICAL INTERNATIONAL, INC.\n (Exact name of registrant as specified in its charter)\n\n DELAWARE 95-2111054\n(State or other jurisdiction of (I.R.S. Employer Identification No.)\n incorporation or organization)\n8201 Preston Road, Dallas, Texas 75225\n(Address of principal executive (Zip Code)\n offices)\n\n (Registrants\' telephone number, including area code) (214) 360-6300\n ------------------------\n\n Securities registered pursuant to Section 12(b) of the Act:\n American Medical Holdings, Inc.:\n\n(TITLE OF EACH CLASS) (NAME OF EACH EXCHANGE ON WHICH REGISTERED)\n- - - - ---------------------- -------------------------------------------\n COMMON STOCK NEW YORK STOCK EXCHANGE\n\n Securities registered pursuant to Section 12(g) of the Act:\n American Medical International, Inc.:\n 8 1/4% Convertible Subordinated Debentures due 2008\n 9 1/2% Convertible Subordinated Debentures due 2001\n (Title of class)\n\n Indicate by check mark whether the Registrants (1) have filed all reports\nrequired to be filed by Section 13 or 15(d) of the Securities Exchange Act of\n1934 during the preceding 12 months (or for such shorter period that the\nRegistrants were required to file such reports), and (2) have been subject to\nsuch filing requirements for the 
past 90 days. American Medical Holdings, Inc.\n Yes _X_ No ____ . American Medical International, Inc.\nYes _X_ No ____ .\n\n As of November 18, 1993 there were 76,987,204 shares of American Medical\nHoldings, Inc. Common Stock, $.01 par value, outstanding. The aggregate market\nvalue of Common Stock held by non-affiliates of the registrant, based on the\nclosing price of these shares at November 18, 1993, was approximately\n$479,199,166. For the purposes of the foregoing calculation only, all directors\nand executive officers and principal stockholders of the registrant have been\ndeemed affiliates.\n\n All shares of Common Stock, $.01 par value, of American Medical\nInternational, Inc. are held by American Medical Holdings, Inc.\n\n DOCUMENTS INCORPORATED BY REFERENCE\nAmerican Medical Holdings, Inc.\'s definitive proxy statement for its 1994 Annual\n Meeting of Stockholders....Part III\n\n- - - - --------------------------------------------------------------------------------\n- - - - --------------------------------------------------------------------------------\n \n INDEX\n\nPAGE\n REFERENCE\n ---------\n \n PART I\nItem 1. Business.............................................................. 1\nItem 2. Properties............................................................ 12\nItem 3. Legal Proceedings..................................................... 12\nItem 4. Submission of Matters to a Vote of Security Holders................... 12\n PART II\nItem 5. Market for the Registrant\'s Common Stock and Related Stockholder\n Matters............................................................... 14\nItem 6. Selected Financial Data............................................... 15\nItem 7. Management\'s Discussion and Analysis of Financial Condition and\n Results of Operations................................................. 16\nItem 8. Financial Statements and Supplementary Data........................... 23\nItem 9. 
Changes in and Disagreements with Accountants on Accounting and\n Financial Disclosure.................................................. 23\n PART III\nItem 10. Directors and Executive Officers of the Registrants................... 23\nItem 11. Executive Compensation................................................ 23\nItem 12. Security Ownership of Certain Beneficial Owners and Management........ 23\nItem 13. Certain Relationships and Related Transactions........................ 23\n PART IV\nItem 14. Exhibits, Financial Statement Schedules and Reports on Form 8-K....... 23\n\nPART I\n\nITEM 1. BUSINESS\n\n GENERAL\n\n American Medical Holdings, Inc. ("Holdings") was organized in July, 1989 to\nacquire American Medical International, Inc. ("AMI" and, together with Holdings,\nthe "Company"). As a result of this acquisition, Holdings is the owner of all of\nthe outstanding shares of common stock of AMI.\n\n The Company is one of the leading hospital management companies in the\nUnited States. Generally, the Company\'s hospitals provide a full range of\ninpatient and outpatient services including medical/surgical, obstetric and\ndiagnostic services and services provided by intensive care units, emergency\nrooms, laboratories and pharmacies. The Company also operates ancillary\nfacilities at each of its hospitals, such as ambulatory, occupational and rural\nhealthcare clinics. At August 31, 1993, the Company operated 35 domestic acute\ncare hospitals and one psychiatric hospital containing a total of 8,003 licensed\nbeds. The Company\'s hospitals are principally located in the suburbs of major\nmetropolitan areas in 12 states including Texas, Florida and California. 
Through\nbroad networks including health maintenance organizations, preferred provider\norganizations, insurers and employers, the Company provides high quality,\naffordable health services while facing the challenge of containing the\ncontinually rising healthcare costs.\n\n Management expects that the Company\'s ongoing control of costs emphasized\nduring fiscal 1993 will provide the Company a competitive edge to increase\nmarket share notwithstanding the presence of a managed care environment. In\nresponse to the ever-changing healthcare system, the shift toward outpatient\nservices, the need to reduce provider costs for acute-care services and the\nClinton Administration\'s desire to provide universal access to healthcare, the\nCompany is developing physician networks and alliances with other healthcare\nproviders to create fully integrated healthcare delivery systems.\n\n Holdings and AMI are Delaware corporations with principal executive offices\nlocated at 8201 Preston Road, Suite 300, P.O. Box 25651, Dallas, Texas\n75225-5651. The telephone number for Holdings and AMI at such address is (214)\n360-6300. AMI was incorporated in 1957.\n\n PROPERTIES\n\n The Company owns or leases and operates the following 35 acute care\nhospitals and one psychiatric hospital.\n\n1\n\nThe Company also owns or manages medical office buildings and related\nhealthcare facilities associated with 31 of its hospitals as well as certain\nundeveloped properties.\n\n 2\n \n EMPLOYEES\n\n As of August 31, 1993, the Company had approximately 28,200 employees, of\nwhich approximately 66% were full time employees. Two of the Company\'s hospitals\nhad labor contracts covering approximately 5% of the Company\'s employees.\nManagement believes that its relations with its employees generally are\nsatisfactory.\n\n MEDICAL STAFFS\n\n The medical staff at each hospital generally consists of non-employee\nphysicians. 
There is a trend in the healthcare industry in some regions to\nemploy physicians and where appropriate, the Company\'s hospitals have pursued\nthis option. Medical staff members of the Company-owned hospitals that are not\nemployees usually also serve on the medical staffs of hospitals not owned by the\nCompany and may terminate their relationships with the Company-owned hospitals\nat any time.\n\n Rules and regulations concerning the medical aspects of each hospital\'s\noperations are adopted and enforced by its medical staff. Such rules and\nregulations provide that the members of the staff elect officers who, together\nwith additional physicians selected by them, supervise all medical and surgical\nprocedures and services. Their supervision is subject to the general oversight\nof the hospital\'s Governing Board.\n\n QUALITY OF SERVICES\n\n Management believes the quality of healthcare services is critical in order\nto attract and retain top physicians and increase the market share of the\nCompany\'s hospitals. One of the key mechanisms used to monitor the quality of\ncare at the Company\'s hospitals is a quality assurance program designed to\nmeasure patient satisfaction, the Patient Satisfaction Monitoring System\n("PSMS"). PSMS utilizes the results of interviews performed by an independent\nresearch company of a statistically determined sample group of discharged\npatients at each hospital to gather patient responses regarding the hospital\nservices provided. Management uses the results as a tool to improve the quality\nof patient services and satisfaction and believes PSMS has assisted the Company\nin successfully maintaining and improving the quality of healthcare as perceived\nby patients and their physicians and thereby contributing to improved net\nrevenues. PSMS is also used by the Company as one of the bases upon which\nhospital executive directors and other employees are compensated under the\nCompany\'s incentive compensation program. 
Management believes that the Company\nwas the first in the industry to directly tie compensation to the attainment of\nqualitative performance targets.\n\n The Company has recently developed a system similar to PSMS which is\ndesigned to measure physician satisfaction, the MD Satisfaction Survey. A pilot\nprogram for this survey has been implemented at one hospital and the Company\nplans to make it available for use at each of the Company\'s hospitals in fiscal\n1994.\n\n COMPETITION\n\n Generally, other investor-owned and non-profit hospitals operate in the\nlocal markets in which the Company participates and provide services that are\nsimilar to those offered by the Company\'s hospitals. Competition among hospitals\nand other healthcare providers in the United States has increased in recent\nyears due to a decline in occupancy rates resulting from, among other things,\nchanges in government regulation and reimbursement, other cost containment\npressures, technology, and most recently, the healthcare reform plan proposed by\nthe Clinton Administration. Additionally, hospitals owned by government agencies\nor other tax-exempt entities benefit from advantages such as endowments,\ncharitable contributions and tax-exempt financing, which advantages are not\navailable to the Company\'s hospitals.\n\n Management believes that a hospital\'s competitive position within local\nmarkets is affected by various factors including the quality of healthcare\nservices provided, pricing of healthcare services, the hospital\'s location and\nthe types of services offered. 
The Company expects to improve the performance of\nits hospitals by (i) expanding physician network relationships thereby\nattracting and retaining\n\n 3\n \nquality physician and medical personnel, (ii) increasing its emphasis on managed\ncare contracting, (iii) developing and marketing new healthcare services\ntargeted to the particular needs of the communities served by its hospitals, and\n(iv) expanding profitable outpatient services.\n\n The competitive position of a hospital is increasingly affected by its\nability to negotiate contracts for healthcare services with managed care\norganizations, including health maintenance organizations ("HMOs"), preferred\nprovider organizations ("PPOs") and other purchasers of group healthcare\nservices. HMOs and PPOs attempt to direct and control use of hospital services\nthrough strict utilization management programs and by negotiating provider\ncontracts with only one or a limited number of hospitals in each market area.\nThe importance of negotiating with managed care organizations varies from market\nto market depending on the market strength of such organizations. In some\nsituations, hospitals have agreed to fixed payments based on the number of\nmanaged care enrollees, thereby assuming hospital utilization risk (such\ncontracts are referred to as capitated contracts). Managed care organizations\nare generally able to obtain discounts from hospital established charges.\nManagement believes that the Company is able to compete effectively for managed\ncare business in part because of its relationships with local physicians, its\nhospital management teams, its attention to cost controls and quality of service\nand its strategies to establish service niches in markets served by other\nhospitals.\n\n Merger and acquisition activity has significantly increased in the\nhealthcare industry involving both investor-owned and non-profit entities. 
As\nhealthcare reforms announced by the Clinton Administration take effect,\nmanagement believes that it will become more important for hospitals and other\nhealthcare providers to work together to form fully integrated healthcare\ndelivery systems and thereby provide the community and marketplace with high\nquality, cost effective healthcare products and services. During fiscal 1993 the\nCompany entered into an agreement with HealthTrust, Inc.-The Hospital Company\n("HealthTrust") to jointly operate AMI\'s Tarzana Regional Medical Center and\nHealthTrust\'s Encino Hospital. Management is continually evaluating other\nsimilar opportunities and acquisitions to expand the networks in which the\nCompany currently participates.\n\n SOURCES OF REVENUE\n\n The sources of the Company\'s hospital revenues are room and board and the\nprovision of ancillary medical services. Room and board represents the basic\ncharges for the hospital room and related services, such as general nursing care\nand meals. Ancillary medical services represent the charges related to the\nmedical support activities performed by the hospital, such as X-rays, physical\ntherapy and laboratory procedures. The Company receives payments for services\nrendered to patients from the federal government under Medicare and the Civilian\nHealth and Medical Program of Uniformed Services ("CHAMPUS") programs, state\ngovernments under their respective Medicaid programs, managed care organizations\n("contracted services"), private insurers, self-insured employers and directly\nfrom patients. In addition to revenues received from such programs and patients,\nthe Company receives other non-patient revenues (e.g. cafeteria and gift shop\nrevenues). 
During fiscal 1991, the Company also recognized revenues associated\nwith an HMO owned by the Company and divested in fiscal 1991.\n\n The following table presents the percentage of net revenues for the three\nyears ended August 31 under each of the following programs:\n\nThe Company\'s hospital revenues received under Medicare, Medicaid, CHAMPUS,\nBlue Cross and from payors of contracted services are generally less than\ncustomary charges for the services covered. Following the initiative taken by\nthe federal government to control healthcare costs, other\n\n 4\n \nmajor purchasers of healthcare, including states, insurance companies and\nemployers, are increasingly negotiating the amounts they will pay for services\nperformed rather than simply paying healthcare providers their customary\ncharges. Managed care programs which offer prepaid and discounted medical\nservice packages are capturing an increasing share of the market, tending to\nreduce the historical rate of growth of hospital revenues. As a result, new\nkinds of healthcare strategies and provider networks (e.g. physician networks)\nare continuing to emerge.\n\n Patients are generally not responsible for any difference between customary\nhospital charges and amounts reimbursed under Medicare, Medicaid, CHAMPUS and\nsome Blue Cross plans or by payors of contracted services for such services,\nexcept to the extent of any exclusions, deductibles or co-insurance features of\ntheir coverage. In recent years insurers and other payors have increased the\namount of such exclusions, deductibles and co-insurance generally increasing the\npatient\'s financial responsibility to directly pay for some services. The\nincrease in the self-pay portion of a patient\'s financial responsibility may\nalso increase the Company\'s uncollectible accounts.\n'
import torch

# Tokenize the filing text and run a forward pass without tracking gradients
tokenized_txt = tokenizer(txt, return_tensors = 'pt')
with torch.no_grad():
    output = model(tokenized_txt['input_ids'], tokenized_txt['attention_mask'])
```
# Contact
For any comments, questions, or feedback, please get in touch with us via phanminhtri2611@gmail.com or triminh.phan@unisg.ch.
# Paper
(updating)