language:
  - en
tags:
  - finance

Introduction

This is an implementation of a BERT model that uses the LongNet architecture (paper: https://arxiv.org/pdf/2307.02486.pdf).
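As a rough illustration of the dilated attention idea described in the LongNet paper, the sketch below lists which token positions attend to each other for a single (segment length, dilation) pair: the sequence is cut into segments, and inside each segment only every r-th position is kept. The function name and its arguments are hypothetical and are not taken from this repository's code, which may organize the computation differently.

```python
def dilated_segments(seq_len, segment_len, dilation, offset=0):
    """Return, per segment, the token positions kept by dilated attention.

    The sequence is cut into consecutive segments of `segment_len` tokens.
    Inside each segment only every `dilation`-th position (starting at
    `offset`) is kept, and attention is computed among the kept positions.
    """
    if not 0 <= offset < dilation:
        raise ValueError("offset must satisfy 0 <= offset < dilation")
    segments = []
    for start in range(0, seq_len, segment_len):
        end = min(start + segment_len, seq_len)
        segments.append(list(range(start + offset, end, dilation)))
    return segments


# 16 tokens, segments of 8 tokens, keep every 2nd position in each segment.
print(dilated_segments(16, 8, 2))
```

With segment length w and dilation r, each segment computes attention among roughly w / r positions instead of w. LongNet combines several such pairs (and shifts the offset across heads), which is what keeps the cost manageable for very long inputs.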

The model is pre-trained on 10-K/Q filings of US firms from 1994 to 2008. Filings from 2009 to 2013 are used for model validation, and filings from 2013 to 2018 are used for model testing.

Disclaimer

The current model was trained from randomly initialized weights because of computational and data constraints. As a result, the context captured by the model and its word semantics are of limited quality. The tokenizer in this version was also trained from scratch.

The new model weights have been updated. The details of the training are described below:

We're training the model again with more care and some tricks to enhance the semantics of words. To this end, we initialize the embedding layers (i.e., word_embeddings, position_embeddings, token_type_embeddings, and LayerNorm) with the pre-trained embeddings from FinBERT (https://huggingface.co/yiyanghkust/finbert-tone). Accordingly, we use the same tokenizer as FinBERT.
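The embedding initialization described above can be pictured as filtering a checkpoint's state dict down to the embedding entries. The sketch below is only an illustration: the key prefixes assume the usual Hugging Face BERT naming (embeddings.word_embeddings, and so on), the helper name is hypothetical, and the toy values stand in for real tensors.

```python
# Key prefixes of the embedding block, assuming Hugging Face BERT naming.
EMBEDDING_PREFIXES = (
    "embeddings.word_embeddings.",
    "embeddings.position_embeddings.",
    "embeddings.token_type_embeddings.",
    "embeddings.LayerNorm.",
)


def select_embedding_weights(source_state, prefixes=EMBEDDING_PREFIXES):
    """Keep only the state-dict entries that belong to the embedding block."""
    return {
        name: value
        for name, value in source_state.items()
        if name.startswith(prefixes)  # str.startswith accepts a tuple
    }


# Toy state dict standing in for the FinBERT checkpoint (placeholder values).
finbert_state = {
    "embeddings.word_embeddings.weight": "W_word",
    "embeddings.LayerNorm.bias": "b_ln",
    "encoder.layer.0.attention.self.query.weight": "W_q",
}
embedding_state = select_embedding_weights(finbert_state)
print(sorted(embedding_state))
```

With real PyTorch modules, a filtered dict like this could be passed to load_state_dict(..., strict=False) so that every other layer keeps its random initialization. Whether the key names match between the two models is an assumption that would need checking against the actual checkpoints.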

Furthermore, the model is trained longer (8 epochs, revised from 10 epochs). The new pre-trained model weights will be updated as soon as the training and validation are completed.

Time and space efficiency

We compare the time and space efficiency of this model and some competitors. For these competitors, we clone the positional embedding layers so that they can accept input sequences with the maximum length of 65536 tokens.
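One plausible reading of "cloning" the positional embedding layers is to repeat the learned position table until it covers the longer sequence length. The sketch below shows that reading with PyTorch; the function name is hypothetical, the 512-position starting size and the small hidden size are illustrative, and the exact procedure used for the competitors in these experiments may differ.

```python
import torch
from torch import nn


def extend_position_embeddings(old: nn.Embedding, new_max_len: int) -> nn.Embedding:
    """Build a longer position-embedding table by repeating the learned one."""
    old_len, dim = old.weight.shape
    repeats = -(-new_max_len // old_len)  # ceiling division
    new = nn.Embedding(new_max_len, dim)
    with torch.no_grad():
        # Stack copies of the old table, then cut to the requested length.
        tiled = old.weight.repeat(repeats, 1)[:new_max_len]
        new.weight.copy_(tiled)
    return new


old = nn.Embedding(512, 16)  # small hidden size keeps the example light
new = extend_position_embeddings(old, 65536)
print(new.weight.shape)
```

For a real Hugging Face model, the configured maximum position count and any cached position-id buffer would also need to be updated to the new length; that step is omitted here.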

The experiments are run on an NVIDIA A100-SXM4-40GB with a batch size of 1. The figures show the time and memory needed to run one batch. In training mode, both the forward pass and backpropagation are included. In inference mode, only the forward pass is included.
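A measurement of this kind could be set up roughly as follows. This is a minimal sketch, not the benchmarking script behind the figures: the helper name is hypothetical, the model is assumed to return a single tensor, and a tiny linear layer stands in for the real models. Peak memory is only reported when the model lives on a CUDA device.

```python
import time

import torch
from torch import nn


def run_one_batch(model, batch, train):
    """Time one batch and report peak CUDA memory (None when running on CPU)."""
    device = next(model.parameters()).device
    use_cuda = device.type == "cuda"
    if use_cuda:
        torch.cuda.synchronize(device)
        torch.cuda.reset_peak_memory_stats(device)
    start = time.perf_counter()
    if train:
        # Training mode: forward pass plus backpropagation.
        model.train()
        model.zero_grad(set_to_none=True)
        loss = model(batch).float().mean()  # placeholder loss for timing only
        loss.backward()
    else:
        # Inference mode: forward pass only, no gradient tracking.
        model.eval()
        with torch.no_grad():
            model(batch)
    if use_cuda:
        torch.cuda.synchronize(device)  # wait for queued GPU work to finish
    elapsed = time.perf_counter() - start
    peak = torch.cuda.max_memory_allocated(device) if use_cuda else None
    return elapsed, peak


model = nn.Linear(8, 4)    # stand-in for the model under test
batch = torch.randn(1, 8)  # batch size of 1
t_train, mem_train = run_one_batch(model, batch, train=True)
t_eval, mem_eval = run_one_batch(model, batch, train=False)
print(t_train, t_eval)
```

The synchronize calls matter on a GPU because CUDA kernels are launched asynchronously; without them the timer could stop before the work has actually finished.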

Training mode

[Two figures: time and memory needed to run one batch in training mode]

Inference mode

[Two figures: time and memory needed to run one batch with the forward pass only]

Training code

https://github.com/minhtriphan/LongFinBERT-base/tree/main

Training configuration

  • The model is trained for 4 epochs using the Masked Language Modeling (MLM) task;
  • The masking probability is 15%;
  • Details about the training configuration are given in the log file named train_v1a_0803_1144_seed_1.log.
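To make the masking step concrete, here is a minimal sketch of MLM masking with a 15% probability. It is a simplification under stated assumptions: every selected token is replaced by the mask token (the original BERT recipe also leaves some selected tokens unchanged or swaps in random ones, and the training code linked above is the reference for what this model actually does), the function name is hypothetical, and the token ids in the example are illustrative rather than taken from the FinBERT vocabulary.

```python
import random

IGNORE_INDEX = -100  # label value commonly used to exclude a position from the loss


def mask_tokens(token_ids, mask_token_id, special_ids=(), mask_prob=0.15, seed=None):
    """Mask each non-special token with probability `mask_prob`.

    Returns (inputs, labels): masked positions carry the mask id in `inputs`
    and the original id in `labels`; all other labels are IGNORE_INDEX.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in token_ids:
        if tok not in special_ids and rng.random() < mask_prob:
            inputs.append(mask_token_id)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(IGNORE_INDEX)
    return inputs, labels


# Illustrative ids: 101 and 102 play the role of [CLS] and [SEP], 103 of [MASK].
ids = [101] + list(range(1000, 1050)) + [102]
inputs, labels = mask_tokens(ids, mask_token_id=103, special_ids={101, 102}, seed=1)
print(sum(label != IGNORE_INDEX for label in labels), "tokens masked")
```

The model is then trained to predict the original token at each masked position, and positions labeled with IGNORE_INDEX do not contribute to the loss.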

Instructions for loading the pre-trained model

  • Clone the git repo
git clone https://github.com/minhtriphan/LongFinBERT-base.git
cd LongFinBERT-base

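or, from a notebook:

!git clone https://github.com/minhtriphan/LongFinBERT-base.git
import sys
# Make the cloned repo importable; the path is relative to the notebook's working directory
sys.path.append('LongFinBERT-base')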
  • Load the pre-trained tokenizer, model configuration, and model weights
from model import LongBERTModel
from custom_config import LongBERTConfig
from tokenizer import LongBERTTokenizer

backbone = 'minhtriphan/LongFinBERT-base'

tokenizer = LongBERTTokenizer.from_pretrained(backbone)
config = LongBERTConfig.from_pretrained(backbone)
model = LongBERTModel.from_pretrained(backbone)

Model usage

txt = '\n0000912057-94-000263.hdr.sgml : 19950608\nACCESSION NUMBER:\t\t0000912057-94-000263\nCONFORMED SUBMISSION TYPE:\t10-K\nPUBLIC DOCUMENT COUNT:\t\t3\nCONFORMED PERIOD OF REPORT:\t19930831\nFILED AS OF DATE:\t\t19931129\nDATE AS OF CHANGE:\t\t19931129\nSROS:\t\t\tNONE\n\nFILER:\n\n\tCOMPANY DATA:\t\n\t\tCOMPANY CONFORMED NAME:\t\t\tAMERICAN MEDICAL HOLDINGS INC\n\t\tCENTRAL INDEX KEY:\t\t\t0000861439\n\t\tSTANDARD INDUSTRIAL CLASSIFICATION:\t8060\n\t\tIRS NUMBER:\t\t\t\t133527632\n\t\tSTATE OF INCORPORATION:\t\t\tDE\n\t\tFISCAL YEAR END:\t\t\t0831\n\n\tFILING VALUES:\n\t\tFORM TYPE:\t\t10-K\n\t\tSEC ACT:\t\t1934 Act\n\t\tSEC FILE NUMBER:\t001-10511\n\t\tFILM NUMBER:\t\t94505453\n\n\tBUSINESS ADDRESS:\t\n\t\tSTREET 1:\t\t8201 PRESTON RD, SUITE 300\n\t\tCITY:\t\t\tDALLAS\n\t\tSTATE:\t\t\tTX\n\t\tZIP:\t\t\t75255\n\t\tBUSINESS PHONE:\t\t2143606300\n\n</SEC-Header>\n</Header>\n\n \nProc-Type: 2001,MIC-CLEAR\nOriginator-Name: keymaster@town.hall.org\nOriginator-Key-Asymmetric:\n MFkwCgYEVQgBAQICAgADSwAwSAJBALeWW4xDV4i7+b6+UyPn5RtObb1cJ7VkACDq\n pKb9/DClgTKIm08lCfoilvi9Wl4SODbR1+1waHhiGmeZO8OdgLUCAwEAAQ==\nMIC-Info: RSA-MD5,RSA,\n jSme4OE5puXgBpdHHyga1WdDJ0E3trqOOdfp13QPWNizEt4YLMTbUPjitjQi47a9\n tBwulFatOU1F7uc/UNiQZQ==\n\n 0000912057-94-000263.txt : 19950608\n\n10-K\n 1\n 10-K\n\n- - - - --------------------------------------------------------------------------------\n- - - - --------------------------------------------------------------------------------\n\n                       SECURITIES AND EXCHANGE COMMISSION\n                             WASHINGTON, D.C. 
20549\n\n                            ------------------------\n\n                                   Form 10-K\n(Mark One)\n   /X/           ANNUAL REPORT PURSUANT TO SECTION 13 OR 15 (D)\n             OF THE SECURITIES EXCHANGE ACT OF 1934 (FEE REQUIRED)\n                   FOR THE FISCAL YEAR ENDED AUGUST 31, 1993\n                                       OR\n\n   / /         TRANSITION REPORT PURSUANT TO SECTION 13 OR 15 (D)\n            OF THE SECURITIES EXCHANGE ACT OF 1934 (NO FEE REQUIRED)\n              FOR THE TRANSITION PERIOD FROM          TO\n\n                            COMMISSION FILE NUMBER)\n                                    1-10511\n                             ---------------------\n\n                        AMERICAN MEDICAL HOLDINGS, INC.\n             (Exact name of registrant as specified in its charter)\n\n             DELAWARE                                    13-3527632\n  (State or other jurisdiction of                     (I.R.S. Employer\n  incorporation or organization)                     Identification No.)\n\n                             Commission file number\n                                     1-7612\n                            ------------------------\n\n                      AMERICAN MEDICAL INTERNATIONAL, INC.\n             (Exact name of registrant as specified in its charter)\n\n            DELAWARE                                    95-2111054\n(State or other jurisdiction of            (I.R.S. 
Employer Identification No.)\n incorporation or organization)\n8201 Preston Road, Dallas, Texas                          75225\n(Address of principal executive                         (Zip Code)\n            offices)\n\n      (Registrants\' telephone number, including area code) (214) 360-6300\n                            ------------------------\n\n          Securities registered pursuant to Section 12(b) of the Act:\n                        American Medical Holdings, Inc.:\n\n(TITLE OF EACH CLASS)           (NAME OF EACH EXCHANGE ON WHICH REGISTERED)\n- - - - ----------------------          -------------------------------------------\n     COMMON STOCK                         NEW YORK STOCK EXCHANGE\n\n          Securities registered pursuant to Section 12(g) of the Act:\n                     American Medical International, Inc.:\n              8 1/4% Convertible Subordinated Debentures due 2008\n              9 1/2% Convertible Subordinated Debentures due 2001\n                                (Title of class)\n\n    Indicate  by check mark  whether the Registrants (1)  have filed all reports\nrequired to be filed by  Section 13 or 15(d) of  the Securities Exchange Act  of\n1934  during  the preceding  12  months (or  for  such shorter  period  that the\nRegistrants were required to  file such reports), and  (2) have been subject  to\nsuch  filing requirements for the past  90 days. American Medical Holdings, Inc.\n Yes _X_ No ____ . American Medical International, Inc.\nYes _X_ No ____ .\n\n    As of November  18, 1993 there  were 76,987,204 shares  of American  Medical\nHoldings,  Inc. Common Stock, $.01 par  value, outstanding. The aggregate market\nvalue of Common  Stock held by  non-affiliates of the  registrant, based on  the\nclosing   price  of  these  shares  at  November  18,  1993,  was  approximately\n$479,199,166. 
For the purposes of the foregoing calculation only, all  directors\nand  executive officers and  principal stockholders of  the registrant have been\ndeemed affiliates.\n\n    All  shares  of  Common   Stock,  $.01  par   value,  of  American   Medical\nInternational, Inc. are held by American Medical Holdings, Inc.\n\n                      DOCUMENTS INCORPORATED BY REFERENCE\nAmerican Medical Holdings, Inc.\'s definitive proxy statement for its 1994 Annual\n                      Meeting of Stockholders....Part III\n\n- - - - --------------------------------------------------------------------------------\n- - - - --------------------------------------------------------------------------------\n \n                                     INDEX\n\nPAGE\n                                                                                  REFERENCE\n                                                                                  ---------\n                                                                               \n                                          PART I\nItem 1.   Business..............................................................        1\nItem 2.   Properties............................................................       12\nItem 3.   Legal Proceedings.....................................................       12\nItem 4.   Submission of Matters to a Vote of Security Holders...................       12\n                                          PART II\nItem 5.   Market for the Registrant\'s Common Stock and Related Stockholder\n          Matters...............................................................       14\nItem 6.   Selected Financial Data...............................................       15\nItem 7.   Management\'s Discussion and Analysis of Financial Condition and\n          Results of Operations.................................................       16\nItem 8.   Financial Statements and Supplementary Data...........................   
    23\nItem 9.   Changes in and Disagreements with Accountants on Accounting and\n          Financial Disclosure..................................................       23\n                                         PART III\nItem 10.  Directors and Executive Officers of the Registrants...................       23\nItem 11.  Executive Compensation................................................       23\nItem 12.  Security Ownership of Certain Beneficial Owners and Management........       23\nItem 13.  Certain Relationships and Related Transactions........................       23\n                                          PART IV\nItem 14.  Exhibits, Financial Statement Schedules and Reports on Form 8-K.......       23\n\nPART I\n\nITEM 1. BUSINESS\n\n    GENERAL\n\n    American  Medical Holdings, Inc. ("Holdings") was organized in July, 1989 to\nacquire American Medical International, Inc. ("AMI" and, together with Holdings,\nthe "Company"). As a result of this acquisition, Holdings is the owner of all of\nthe outstanding shares of common stock of AMI.\n\n    The Company  is one  of the  leading hospital  management companies  in  the\nUnited  States.  Generally,  the Company\'s  hospitals  provide a  full  range of\ninpatient and  outpatient  services including  medical/surgical,  obstetric  and\ndiagnostic  services and  services provided  by intensive  care units, emergency\nrooms,  laboratories  and  pharmacies.  The  Company  also  operates   ancillary\nfacilities  at each of its hospitals, such as ambulatory, occupational and rural\nhealthcare clinics. At August 31, 1993,  the Company operated 35 domestic  acute\ncare hospitals and one psychiatric hospital containing a total of 8,003 licensed\nbeds.  The Company\'s hospitals  are principally located in  the suburbs of major\nmetropolitan areas in 12 states including Texas, Florida and California. 
Through\nbroad networks including  health maintenance  organizations, preferred  provider\norganizations,  insurers  and  employers,  the  Company  provides  high quality,\naffordable  health  services  while  facing  the  challenge  of  containing  the\ncontinually rising healthcare costs.\n\n    Management  expects that the  Company\'s ongoing control  of costs emphasized\nduring fiscal  1993 will  provide the  Company a  competitive edge  to  increase\nmarket  share notwithstanding  the presence  of a  managed care  environment. In\nresponse to the  ever-changing healthcare  system, the  shift toward  outpatient\nservices,  the need  to reduce  provider costs  for acute-care  services and the\nClinton Administration\'s desire to provide  universal access to healthcare,  the\nCompany  is developing  physician networks  and alliances  with other healthcare\nproviders to create fully integrated healthcare delivery systems.\n\n    Holdings and AMI are Delaware corporations with principal executive  offices\nlocated  at  8201  Preston  Road,  Suite  300,  P.O.  Box  25651,  Dallas, Texas\n75225-5651. The telephone number for Holdings  and AMI at such address is  (214)\n360-6300. AMI was incorporated in 1957.\n\n    PROPERTIES\n\n    The  Company  owns  or  leases  and operates  the  following  35  acute care\nhospitals and one psychiatric hospital.\n\n1\n\nThe Company  also  owns or  manages  medical office  buildings  and  related\nhealthcare  facilities associated  with 31 of  its hospitals as  well as certain\nundeveloped properties.\n\n                                       2\n \n    EMPLOYEES\n\n    As of August 31,  1993, the Company had  approximately 28,200 employees,  of\nwhich approximately 66% were full time employees. 
Two of the Company\'s hospitals\nhad  labor  contracts  covering  approximately 5%  of  the  Company\'s employees.\nManagement  believes  that  its  relations  with  its  employees  generally  are\nsatisfactory.\n\n    MEDICAL STAFFS\n\n    The  medical  staff  at  each hospital  generally  consists  of non-employee\nphysicians. There  is a  trend in  the healthcare  industry in  some regions  to\nemploy  physicians and where  appropriate, the Company\'s  hospitals have pursued\nthis option. Medical staff members of  the Company-owned hospitals that are  not\nemployees usually also serve on the medical staffs of hospitals not owned by the\nCompany  and may terminate their  relationships with the Company-owned hospitals\nat any time.\n\n    Rules and  regulations concerning  the medical  aspects of  each  hospital\'s\noperations  are  adopted  and enforced  by  its  medical staff.  Such  rules and\nregulations provide that the members of  the staff elect officers who,  together\nwith  additional physicians selected by them, supervise all medical and surgical\nprocedures and services. Their supervision  is subject to the general  oversight\nof the hospital\'s Governing Board.\n\n    QUALITY OF SERVICES\n\n    Management  believes the quality of healthcare services is critical in order\nto attract  and retain  top physicians  and  increase the  market share  of  the\nCompany\'s  hospitals. One of the  key mechanisms used to  monitor the quality of\ncare at  the Company\'s  hospitals is  a quality  assurance program  designed  to\nmeasure   patient  satisfaction,  the  Patient  Satisfaction  Monitoring  System\n("PSMS"). PSMS utilizes the  results of interviews  performed by an  independent\nresearch  company  of  a  statistically determined  sample  group  of discharged\npatients at each  hospital to  gather patient responses  regarding the  hospital\nservices  provided. 
Management uses the results as a tool to improve the quality\nof patient services and satisfaction and believes PSMS has assisted the  Company\nin successfully maintaining and improving the quality of healthcare as perceived\nby  patients  and  their physicians  and  thereby contributing  to  improved net\nrevenues. PSMS  is also  used by  the Company  as one  of the  bases upon  which\nhospital  executive  directors and  other  employees are  compensated  under the\nCompany\'s incentive compensation program.  Management believes that the  Company\nwas  the first in the industry to directly tie compensation to the attainment of\nqualitative performance targets.\n\n    The Company  has  recently developed  a  system  similar to  PSMS  which  is\ndesigned  to measure physician satisfaction, the MD Satisfaction Survey. A pilot\nprogram for this  survey has been  implemented at one  hospital and the  Company\nplans  to make it available for use at each of the Company\'s hospitals in fiscal\n1994.\n\n    COMPETITION\n\n    Generally, other  investor-owned and  non-profit  hospitals operate  in  the\nlocal  markets in which  the Company participates and  provide services that are\nsimilar to those offered by the Company\'s hospitals. Competition among hospitals\nand other healthcare  providers in  the United  States has  increased in  recent\nyears  due to a decline  in occupancy rates resulting  from, among other things,\nchanges in  government  regulation  and reimbursement,  other  cost  containment\npressures, technology, and most recently, the healthcare reform plan proposed by\nthe Clinton Administration. 
Additionally, hospitals owned by government agencies\nor  other  tax-exempt  entities  benefit  from  advantages  such  as endowments,\ncharitable contributions  and tax-exempt  financing,  which advantages  are  not\navailable to the Company\'s hospitals.\n\n    Management  believes  that a  hospital\'s  competitive position  within local\nmarkets is  affected by  various  factors including  the quality  of  healthcare\nservices  provided, pricing of healthcare  services, the hospital\'s location and\nthe types of services offered. The Company expects to improve the performance of\nits  hospitals  by  (i)   expanding  physician  network  relationships   thereby\nattracting and retaining\n\n                                       3\n \nquality physician and medical personnel, (ii) increasing its emphasis on managed\ncare  contracting,  (iii)  developing  and  marketing  new  healthcare  services\ntargeted to the particular needs of the communities served by its hospitals, and\n(iv) expanding profitable outpatient services.\n\n    The competitive  position of  a  hospital is  increasingly affected  by  its\nability  to  negotiate  contracts  for  healthcare  services  with  managed care\norganizations, including  health maintenance  organizations ("HMOs"),  preferred\nprovider  organizations  ("PPOs")  and  other  purchasers  of  group  healthcare\nservices. HMOs and PPOs attempt to  direct and control use of hospital  services\nthrough  strict  utilization  management programs  and  by  negotiating provider\ncontracts with only one or  a limited number of  hospitals in each market  area.\nThe importance of negotiating with managed care organizations varies from market\nto  market  depending on  the  market strength  of  such organizations.  In some\nsituations, hospitals  have agreed  to fixed  payments based  on the  number  of\nmanaged  care  enrollees,  thereby  assuming  hospital  utilization  risk  (such\ncontracts are referred  to as capitated  contracts). 
Managed care  organizations\nare  generally  able  to  obtain discounts  from  hospital  established charges.\nManagement believes that the Company is able to compete effectively for  managed\ncare  business in part  because of its relationships  with local physicians, its\nhospital management teams, its attention to cost controls and quality of service\nand its  strategies to  establish  service niches  in  markets served  by  other\nhospitals.\n\n    Merger   and  acquisition  activity  has   significantly  increased  in  the\nhealthcare industry involving  both investor-owned and  non-profit entities.  As\nhealthcare   reforms  announced  by  the  Clinton  Administration  take  effect,\nmanagement believes that it will become  more important for hospitals and  other\nhealthcare  providers  to  work  together to  form  fully  integrated healthcare\ndelivery systems and  thereby provide  the community and  marketplace with  high\nquality, cost effective healthcare products and services. During fiscal 1993 the\nCompany  entered into an  agreement with HealthTrust,  Inc.-The Hospital Company\n("HealthTrust") to jointly  operate AMI\'s  Tarzana Regional  Medical Center  and\nHealthTrust\'s  Encino  Hospital.  Management  is  continually  evaluating  other\nsimilar opportunities  and acquisitions  to  expand the  networks in  which  the\nCompany currently participates.\n\n    SOURCES OF REVENUE\n\n    The  sources of the Company\'s  hospital revenues are room  and board and the\nprovision of ancillary  medical services.  Room and board  represents the  basic\ncharges for the hospital room and related services, such as general nursing care\nand  meals.  Ancillary medical  services represent  the  charges related  to the\nmedical support activities performed by  the hospital, such as X-rays,  physical\ntherapy  and laboratory procedures.  
The Company receives  payments for services\nrendered to patients from the federal government under Medicare and the Civilian\nHealth and Medical  Program of  Uniformed Services  ("CHAMPUS") programs,  state\ngovernments under their respective Medicaid programs, managed care organizations\n("contracted  services"), private insurers,  self-insured employers and directly\nfrom patients. In addition to revenues received from such programs and patients,\nthe Company receives other  non-patient revenues (e.g.  cafeteria and gift  shop\nrevenues).  During fiscal 1991, the  Company also recognized revenues associated\nwith an HMO owned by the Company and divested in fiscal 1991.\n\n    The following table presents  the percentage of net  revenues for the  three\nyears ended August 31 under each of the following programs:\n\nThe  Company\'s hospital revenues received under Medicare, Medicaid, CHAMPUS,\nBlue Cross  and from  payors  of contracted  services  are generally  less  than\ncustomary  charges for the  services covered. Following  the initiative taken by\nthe federal government to control healthcare costs, other\n\n                                       4\n \nmajor purchasers  of  healthcare,  including  states,  insurance  companies  and\nemployers,  are increasingly negotiating the amounts  they will pay for services\nperformed  rather  than  simply  paying  healthcare  providers  their  customary\ncharges.  Managed  care  programs  which offer  prepaid  and  discounted medical\nservice packages are  capturing an increasing  share of the  market, tending  to\nreduce  the historical  rate of  growth of hospital  revenues. As  a result, new\nkinds of healthcare strategies and  provider networks (e.g. 
physician  networks)\nare continuing to emerge.\n\n    Patients  are generally not responsible for any difference between customary\nhospital charges and  amounts reimbursed under  Medicare, Medicaid, CHAMPUS  and\nsome  Blue Cross plans  or by payors  of contracted services  for such services,\nexcept to the extent of any exclusions, deductibles or co-insurance features  of\ntheir  coverage. In  recent years insurers  and other payors  have increased the\namount of such exclusions, deductibles and co-insurance generally increasing the\npatient\'s financial  responsibility  to  directly pay  for  some  services.  The\nincrease  in the  self-pay portion of  a patient\'s  financial responsibility may\nalso increase the Company\'s uncollectible accounts.\n'

import torch

# Tokenize the filing text into PyTorch tensors
tokenized_txt = tokenizer(txt, return_tensors='pt')

# Run a forward pass without tracking gradients
with torch.no_grad():
    output = model(tokenized_txt['input_ids'], tokenized_txt['attention_mask'])

Contact

For any comments, questions, or feedback, please get in touch with us via phanminhtri2611@gmail.com or triminh.phan@unisg.ch.

Paper

(updating)