H i LLaVA-llama3.1

: Hancom InSpace Large Language and Vision Assistant

Model Information

We are pleased to introduce our latest model, which builds upon the robust LLaVA architecture. This model has been meticulously fine-tuned by integrating two powerful components: the "openai/clip-vit-large-patch14-336" vision model and the "meta-llama/Meta-Llama-3.1-8B-Instruct" language model.

To ensure optimal performance, the model underwent a comprehensive pretraining process, utilizing 4 H100 GPUs over a period of 4 hours. Following this, a fine-tuning phase was conducted, which required an additional 11 hours of computational time.

This carefully crafted model is designed to deliver high-quality results across a variety of tasks, demonstrating the potential of advanced AI and machine learning techniques. We are excited to make this model available to the community and look forward to seeing the innovative applications that will emerge from its use.

Alt text

developer: Hancom InSpace

Supported languages: Korean, English

Model Release Date: August 30th, 2024.

Dataset

Translate to Korean

This model leverages the LLaVA architecture and has been fine-tuned using the "openai/clip-vit-large-patch14-336" vision model and "meta-llama/Meta-Llama-3.1-8B-Instruct" language model. The pretraining process was executed over 4 hours using 4 H100 GPUs, followed by an 11-hour fine-tuning phase. This model is designed to deliver high performance in multimodal tasks, particularly those involving vision and language.

The datasets used in the training process include the LLaVA Visual Instruct Pretrain LCS-558K and LLaVA Visual Instruct 150K datasets. These datasets contribute to the model's ability to align visual features with instruction-following tasks, pushing toward GPT-4 level multimodal capabilities.

Licensing Information When using or distributing this model, it is important to adhere to the following licenses:

LLaVA Visual Instruct Pretrain LCS-558K:

The dataset includes images and captions from the LAION/CC/SBU dataset and BLIP-generated captions. Users must comply with the licenses associated with CC-3M and BLIP. The CC-3M dataset may be freely used with proper attribution to Google LLC. However, the dataset is provided "AS IS" without any warranties. LLaVA Visual Instruct 150K:

This dataset is released under the Creative Commons Attribution 4.0 International License. Additionally, users must comply with OpenAI's Terms of Use as the dataset was generated using the GPT-4 API. It is crucial to include these licensing details when distributing or using the model to ensure compliance with the original data sources' terms.

Example Output

Sample 01 Sample 02 Sample 03 Sample 04
# Image1
Q: 해당 이미지를 설명해주세요.
HiLLaVA: 이 이미지에는 풀밭을 가로질러 걷고 있는 흑백 고양이 두 마리가 등장합니다. 한 마리는 왼쪽에, 다른 한 마리는 오른쪽에 있습니다. 두 고양이는 서로 가까이 붙어 있으며, 한 마리는 다른 고양이보다 약간 앞서 걷고 있는 것처럼 보입니다. 고양이 중 한 마리가 다른 고양이보다 약간 작아 보이는데, 이는 나이 차이를 나타내는 것일 수 있습니다. 고양이들은 풀밭을 가로질러 함께 이동하면서 장난스럽게 상호작용하는 것처럼 보입니다.

# Image2
Q: 어떤 동물이 보이나요?
HiLLaVA: 이미지 속 동물은 호랑이입니다.

# Image3
Q: 해당 건축물은 무엇인가요?
HiLLaVA: 해당 건축물은 파리, 프랑스에 위치한 유명한 랜드마크인 에펠탑입니다.

# Image4
Q: 대형 상업용 비행기가 있는 영역의 바운딩 박스 좌표를 알려주세요:
HiLLaVA:이 이미지는 공항 활주로에서 이륙을 준비하는 대형 상업용 비행기의 모습을 보여줍니다. 비행기는 활주로 중앙 근처에 위치하여 이륙을 준비하고 있습니다. 활주로 주변에는 여러 대의 차량이 흩어져 있으며, 일부는 비행기와 가깝게, 다른 차량은 멀리 떨어져 있습니다. 이 차량들은 비행기가 이륙할 준비를 하는 동안 비행기를 지원하는 지상 서비스 차량일 가능성이 높습니다. 이 장면은 비행기가 이륙을 준비하는 공항의 분주한 분위기를 포착합니다.
Q: [0.44, 0.47, 0.55, 0.56]
Alt text

About us

Alt text

Introduction for Company and Open Source LLM Release We are excited to introduce Hancom InSpace, a pioneering company at the forefront of aerospace, aviation, and ground system technologies. Founded in 2012, our company has continuously evolved, becoming a key player in the space and drone industries. Headquartered in Daejeon, South Korea, Hancom InSpace specializes in delivering comprehensive satellite and drone-based solutions, including satellite ground stations, AI-driven image analysis, and fully autonomous drone operations.

Hancom InSpace is renowned for its innovative contributions to space exploration and satellite technology. We launched Sejong-1, South Korea’s first private Earth observation satellite, marking a significant milestone in the New Space era. Our advanced services range from real-time satellite image analysis to the development of cutting-edge software for satellite ground stations, all certified by the Korean government.

In the aviation sector, our expertise spans the design and manufacture of drones, integrated with proprietary control systems and AI capabilities. These drones serve a variety of purposes, from precision agriculture to surveillance, offering state-of-the-art technology solutions tailored to both civilian and military needs.

As we continue to innovate, we are thrilled to open-source our Language Model (LLM), allowing the global community to leverage and build upon our cutting-edge AI technology. This initiative reflects our commitment to collaboration and the advancement of AI research and development. We invite you to explore our open-source LLM and join us in pushing the boundaries of what technology can achieve.

For more information about our company and the services we offer, please visit our website at www.inspace.co.kr.

Citation

@misc{
      title={HiLLaVA-llama3.1: Hancom Inspace Large Language and Vision Assistant},
      author={JoChanho},
      url={https://github.com/InSpaceAI/HiLLaVA-llama3.1},
      year={2024}
}
Downloads last month
112
Safetensors
Model size
8.35B params
Tensor type
BF16
·
Inference Examples
Inference API (serverless) does not yet support transformers models for this pipeline type.

Model tree for HancomInSpaceAI/HiLLaVA-llama_3_1-8b

Finetuned
(748)
this model

Datasets used to train HancomInSpaceAI/HiLLaVA-llama_3_1-8b