TWIGMA / details.py

Yiqun Chen

Add .gitattributes

d1a5e77 about 2 years ago

4.63 kB

	from pathlib import Path
	import streamlit as st
	import streamlit.components.v1 as components
	from PIL import Image
	import base64
	import pandas as pd

	def read_markdown_file(markdown_file):
	    return Path(markdown_file).read_text()

	def render_svg(svg_filename):
	    """Render the SVG file at svg_filename as an inline image."""
	    with open(svg_filename, "r") as f:
	        svg = f.read()
	    # Embed the SVG as a base64 data URI so it can be displayed through an <img> tag.
	    b64 = base64.b64encode(svg.encode('utf-8')).decode("utf-8")
	    html = r'<img src="data:image/svg+xml;base64,%s"/>' % b64
	    st.write(html, unsafe_allow_html=True)


	def app():
	    #intro_markdown = read_markdown_file("introduction.md")
	    #st.markdown(intro_markdown, unsafe_allow_html=True)

	    st.markdown("## TWIGMA (TWItter Generative-ai images with MetadatA)")
	    st.markdown("### A dataset to understand content, variation, and longitudinal theme shifts of AI-generated images on Twitter.")

	    render_svg("resources/SVG/Asset 49.svg")

	    st.markdown("This is a website for TWIGMA (TWItter Generative-ai images with MetadatA), a comprehensive dataset encompassing 800,000 gen-AI images collected from Jan 2021 to March 2023 on Twitter, with associated metadata (e.g., tweet text, creation date, number of likes). Through a comparative analysis of TWIGMA with natural images and human artwork, we find that gen-AI images possess distinctive characteristics and exhibit, on average, lower variability when compared to their non-gen-AI counterparts. Additionally, we find that the similarity between a gen-AI image and natural images (i) is inversely correlated with the number of likes; and (ii) can be used to identify human images that served as inspiration for the gen-AI creations. Finally, we observe a longitudinal shift in the themes of AI-generated images on Twitter, with users increasingly sharing artistically sophisticated content such as intricate human portraits, whereas their interest in simple subjects such as natural scenes and animals has decreased. Our analyses and findings underscore the significance of TWIGMA as a unique data resource for studying AI-generated images.", unsafe_allow_html=True)

	    st.markdown('#### Watch our successful image-to-image retrieval via PLIP:')
	    # Five equal-width columns; only the first three hold examples.
	    col1, col2, col3, _, _ = st.columns([1, 1, 1, 1, 1])
	    with col1:
	        st.markdown("[Similar cells](https://twitter.com/ZhiHuangPhD/status/1641906064823312384)")
	        example1 = Image.open('resources/example/1.png')
	        st.image(example1, caption='Example 1', output_format='PNG')
	    with col2:
	        st.markdown("[Salient object](https://twitter.com/ZhiHuangPhD/status/1641899092195565569)")
	        example2 = Image.open('resources/example/2.png')
	        st.image(example2, caption='Example 2', output_format='PNG')
	    with col3:
	        st.markdown("[Similar region](https://twitter.com/ZhiHuangPhD/status/1641911235288645632)")
	        example3 = Image.open('resources/example/3.png')
	        st.image(example3, caption='Example 3', output_format='PNG')

	    st.markdown("#### PLIP is trained on the largest public vision–language pathology dataset: OpenPath")

	    col1, col2 = st.columns([1, 1])
	    with col1:
	        st.markdown("Following the usage policy and guidelines from Twitter and other entities, we established what is, so far, the largest public vision–language pathology dataset. To ensure the quality of the data, OpenPath followed rigorous protocols for cohort inclusion and exclusion, including the removal of retweets, sensitive tweets, and non-pathology images, as well as text cleaning.", unsafe_allow_html=True)
	        st.markdown("The final OpenPath dataset consists of:", unsafe_allow_html=True)
	        st.markdown("- Tweets: 116,504 image–text pairs from Twitter posts (tweets) during Mar. 21, 2006 – Nov. 15, 2022 across 32 pathology subspecialty-specific hashtags;", unsafe_allow_html=True)
	    with col2:
	        render_svg("resources/SVG/Asset 50.svg")
	        render_svg("resources/SVG/Asset 51.svg")

	    st.markdown("#### PLIP is trained by connecting images and text via contrastive learning")

	    col1, col2 = st.columns([3, 1])
	    with col1:
	        st.markdown("The proposed PLIP model generates two embedding vectors, one from the text encoder and one from the image encoder. Contrastive learning then forces these vectors to be similar for each paired image and text and dissimilar for non-paired images and texts.", unsafe_allow_html=True)
	        fig1e = Image.open('resources/4x/Fig1e.png')
	        st.image(fig1e, caption='PLIP training', output_format='PNG')

	    with col2:
	        render_svg("resources/SVG/Asset 53.svg")
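
The page text above says only that contrastive learning makes paired image and text embeddings similar and non-paired ones dissimilar. The block below is a minimal NumPy sketch of one common way to express that idea, not the PLIP training code. The symmetric cross-entropy form, the function names (`l2_normalize`, `log_softmax`, `contrastive_loss`), the temperature of 0.07, and the random toy embeddings are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(logits):
    """Numerically stable log-softmax over each row."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over an image-text similarity matrix.

    Assumption: row i of image_emb and row i of text_emb form a matched pair;
    every other combination in the batch is treated as a non-pair.
    """
    image_emb = l2_normalize(image_emb)
    text_emb = l2_normalize(text_emb)
    logits = image_emb @ text_emb.T / temperature  # shape: (batch, batch)
    diag = np.arange(logits.shape[0])
    # Image-to-text direction: each image should pick out its own text.
    loss_i2t = -log_softmax(logits)[diag, diag].mean()
    # Text-to-image direction: each text should pick out its own image.
    loss_t2i = -log_softmax(logits.T)[diag, diag].mean()
    return (loss_i2t + loss_t2i) / 2


# Toy data (hypothetical): random "image" embeddings and nearly identical "text" embeddings.
rng = np.random.default_rng(0)
images = rng.normal(size=(4, 8))
aligned_texts = images + 0.01 * rng.normal(size=(4, 8))
shuffled_texts = aligned_texts[[1, 2, 3, 0]]  # every pair is now mismatched

loss_aligned = contrastive_loss(images, aligned_texts)
loss_shuffled = contrastive_loss(images, shuffled_texts)
print(loss_aligned, loss_shuffled)
```

In this toy setup the loss is lower when each image sits next to its own text than when the pairs are shuffled, which is the behavior the description above relies on. A real training loop would compute the embeddings with learned encoders and minimize this quantity by gradient descent.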