Update README.md
README.md CHANGED
@@ -13,21 +13,6 @@ thumbnail: >-
   https://cdn-uploads.huggingface.co/production/uploads/669ee023c7e62283cb5c51e0/MpLp6QMlriY25tezXwOYr.png
 ---
 
-<div align="center">
-<img src="https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/source/data/pixeltable-logo-large.png" alt="Pixeltable" width="50%" />
-<br></br>
-
-[![License](https://img.shields.io/badge/License-Apache%202.0-darkblue.svg)](https://opensource.org/licenses/Apache-2.0)
-![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pixeltable?logo=python&logoColor=white)
-![Platform Support](https://img.shields.io/badge/platform-Linux%20%7C%20macOS%20%7C%20Windows-8A2BE2)
-<br>
-[![tests status](https://github.com/pixeltable/pixeltable/actions/workflows/pytest.yml/badge.svg)](https://github.com/pixeltable/pixeltable/actions/workflows/pytest.yml)
-[![tests status](https://github.com/pixeltable/pixeltable/actions/workflows/nightly.yml/badge.svg)](https://github.com/pixeltable/pixeltable/actions/workflows/nightly.yml)
-[![PyPI Package](https://img.shields.io/pypi/v/pixeltable?color=darkorange)](https://pypi.org/project/pixeltable/)
-
-[Installation](https://pixeltable.github.io/pixeltable/getting-started/) | [Documentation](https://pixeltable.readme.io/) | [API Reference](https://pixeltable.github.io/pixeltable/) | [Code Samples](https://github.com/pixeltable/pixeltable?tab=readme-ov-file#-code-samples) | [Computer Vision](https://docs.pixeltable.com/docs/object-detection-in-videos) | [LLM](https://docs.pixeltable.com/docs/document-indexing-and-rag)
-</div>
-
 Pixeltable is a Python library providing a declarative interface for multimodal data (text, images, audio, video). It features built-in versioning, lineage tracking, and incremental updates, enabling users to **store**, **transform**, **index**, and **iterate** on data for their ML workflows.
 
 Data transformations, model inference, and custom logic are embedded as **computed columns**.
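
For context, a computed column is declared once against existing columns and is then kept up to date automatically for current and future rows. A minimal sketch of the idea (the table name and the `rotate` transformation are illustrative, not part of this change):

```python
import pixeltable as pxt

# Assumes a table with an image column already exists, e.g. created via
# pxt.create_table('demo.images', {'image': pxt.ImageType()})
t = pxt.get_table('demo.images')

# Declaring the column computes it once for all existing rows;
# rows inserted later are computed incrementally.
t['rotated'] = t.image.rotate(90)
```
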
@@ -44,164 +29,6 @@ pip install pixeltable
 ```
 **Pixeltable is persistent. Unlike in-memory Python libraries such as Pandas, Pixeltable is a database.**
 
-## 💡 Getting Started
-Learn how to create tables, populate them with data, and enhance them with built-in or user-defined transformations.
-
-| Topic | Notebook | Topic | Notebook |
-|:----------|:-----------------|:-------------------------|:---------------------------------:|
-| 10-Minute Tour of Pixeltable | <a target="_blank" href="https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/release/tutorials/pixeltable-basics.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> </a> | Tables and Data Operations | <a target="_blank" href="https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/release/fundamentals/tables-and-data-operations.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> </a> |
-| User-Defined Functions (UDFs) | <a target="_blank" href="https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/release/howto/udfs-in-pixeltable.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> </a> | Object Detection Models | <a target="_blank" href="https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/release/tutorials/object-detection-in-videos.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> </a> |
-| Experimenting with Chunking (RAG) | <a target="_blank" href="https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/release/tutorials/rag-operations.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> </a> | Working with External Files | <a target="_blank" href="https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/release/howto/working-with-external-files.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> </a> |
-| Integrating with Label Studio | <a target="_blank" href="https://pixeltable.readme.io/docs/label-studio"> <img src="https://img.shields.io/badge/Docs-Label%20Studio-blue" alt="Visit our documentation"/> </a> | Audio/Video Transcript Indexing | <a target="_blank" href="https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/release/tutorials/audio-transcriptions.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> </a> |
-
-## 🧱 Code Samples
-
-### Import media data into Pixeltable (videos, images, audio...)
-```python
-import pixeltable as pxt
-
-v = pxt.create_table('external_data.videos', {'video': pxt.VideoType()})
-
-prefix = 's3://multimedia-commons/'
-paths = [
-    'data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4',
-    'data/videos/mp4/ffe/feb/ffefebb41485539f964760e6115fbc44.mp4',
-    'data/videos/mp4/ffe/f73/ffef7384d698b5f70d411c696247169.mp4'
-]
-v.insert({'video': prefix + p} for p in paths)
-```
-Learn how to [work with data in Pixeltable](https://pixeltable.readme.io/docs/working-with-external-files).
-
-### Object detection in images using DETR model
-```python
-import pixeltable as pxt
-from pixeltable.functions import huggingface
-
-# Create a table to store data persistently
-t = pxt.create_table('image', {'image': pxt.ImageType()})
-
-# Insert some images
-prefix = 'https://upload.wikimedia.org/wikipedia/commons'
-paths = [
-    '/1/15/Cat_August_2010-4.jpg',
-    '/e/e1/Example_of_a_Dog.jpg',
-    '/thumb/b/bf/Bird_Diversity_2013.png/300px-Bird_Diversity_2013.png'
-]
-t.insert({'image': prefix + p} for p in paths)
-
-# Add a computed column for object detection
-t['classification'] = huggingface.detr_for_object_detection(
-    t.image, model_id='facebook/detr-resnet-50'
-)
-
-# Retrieve the rows where cats have been identified
-t.select(animal=t.image,
-         classification=t.classification.label_text[0]) \
-    .where(t.classification.label_text[0] == 'cat').head()
-```
-Learn about computed columns and object detection: [Comparing object detection models](https://pixeltable.readme.io/docs/object-detection-in-videos).
-
-### Extend Pixeltable's capabilities with user-defined functions
-```python
-@pxt.udf
-def draw_boxes(img: PIL.Image.Image, boxes: list[list[float]]) -> PIL.Image.Image:
-    result = img.copy()  # Draw on a copy so the original image is untouched
-    d = PIL.ImageDraw.Draw(result)
-    for box in boxes:
-        d.rectangle(box, width=3)  # Draw each bounding box rectangle
-    return result
-```
-Learn more about user-defined functions: [UDFs in Pixeltable](https://pixeltable.readme.io/docs/user-defined-functions-udfs).
-
-### Automate data operations with views, e.g., split documents into chunks
-```python
-# In this example, the view is defined by iteration over the chunks of a DocumentSplitter
-chunks_table = pxt.create_view(
-    'rag_demo.chunks',
-    documents_table,
-    iterator=DocumentSplitter.create(
-        document=documents_table.document,
-        separators='token_limit', limit=300)
-)
-```
-Learn how to leverage views to build your [RAG workflow](https://pixeltable.readme.io/docs/document-indexing-and-rag).
-
-### Evaluate model performance
-```python
-# The computation of the mAP metric can become a query over the evaluation output
-frames_view.select(mean_ap(frames_view.eval_yolox_tiny), mean_ap(frames_view.eval_yolox_m)).show()
-```
-Learn how to leverage Pixeltable for [Model analytics](https://pixeltable.readme.io/docs/object-detection-in-videos).
-
-### Working with inference services
-```python
-chat_table = pxt.create_table('together_demo.chat', {'input': pxt.StringType()})
-
-# The chat-completions API expects JSON-formatted input:
-messages = [{'role': 'user', 'content': chat_table.input}]
-
-# This example shows how additional parameters from the Together API can be used in Pixeltable
-chat_table['output'] = chat_completions(
-    messages=messages,
-    model='mistralai/Mixtral-8x7B-Instruct-v0.1',
-    max_tokens=300,
-    stop=['\n'],
-    temperature=0.7,
-    top_p=0.9,
-    top_k=40,
-    repetition_penalty=1.1,
-    logprobs=1,
-    echo=True
-)
-chat_table['response'] = chat_table.output.choices[0].message.content
-
-# Start a conversation
-chat_table.insert([
-    {'input': 'How many species of felids have been classified?'},
-    {'input': 'Can you make me a coffee?'}
-])
-chat_table.select(chat_table.input, chat_table.response).head()
-```
-Learn how to interact with inference services such as [Together AI](https://pixeltable.readme.io/docs/together-ai) in Pixeltable.
-
-### Text and image similarity search on video frames with embedding indexes
-```python
-import pixeltable as pxt
-from pixeltable.functions.huggingface import clip_image, clip_text
-from pixeltable.iterators import FrameIterator
-import PIL.Image
-
-video_table = pxt.create_table('videos', {'video': pxt.VideoType()})
-
-video_table.insert([{'video': '/video.mp4'}])
-
-frames_view = pxt.create_view(
-    'frames', video_table, iterator=FrameIterator.create(video=video_table.video))
-
-@pxt.expr_udf
-def embed_image(img: PIL.Image.Image):
-    return clip_image(img, model_id='openai/clip-vit-base-patch32')
-
-@pxt.expr_udf
-def str_embed(s: str):
-    return clip_text(s, model_id='openai/clip-vit-base-patch32')
-
-# Create an index on the 'frame' column that allows text and image search
-frames_view.add_embedding_index('frame', string_embed=str_embed, image_embed=embed_image)
-
-# Retrieve frames that are most similar to a sample image
-sample_image = '/image.jpeg'
-sim = frames_view.frame.similarity(sample_image)
-frames_view.order_by(sim, asc=False).limit(5).select(frames_view.frame, sim=sim).collect()
-
-# Retrieve frames that are most similar to a text string
-sample_text = 'red truck'
-sim = frames_view.frame.similarity(sample_text)
-frames_view.order_by(sim, asc=False).limit(5).select(frames_view.frame, sim=sim).collect()
-
-```
-Learn how to work with [Embedding and Vector Indexes](https://docs.pixeltable.com/docs/embedding-vector-indexes).
-
 ## ❓ FAQ
 
 ### What is Pixeltable?
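
The persistence statement kept above is the key contrast with in-memory dataframes: tables, views, and computed columns are stored durably (by default under `~/.pixeltable`) and can be reopened in a later session. A minimal sketch (directory and table names are illustrative):

```python
import pixeltable as pxt

# Create a directory and a table; both are written to Pixeltable's store
pxt.create_dir('demo')
t = pxt.create_table('demo.films', {'title': pxt.StringType()})
t.insert([{'title': 'Citizen Kane'}])

# ... later, in a fresh Python process: the table and its data are still there
t = pxt.get_table('demo.films')
print(t.select(t.title).head())
```
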
@@ -236,16 +63,4 @@ Today's solutions for AI app development require extensive custom coding and inf
 ### What is Pixeltable not providing?
 
 - Pixeltable is not a low-code, prescriptive AI solution. We empower you to use the best frameworks and techniques for your specific needs.
-- We do not aim to replace your existing AI toolkit, but rather enhance it by streamlining the underlying data infrastructure and orchestration.
-
-> [!TIP]
-> Check out the [Integrations](https://pixeltable.readme.io/docs/working-with-openai) section, and feel free to submit a request for additional ones.
-
-## 🐛 Contributions & Feedback
-
-Are you experiencing issues or bugs with Pixeltable? File an [Issue](https://github.com/pixeltable/pixeltable/issues).
-</br>Do you want to contribute? Feel free to open a [PR](https://github.com/pixeltable/pixeltable/pulls).
-
-## :classical_building: License
-
-This library is licensed under the Apache 2.0 License.
+- We do not aim to replace your existing AI toolkit, but rather enhance it by streamlining the underlying data infrastructure and orchestration.