ali6parmak committed on
Commit
9335bf9
1 Parent(s): 6d4db20

Update README.md

Files changed (1)
  1. README.md +22 -0
README.md CHANGED
@@ -172,6 +172,28 @@ we process them after sorting all segments with content. To determine their read
  using distance as a criterion.
 
 
+ ### Extracting Tables and Formulas
+
+ Our service can extract tables and formulas in different formats.
+
+ By default, the "text" property of each formula segment contains the formula in LaTeX format.
+
+ You can also extract tables in formats such as "markdown", "latex", or "html", but this is not enabled by default.
+ To extract tables this way, set the "extraction_format" parameter. Some example usages are shown below:
+
+ ```
+ curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5060 -F "extraction_format=latex"
+ ```
+ ```
+ curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5060/fast -F "extraction_format=markdown"
+ ```
+
+ Be aware that this additional extraction step can make processing much slower, especially if the document contains a large number of tables.
+
+ (For table extraction, we use [StructEqTable](https://github.com/UniModal4Reasoning/StructEqTable-Deploy),
+ and for formula extraction, we use [RapidLaTeXOCR](https://github.com/RapidAI/RapidLaTeXOCR).)
+
+
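For readers who want to call the service from a script instead of the shell, the curl requests in this section could be mirrored with a minimal Python sketch like the one below. It uses only the standard library. The function names (`build_multipart`, `analyze_pdf`, `segments_of_type`) are hypothetical and not part of the service. The sketch also assumes that the response is a JSON list of segment objects, and that each segment has a "type" key next to the "text" property mentioned above; check both against the actual response before relying on them.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(pdf_name, pdf_bytes, extraction_format):
    """Encode the 'file' and 'extraction_format' form fields as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{pdf_name}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    tail = (
        f"\r\n--{boundary}\r\n"
        'Content-Disposition: form-data; name="extraction_format"\r\n\r\n'
        f"{extraction_format}\r\n"
        f"--{boundary}--\r\n"
    ).encode()
    return head + pdf_bytes + tail, f"multipart/form-data; boundary={boundary}"


def analyze_pdf(pdf_path, extraction_format="markdown",
                base_url="http://localhost:5060", fast=False):
    """POST a PDF to the service, as the curl examples do, and decode the reply.

    Assumption: the service answers with JSON.
    """
    path = Path(pdf_path)
    body, content_type = build_multipart(path.name, path.read_bytes(), extraction_format)
    # The curl examples post either to the root endpoint or to /fast.
    url = base_url.rstrip("/") + ("/fast" if fast else "/")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def segments_of_type(segments, segment_type):
    """Return the 'text' of every segment whose 'type' matches.

    Assumption: segments are dicts with 'type' and 'text' keys.
    """
    return [s.get("text", "") for s in segments if s.get("type") == segment_type]
```

A call such as `analyze_pdf("/PATH/TO/PDF/pdf_name.pdf", extraction_format="latex")` corresponds to the first curl command, and passing `fast=True` targets the `/fast` endpoint used in the second one. The type labels you pass to `segments_of_type` (for example "Formula" or "Table") are also assumptions.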
  ## Benchmark
 
  These are the benchmark results for VGT model on PubLayNet dataset: