from src.model.container import Container
from src.tools.index_creation import set_indexes
from src.tools.reader_word import WordReader
from src.tools.readers_pdf import Reader, Reader_illumio
from src.tools.reader_html import Reader_HTML
from src.model.paragraph import Paragraph


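class Doc:
    """Wraps a file read from `path`: its paragraphs go into a root Container
    and are exposed through `structure` and `blocks`."""

    def __init__(self, path='', include_images=True, actual_first_page=1):
        self.title = self.get_title(path)
        self.extension = self.title.split('.')[-1]
        self.id_ = id(self)
        self.path = path

        # Choose a reader from the file extension; anything that is neither
        # docx nor pdf is handed to the HTML reader.
        if self.extension == 'docx':
            paragraphs = WordReader(path).paragraphs
        elif self.extension == 'pdf':
            # This particular guide has its own dedicated PDF reader.
            if "Illumio_Core_REST_API_Developer_Guide_23.3" in self.title:
                paragraphs = Reader_illumio(path).paragraphs
            else:
                paragraphs = Reader(path, actual_first_page, include_images).paragraphs
        else:
            paragraphs = Reader_HTML(path).paragraphs

        # The root container is titled with the file name up to its first dot.
        root_title = self.set_first_container_title(self.title.split(".")[0], self.extension)
        self.container = Container(paragraphs, father=self, title=root_title)
        set_indexes(self.container)
        self.blocks = self.get_blocks()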


    def get_title(self, path) -> str:
        # Return the last path component. Normalising '\\' to '/' first also
        # handles paths that mix both separators.
        return path.replace('\\', '/').split('/')[-1]

    @property
    def structure(self):
        return self.container.structure

    def get_blocks(self):

        def from_list_to_str(index_list):
            # e.g. [1, 2, 3] -> '1.2.3'
            return '.'.join(str(el) for el in index_list)

        # Tag every block with its source document and a dotted index string.
        blocks = self.container.blocks
        for block in blocks:
            block.doc = self.title
            block.index = from_list_to_str(block.index)
        return blocks
    
    def set_first_container_title(self, title, extension) -> Paragraph:
        # Build the title paragraph of the root container; the font style and
        # starting page id depend on the source format.
        if extension == 'pdf':
            return Paragraph(text=title, font_style='title0', id_=0, page_id=0)
        elif extension == 'docx':
            return Paragraph(text=title, font_style='title0', id_=0, page_id=1)
        else:
            return Paragraph(text=title, font_style='h0', id_=0, page_id=1)


# Disabled code kept as a bare string literal; it is never executed.
"""
    current_level = len(current_index)
    if 0 < block.level:
        if block.level == current_level:
            current_index[-1] += 1
        elif current_level < block.level:
            current_index.append(1)
        elif block.level < current_level:
            current_index = current_index[:block.level]
            current_index[-1] += 1
        block.index = from_list_to_str(current_index)
    else:
        block.index = "0"
"""