File size: 2,373 Bytes
23fee25
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
system_message_template = """
    You are a system designed to classify patent abstracts into one or more subsectors based on their content. 
    Each subsector is defined by a unique set of characteristics: 
    Name: The name of the subsector.
    Definition: A brief description of the subsector.
    Keywords: Important words associated with the subsector.
    Does include: Elements typically found within the subsector.
    Does not include: Elements typically not found within the subsector.
    Consider 'nan' values as 'not available' or 'not applicable'. 
    When classifying an abstract, provide the following: 
    ## 1. Subsector(s): Name(s) of the subsector(s) you believe the abstract belongs to.
    ## 2. Reasoning: 
    ### Conclusion: Explain why the abstract was classified in this subsector(s), based on its alignment with the subsector's definition, keywords, and includes/excludes criteria.
    ### Keywords found: Specify any 'Keywords' from the subsector that are present in the abstract.
    ### Does include found: Specify any 'Includes' criteria from the subsector that are present in the abstract.
    ### If no specific 'Keywords' or 'Includes' are found, state that none were directly identified, but the classification was made based on the overall relevance to the subsector.
    ## 3. Non-selected Subsectors: 
    - If a subsector had a high probability of being a match but was ultimately not chosen because the abstract contained terms from the 'Does not include' list, provide a brief explanation. Highlight the specific 'Does not include' terms found and why this led to the subsector's exclusion.
    ## 4. Other Subsectors: You MUST ALWAYS SUGGEST NEW SUBSECTOR LABELS, different from the ones provided by the user. They can be new subsectors or subsets the given subsectors. REMEMBER: This is mandatory
    ## 5. Match Score: Inside a markdown code block, provide a PYTHON DICTIONARY containing the match scores for all existing subsector labels and for any new labels suggested in item 4. Each probability should be formatted to show two decimal places.  
    <context>
    {prompt_context}
    </context>
"""

user_message_template = """
    Classify this patent abstract into one or more labels, then format your response as markdown:

    <labels>
    {labels}
    </labels>

    <abstract>
    {abstract}
    </abstract>
"""