## IPC Semantic Search

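- Perform semantic search over the sections of the Indian Penal Code (IPC)
- Use the txtai package with a sentence-transformers model (`sentence-transformers/nli-mpnet-base-v2`); LegalBERT (`nlpaueb/legal-bert-small-uncased`) is kept as a commented-out alternative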

### Import

In [7]:
# !pip install txtai

In [8]:
# import the packages
import pandas as pd  # for loading the data and displaying results as tables
from txtai.embeddings import Embeddings

### Load model

In [60]:
# Create embeddings model, backed by sentence-transformers & transformers
# embeddings = Embeddings({"path": "nlpaueb/legal-bert-small-uncased"})
embeddings = Embeddings({"path": "sentence-transformers/nli-mpnet-base-v2"})

### Load data and create index

In [14]:
# read the data
df = pd.read_csv("/content/sections_desc.csv")
# display sample
display(df.head())

| Unnamed: 0 | title | link | section | description |
|---|---|---|---|---|
| 0 | IPC Section 1 » Title and extent of operation ... | http://devgan.in/ipc/section/1/ | Section 1 | This Act shall be called the Indian Penal Code... |
| 1 | IPC Section 33 » "Act". "Omission". | http://devgan.in/ipc/section/33/ | Section 33 | The word “act” denotes as well as series of ac... |
| 2 | IPC Section 38 » Persons concerned in criminal... | http://devgan.in/ipc/section/38/ | Section 38 | Where several persons are engaged or concerned... |
| 3 | IPC Section 37 » Co-operation by doing one of ... | http://devgan.in/ipc/section/37/ | Section 37 | When an offence is committed by means of sever... |
| 4 | IPC Section 32 » Words referring to acts inclu... | http://devgan.in/ipc/section/32/ | Section 32 | In every part of this Code, except where a con... |


In [61]:
print("creating index...")
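# index the data: each document is an (id, text, tags) tuple,
# and the row position in df is used as the id so hits can be looked up with df.loc
embeddings.index([(uid, str(text), None) for uid, text in enumerate(df['description'].tolist())])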
# df['description'].tolist()

creating index...
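Conceptually, indexing stores one vector per description, and a search ranks those vectors by their similarity to the query vector. The sketch below illustrates that idea with brute-force cosine similarity over hand-written toy vectors. It is not how txtai is implemented (a library like this normally delegates to an optimized vector index), and the names `cosine`, `top_k`, and `toy_index` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, limit=5):
    """Return the `limit` best (uid, score) pairs, highest score first."""
    scored = [(uid, cosine(query_vec, vec)) for uid, vec in indexed]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


# Toy 3-dimensional "embeddings"; real models produce hundreds of dimensions.
toy_index = [
    (0, [1.0, 0.0, 0.0]),
    (1, [0.8, 0.6, 0.0]),
    (2, [0.0, 0.0, 1.0]),
]

results = top_k([1.0, 0.1, 0.0], toy_index, limit=2)
for uid, score in results:
    print(uid, round(score, 3))
```

The result has the same `(id, score)` shape that the search cell in the next section iterates over.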


### Perform Search

In [66]:
# use the search function, which returns (id, score) pairs ranked by similarity
query = "not of sound mind"
print(f"Most similar sections to '{query}' are:\n----------")
# `uid` avoids shadowing the built-in `id`
for uid, score in embeddings.search(query, limit=5):
    print(f"ID: {uid}\nSection: {df.loc[uid, 'section']}\nDescription: {df.loc[uid, 'description']}\n----------\n")

Most similar sections to 'not of sound mind' are:
----------
ID: 497
Section: Section 90
Description: A consent is not such a consent as is intended by any section of this Code, if the consent is given by a person under fear of injury, or under a misconception of fact, and if the person doing the act knows, or has reason to believe, that the consent was given in consequence of such fear or misconception; or 
Consent of insane personif the consent is given by a person who, from unsoundness of mind, or intoxication, is unable to understand the nature and consequence of that to which he gives his consent; or 
Consent of child unless the contrary appears from the context, if the consent is given by a person who is under twelve years of age.
----------

ID: 504
Section: Section 84
Description: Nothing is an offence which is done by a person who, at the time of doing it, by reason of unsoundness of mind, is incapable of knowing the nature of the act, or that he is doing what is either wrong 
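The loop above prints each hit but discards its score. A minimal sketch of an alternative is shown below: it joins the `(id, score)` pairs back to their metadata and drops weak matches. The helper name `hits_to_rows`, the `min_score` threshold, and the scores in the example are all made up for illustration; they are not values produced by this notebook.

```python
def hits_to_rows(hits, records, min_score=0.0):
    """Attach metadata to (uid, score) hits, skipping scores below min_score."""
    rows = []
    for uid, score in hits:
        if score < min_score:
            continue
        row = dict(records[uid])  # copy so the source record is not mutated
        row["id"] = uid
        row["score"] = round(score, 3)
        rows.append(row)
    return rows


# Hypothetical metadata and scores, for illustration only.
records = {
    497: {"section": "Section 90"},
    504: {"section": "Section 84"},
}
hits = [(497, 0.52), (504, 0.48)]

rows = hits_to_rows(hits, records, min_score=0.5)
print(rows)
```

With the notebook's objects, the same helper could be fed `embeddings.search(query, limit=5)` and `df.to_dict("index")`, and the returned rows passed to `pd.DataFrame` for display.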

In [38]:
# inspect a single row by its index label
df.loc[563]

title                 IPC Section 13 » "Queen".
link           http://devgan.in/ipc/section/13/
section                              Section 13
description        [Repealed by the A. O. 1950]
Name: 563, dtype: object
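Rows such as this one carry no searchable meaning, so it may be worth leaving them out of the index. The sketch below builds the `(id, text, tags)` tuples while skipping repealed entries. It assumes every repealed description starts with `[Repealed`, as the Section 13 row does, which has not been checked against the full dataset. `build_documents` is a hypothetical helper name.

```python
def build_documents(descriptions):
    """Build (uid, text, None) tuples, skipping empty and repealed entries.

    The uid is the original position, so it still matches the DataFrame row.
    """
    docs = []
    for uid, text in enumerate(descriptions):
        text = str(text).strip()
        # Assumption: repealed sections look like "[Repealed by ...]".
        if not text or text.lower().startswith("[repealed"):
            continue
        docs.append((uid, text, None))
    return docs


sample = [
    "This Act shall be called the Indian Penal Code...",
    "[Repealed by the A. O. 1950]",
    "Where several persons are engaged or concerned...",
]
docs = build_documents(sample)
print([uid for uid, _, _ in docs])
```

Because the original positions are kept as ids, a call like `embeddings.index(build_documents(df['description'].tolist()))` would leave the `df.loc[uid, ...]` lookups in the search cell unchanged.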

In [68]:
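# list the repealed sections (boolean mask, not .index, selects the matching rows)
# df.loc[df['description'].str.contains("Repealed"), 'description']

# save the index to disk so it does not have to be rebuilt
embeddings.save("/content/ipc_index")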

In [69]:
# reload the saved index, e.g. in a later session
embeddings.load("/content/ipc_index")