import streamlit as st

# Custom CSS for better styling
st.markdown("""
    <style>
        .main-title {
            font-size: 36px;
            color: #4A90E2;
            font-weight: bold;
            text-align: center;
        }
        .sub-title {
            font-size: 24px;
            color: #4A90E2;
            margin-top: 20px;
        }
        .section {
            background-color: #f9f9f9;
            padding: 15px;
            border-radius: 10px;
            margin-top: 20px;
        }
        .section h2 {
            font-size: 22px;
            color: #4A90E2;
        }
        .section p, .section ul {
            color: #666666;
        }
        .link {
            color: #4A90E2;
            text-decoration: none;
        }
        .benchmark-table {
            width: 100%;
            border-collapse: collapse;
            margin-top: 20px;
        }
        .benchmark-table th, .benchmark-table td {
            border: 1px solid #ddd;
            padding: 8px;
            text-align: left;
        }
        .benchmark-table th {
            background-color: #4A90E2;
            color: white;
        }
    </style>
""", unsafe_allow_html=True)

# Main Title
st.markdown('<div class="main-title">Extract Aspects and Entities from Airline Questions (ATIS dataset)</div>', unsafe_allow_html=True)

# Description
st.markdown("""
<div class="section">
    <p><strong>Named Entity Recognition (NER)</strong> is an NLP task that identifies and classifies key entities in text. For airline questions, NER extracts details such as flight numbers, dates, and locations, which can then be used to automate responses and improve user interaction.</p>
    <p>This app extracts entities from questions about airline operations using a model trained on the ATIS (Airline Travel Information Systems) dataset, which contains queries about flight schedules, fares, and other travel-related information.</p>
</div>
""", unsafe_allow_html=True)

# What is NER
st.markdown('<div class="sub-title">What is Named Entity Recognition (NER)?</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <p><strong>Named Entity Recognition (NER)</strong> is a Natural Language Processing (NLP) task that locates named entities in text and classifies them into predefined categories such as person names, organizations, locations, and dates. For instance, in the sentence "Flight DL 108 departs from New York on August 1st", NER identifies 'DL 108' as a flight number, 'New York' as a location, and 'August 1st' as a date.</p>
    <p>NER models learn to use the context around each word to decide whether it belongs to an entity and which category it falls into. Automated systems rely on this to interpret user queries and respond to them efficiently.</p>
</div>
""", unsafe_allow_html=True)

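# Illustration: from per-token tags to entity chunks
st.markdown("""
As a rough illustration of what "locating and classifying" means in practice: taggers of this kind usually assign each token an IOB tag (`B-` begins an entity, `I-` continues it, `O` is outside any entity), and a converter step then merges tagged tokens into chunks. The sketch below is plain Python, not Spark NLP code. The helper name `iob_to_chunks`, the label names, and the tags assigned to the example sentence are hand-written assumptions for illustration, not model output.

```python
def iob_to_chunks(tokens, tags):
    # Merge IOB-tagged tokens into (chunk_text, label) pairs.
    chunks = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_label
        )
        if starts_new:
            # A new entity starts here; flush the one in progress, if any.
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-"):
            # Same label as the open entity: extend it.
            current_tokens.append(token)
        else:
            # "O" tag: close any open entity.
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_label))
    return chunks


# Hand-written tags for the example sentence (illustrative only).
tokens = ["Flight", "DL", "108", "departs", "from",
          "New", "York", "on", "August", "1st"]
tags = ["O", "B-flight_number", "I-flight_number", "O", "O",
        "B-location", "I-location", "O", "B-date", "I-date"]

print(iob_to_chunks(tokens, tags))
```

In the Spark NLP pipeline shown later on this page, the `NerConverter` stage plays the role of this merging step.
""")
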
# Why We Use NER
st.markdown('<div class="sub-title">Why Use NER for Airline Data?</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <p>In the airline industry, answering customer queries often requires extracting specific information from unstructured text. NER helps with:</p>
    <ul>
        <li><strong>Automating Responses:</strong> Identifying key entities such as flight numbers, dates, and locations makes it possible to generate accurate responses to customer inquiries automatically.</li>
        <li><strong>Improving Customer Service:</strong> Faster and more accurate information retrieval leads to improved customer satisfaction.</li>
        <li><strong>Data Analysis:</strong> Extracted entities can be used to analyze trends, patterns, and anomalies in customer queries.</li>
    </ul>
</div>
""", unsafe_allow_html=True)

# Model Details
st.markdown('<div class="sub-title">About the Model</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <p>The <strong>nerdl_atis_840b_300d</strong> model used in this app is a pre-trained model for recognizing airline-related entities. It is part of the Spark NLP library and was trained on the ATIS dataset to identify and classify entities relevant to airline operations.</p>
    <p>The model recognizes entities such as flight numbers, airport codes, and dates, among many others, which makes it suitable for processing a wide range of airline-related queries.</p>
</div>
""", unsafe_allow_html=True)
st.write("")

# Predicted Entities
with st.expander("Predicted Entities (79 listed)"):
    st.markdown("""
        <ul>
            <li><strong>aircraft_code:</strong> Code for the aircraft.</li>
            <li><strong>airline_code:</strong> Code for the airline.</li>
            <li><strong>airline_name:</strong> Name of the airline.</li>
            <li><strong>airport_code:</strong> Code for the airport.</li>
            <li><strong>airport_name:</strong> Name of the airport.</li>
            <li><strong>arrive_date.date_relative:</strong> Relative date of arrival.</li>
            <li><strong>arrive_date.day_name:</strong> Name of the arrival day.</li>
            <li><strong>arrive_date.day_number:</strong> Day number of the arrival date.</li>
            <li><strong>arrive_date.month_name:</strong> Name of the arrival month.</li>
            <li><strong>arrive_date.today_relative:</strong> Arrival date relative to today.</li>
            <li><strong>arrive_time.end_time:</strong> End time of arrival.</li>
            <li><strong>arrive_time.period_mod:</strong> Modifier for arrival period.</li>
            <li><strong>arrive_time.period_of_day:</strong> Period of the day for arrival.</li>
            <li><strong>arrive_time.start_time:</strong> Start time of arrival.</li>
            <li><strong>arrive_time.time:</strong> Arrival time.</li>
            <li><strong>arrive_time.time_relative:</strong> Arrival time relative to another time.</li>
            <li><strong>city_name:</strong> Name of the city.</li>
            <li><strong>class_type:</strong> Type of class.</li>
            <li><strong>connect:</strong> Connection information.</li>
            <li><strong>cost_relative:</strong> Cost relative to something else.</li>
            <li><strong>day_name:</strong> Name of the day.</li>
            <li><strong>day_number:</strong> Number of the day.</li>
            <li><strong>days_code:</strong> Code for the days.</li>
            <li><strong>depart_date.date_relative:</strong> Departure date relative to another date.</li>
            <li><strong>depart_date.day_name:</strong> Name of the departure day.</li>
            <li><strong>depart_date.day_number:</strong> Number of the departure day.</li>
            <li><strong>depart_date.month_name:</strong> Name of the departure month.</li>
            <li><strong>depart_date.today_relative:</strong> Departure date relative to today.</li>
            <li><strong>depart_date.year:</strong> Year of departure.</li>
            <li><strong>depart_time.end_time:</strong> End time of departure.</li>
            <li><strong>depart_time.period_mod:</strong> Modifier for departure period.</li>
            <li><strong>depart_time.period_of_day:</strong> Period of the day for departure.</li>
            <li><strong>depart_time.start_time:</strong> Start time of departure.</li>
            <li><strong>depart_time.time:</strong> Departure time.</li>
            <li><strong>depart_time.time_relative:</strong> Departure time relative to another time.</li>
            <li><strong>economy:</strong> Economy class.</li>
            <li><strong>fare_amount:</strong> Amount of the fare.</li>
            <li><strong>fare_basis_code:</strong> Fare basis code.</li>
            <li><strong>flight_days:</strong> Days of the flight.</li>
            <li><strong>flight_mod:</strong> Modifier for the flight.</li>
            <li><strong>flight_number:</strong> Flight number.</li>
            <li><strong>flight_stop:</strong> Flight stop information.</li>
            <li><strong>flight_time:</strong> Flight time.</li>
            <li><strong>fromloc.airport_code:</strong> Airport code of the departure location.</li>
            <li><strong>fromloc.airport_name:</strong> Airport name of the departure location.</li>
            <li><strong>fromloc.city_name:</strong> City name of the departure location.</li>
            <li><strong>fromloc.state_code:</strong> State code of the departure location.</li>
            <li><strong>fromloc.state_name:</strong> State name of the departure location.</li>
            <li><strong>meal:</strong> Meal information.</li>
            <li><strong>meal_code:</strong> Meal code.</li>
            <li><strong>meal_description:</strong> Description of the meal.</li>
            <li><strong>mod:</strong> Modifier for any entity.</li>
            <li><strong>month_name:</strong> Name of the month.</li>
            <li><strong>or:</strong> OR condition.</li>
            <li><strong>period_of_day:</strong> Period of the day.</li>
            <li><strong>restriction_code:</strong> Restriction code.</li>
            <li><strong>return_date.date_relative:</strong> Return date relative to another date.</li>
            <li><strong>return_date.day_name:</strong> Name of the return day.</li>
            <li><strong>return_date.day_number:</strong> Number of the return day.</li>
            <li><strong>return_date.month_name:</strong> Name of the return month.</li>
            <li><strong>return_date.today_relative:</strong> Return date relative to today.</li>
            <li><strong>return_time.period_mod:</strong> Modifier for return time period.</li>
            <li><strong>return_time.period_of_day:</strong> Period of the day for return.</li>
            <li><strong>round_trip:</strong> Round trip information.</li>
            <li><strong>state_code:</strong> State code.</li>
            <li><strong>state_name:</strong> State name.</li>
            <li><strong>stoploc.airport_name:</strong> Airport name of the stop location.</li>
            <li><strong>stoploc.city_name:</strong> City name of the stop location.</li>
            <li><strong>stoploc.state_code:</strong> State code of the stop location.</li>
            <li><strong>time:</strong> Time information.</li>
            <li><strong>time_relative:</strong> Time relative to another time.</li>
            <li><strong>today_relative:</strong> Relative to today.</li>
            <li><strong>toloc.airport_code:</strong> Airport code of the destination location.</li>
            <li><strong>toloc.airport_name:</strong> Airport name of the destination location.</li>
            <li><strong>toloc.city_name:</strong> City name of the destination location.</li>
            <li><strong>toloc.country_name:</strong> Country name of the destination location.</li>
            <li><strong>toloc.state_code:</strong> State code of the destination location.</li>
            <li><strong>toloc.state_name:</strong> State name of the destination location.</li>
            <li><strong>transport_type:</strong> Type of transport.</li>
        </ul>
    """, unsafe_allow_html=True)

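# Illustration: reading the dotted label names
st.markdown("""
Many of the labels in the list above follow a `prefix.attribute` pattern (for example `fromloc.city_name`), while others such as `city_name` have no prefix. A minimal sketch of splitting a label into those two parts is shown below. The helper name `split_label` and the term "role" for the prefix are assumptions made for illustration; they are not part of Spark NLP.

```python
def split_label(label):
    # Split "fromloc.city_name" into ("fromloc", "city_name").
    # Labels without a dot, such as "city_name", get a role of None.
    role, sep, attribute = label.partition(".")
    if not sep:
        return None, label
    return role, attribute


for label in ["fromloc.city_name", "depart_date.day_name", "city_name"]:
    print(label, "->", split_label(label))
```

Splitting labels this way can make it easier to tell, for instance, a departure city from a destination city when both appear in one question.
""")
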
# How to use
st.markdown('<div class="sub-title">How to Use the Model</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <p>To use this model, follow these steps in Python:</p>
</div>
""", unsafe_allow_html=True)
st.code('''
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, WordEmbeddingsModel, NerDLModel, NerConverter
from pyspark.ml import Pipeline
from pyspark.sql.functions import col, expr

# Start a Spark session with Spark NLP (provides the `spark` object used below)
spark = sparknlp.start()

# Define the components of the pipeline
document_assembler = DocumentAssembler() \\
    .setInputCol("text") \\
    .setOutputCol("document")

tokenizer = Tokenizer() \\
    .setInputCols(["document"]) \\
    .setOutputCol("token")

# GloVe embeddings that the NER model takes as input
embeddings = WordEmbeddingsModel.pretrained("glove_840B_300", "xx") \\
    .setInputCols(["document", "token"]) \\
    .setOutputCol("embeddings")

ner_model = NerDLModel.pretrained("nerdl_atis_840b_300d", "en") \\
    .setInputCols(["document", "token", "embeddings"]) \\
    .setOutputCol("ner")

# Merge token-level tags into entity chunks
ner_converter = NerConverter() \\
    .setInputCols(["document", "token", "ner"]) \\
    .setOutputCol("ner_chunk")

# Create the pipeline
pipeline = Pipeline(stages=[
    document_assembler,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter
])

# Create some example data
sample_text = """
On August 20, 2024, Delta Airlines flight DL 456, operated with a B737 aircraft, will depart from Hartsfield-Jackson Atlanta International Airport (ATL) located in Atlanta, Georgia, United States. The flight will begin at 10:00 AM local time and is scheduled to conclude its departure process by 10:30 AM, reflecting the morning period of the day. This non-stop flight, classified under business class and costing $850, is set for travel on Monday, Wednesday, and Friday, with the fare basis code J. The flight is categorized as a direct route without any stops, and passengers will enjoy a vegetarian meal (meal code VGML) on board.
Upon arrival, the flight will land at Los Angeles International Airport (LAX) in Los Angeles, California, United States at 02:00 PM local time. The arrival process is expected to end by 02:30 PM, placing this in the afternoon period of the day. The flight, part of a round-trip itinerary, will return on August 27, 2024. The return flight will depart from LAX at 03:00 PM and is scheduled to conclude by 03:30 PM, also reflecting the afternoon period. Both departure and arrival times are stated in the local time zones of their respective locations.
The round-trip journey involves non-refundable tickets and features a direct flight with no connecting flights. The entire travel itinerary spans a total of 7 days from departure to return, with all dates relative to today’s date. Passengers should be aware of the restriction code attached to the fare, indicating its non-refundable nature. Additionally, the flight details include flight number, aircraft code, airline code, airline name, class type, fare amount, fare basis code, flight days, and flight stop information.
"""
data = spark.createDataFrame([[sample_text]]).toDF("text")

# Apply the pipeline to the data
model = pipeline.fit(data)
result = model.transform(data)

# Show each extracted chunk with its entity label
# (show() prints only the first 20 rows by default)
result.select(
    expr("explode(ner_chunk) as ner_chunk")
).select(
    col("ner_chunk.result").alias("chunk"),
    col("ner_chunk.metadata").getItem("entity").alias("ner_label")
).show(truncate=False)
''')

# Results

st.text("""
+--------------+-------------------------+
|chunk         |ner_label                |
+--------------+-------------------------+
|20            |depart_time.time         |
|2024          |flight_number            |
|Delta Airlines|airline_name             |
|456           |flight_number            |
|B737          |aircraft_code            |
|Georgia       |airline_name             |
|10:00 AM      |depart_time.time         |
|by            |depart_time.time_relative|
|10:30 AM      |depart_time.time         |
|morning       |depart_time.period_of_day|
|non-stop      |flight_stop              |
|under         |cost_relative            |
|business class|class_type               |
|Monday        |arrive_date.day_name     |
|Wednesday     |arrive_date.day_name     |
|Friday        |arrive_date.day_name     |
|vegetarian    |meal                     |
|meal          |meal                     |
|meal          |meal                     |
|Los Angeles   |toloc.city_name          |
+--------------+-------------------------+
""")

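# Illustration: grouping extracted chunks by label
st.markdown("""
One way to work with output like the table above is to group the chunks by label before using them downstream. The sketch below does this in plain Python on a few rows copied from that table. `group_by_label` is a hypothetical helper, and in a real application the pairs would presumably come from collecting the `ner_chunk` column rather than from a hard-coded list.

```python
from collections import defaultdict


def group_by_label(pairs):
    # Map each NER label to the list of chunks tagged with it, in order.
    grouped = defaultdict(list)
    for chunk, label in pairs:
        grouped[label].append(chunk)
    return dict(grouped)


# A few (chunk, ner_label) rows copied from the output table above.
rows = [
    ("Delta Airlines", "airline_name"),
    ("456", "flight_number"),
    ("10:00 AM", "depart_time.time"),
    ("10:30 AM", "depart_time.time"),
    ("Los Angeles", "toloc.city_name"),
]

slots = group_by_label(rows)
print(slots["depart_time.time"])
```

Keeping a list per label, rather than a single value, preserves cases where the same label is predicted more than once in one text.
""")
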
# Benchmarks
st.markdown('<div class="sub-title">Model Benchmarks</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <p>The following table shows the performance benchmarks of the nerdl_atis_840b_300d model on the ATIS dataset:</p>
    <table class="benchmark-table">
        <tr>
            <th>Metric</th>
            <th>Score</th>
        </tr>
        <tr>
            <td>Precision</td>
            <td>93.5%</td>
        </tr>
        <tr>
            <td>Recall</td>
            <td>92.7%</td>
        </tr>
        <tr>
            <td>F1 Score</td>
            <td>93.1%</td>
        </tr>
    </table>
    <p>These metrics indicate the model's high accuracy in identifying and classifying airline-related entities, making it a robust tool for processing travel-related queries.</p>
</div>
""", unsafe_allow_html=True)

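# Illustration: how the F1 score relates to precision and recall
st.markdown("""
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the third row of the table can be recomputed from the first two. The function name `f1_score` below is chosen for illustration only.

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table, in percent.
print(round(f1_score(93.5, 92.7), 1))
```

Rounded to one decimal place, the result agrees with the 93.1% reported in the table.
""")
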
# Conclusion
st.markdown('<div class="sub-title">Conclusion</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <p>Named Entity Recognition is a powerful tool for extracting structured information from unstructured text. With the nerdl_atis_840b_300d NerDLModel, you can process airline-related queries efficiently and automate responses to them.</p>
    <p>Given its benchmark scores and the wide range of entities it identifies, this model is well suited to customer service, data analysis, and other applications in the travel and airline industry.</p>
    <p>For further exploration, consider integrating the model into your own systems and using the extracted information to improve user experience and operational efficiency.</p>
</div>
""", unsafe_allow_html=True)

# References
st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <ul>
        <li><a class="link" href="https://sparknlp.org/api/python/reference/autosummary/sparknlp/annotator/ner/ner_dl/index.html" target="_blank" rel="noopener">NerDLModel</a> annotator documentation</li>
        <li>Model Used: <a class="link" href="https://sparknlp.org/2021/01/25/nerdl_atis_840b_300d_en.html" rel="noopener">nerdl_atis_840b_300d</a></li>
        <li><a class="link" href="https://nlp.johnsnowlabs.com/recognize_entitie" target="_blank" rel="noopener">Visualization demos for NER in Spark NLP</a></li>
        <li><a class="link" href="https://www.johnsnowlabs.com/named-entity-recognition-ner-with-bert-in-spark-nlp/">Named Entity Recognition (NER) with BERT in Spark NLP</a></li>
    </ul>
</div>
""", unsafe_allow_html=True)

# Community & Support
st.markdown('<div class="sub-title">Community & Support</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <ul>
        <li><a class="link" href="https://sparknlp.org/" target="_blank">Official Website</a>: Documentation and examples</li>
        <li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp" target="_blank">GitHub Repository</a>: Report issues or contribute</li>
        <li><a class="link" href="https://forum.johnsnowlabs.com/" target="_blank">Community Forum</a>: Ask questions, share ideas, and get support</li>
    </ul>
</div>
""", unsafe_allow_html=True)