{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "- Invent a (new) NLP task! \n", "\t- It should not be among [these](https://nlpprogress.com/) tasks (under English) but you can use them for inspiration. \n", "\t- It does not need to be useful or even make perfect sense but it should be learnable for a human; i.e. you should be able to explain it to a data annotator. \n", "\t- Pick one of these categories: \n", "\t\t- text classification\n", "\t\t- token classification\n", "\t\t- sequence-to-sequence generation\n", "- Generate a toy dataset for your task (~ 20 samples). You can use ChatGPT for help, if you want!\n", "- Make your dataset readable for the `load_dataset` module, so that you can load your data just by using `load_dataset(, probably_some_arguments)`. Use [these](https://huggingface.co/docs/datasets/loading) guidelines.\n", "- Use the `.map()` method to perform a simple processing step on your dataset. Ideally this should be relevant to your task, but if you can't think of such a step, just do something simple like lower-casing or lemmatizing. Use [these](https://huggingface.co/docs/datasets/process) guidelines.\n", "- Upload your dataset to the Hugging Face Hub, using [these](https://huggingface.co/docs/datasets/upload_dataset) guidelines. Now you (or anyone else, if it is public) can access and load your data easily, just by `load_dataset()` 😊 " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- The idea here is to create a gender classification task that focuses on texts written by teenagers. Teenagers' vocabulary differs greatly from that of other generations, and the difference between boys and girls at this age also appears to be significant. The purpose is to create a dataset of typical texts for both groups, inspired by text messages and social media posts, and to attempt to distinguish them using (stereo)typical words for each group." 
] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [], "source": [ "boy_words = ['dude', 'bro', 'chill', 'gaming', 'sports', 'cars', 'video games', 'cool', 'calm', 'football', 'basketball', 'vibe', 'skate', 'skating', 'man', 'fine', 'yo']" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "girl_words = ['omg', 'bff', 'shopping', 'makeup', 'social media', 'instagram', 'tiktok', 'boyfriend', 'romantic', 'cute', 'fashion', 'dance', 'dress', 'skirt', 'prom', 'coffee bar', 'girl', 'hey', 'period']" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "boy_dataset = [\"Hey man, what's up? It feels like forever since we last hung out. I'm stoked for the concert next week, it's gonna be epic. I've been practicing the guitar like crazy so I can shred some sick solos during the show. Did you hear about that new video game that just came out? It's supposed to be insane. I think we should get it and have a gaming marathon this weekend. We could stock up on snacks and soda and just play all day. By the way, have you talked to Sarah recently? I saw her at the gym the other day and she was looking fine as hell. I'm thinking of asking her out on a date, but I don't want to mess things up with her. You know how it is, man. Anyway, let me know what you think about the game idea. I can't wait to hang out with you again and catch up on everything.\", \"Sup bro, you good? I just got back from the skate park, and I landed my first kickflip! It was sick. I'm starting to get the hang of this skating thing, and I'm hoping to land an ollie next. I asked my mom if we could get together friday night and play Fortnite and maybe watch some Marvel movies too. She said that's fine as long as your mom is fine with it too. Lmk man!\", \"Yo bro, what's up? I just got done practicing with the band, and we're getting pretty good. 
We're playing at the school talent show next month, and I'm stoked. I'm gonna shred some sick guitar riffs and blow everyone away. Have you heard the new album from that indie band we like? It's fire. We should download it and listen to it together next time we hang out. Speaking of which, are you free this weekend? We could go hit up the mall and maybe grab a bite. Oh, and did you see the game last night? Our team got smoked by our rivals, and it was brutal. But we're gonna get 'em next time. I'm gonna hit the gym and practice my basketball skills all week so we can come back stronger. Anyway, let me know if you're down for some mall action this weekend. And if you've got any new bands you're into, send 'em my way.\", \"Hey man, what's up? I'm so glad the weekend is finally here. School has been crazy this week. I had a big history test on Tuesday that I was freaking out about, but I think I did pretty well. I also had to turn in a big English paper yesterday that I was stressing over for weeks. Have you started working on that group project for science class yet? We should get together and work on it this weekend. And speaking of science, did you hear that our school is gonna get a new chemistry lab next year? That's gonna be sick. I'm also trying to decide what classes to take next year. I definitely want to take art, but I'm not sure if I should take Spanish or French for my language requirement. What do you think?\", \"Hey man, sorry I haven't been in touch lately, things at home have been pretty rough. My parents have been fighting a lot lately, and it's really been stressing me out. I hate it when they fight, and I feel like it's all my fault sometimes. I've been trying to stay out of the house as much as possible to avoid the drama. I've been hanging out at the skate park a lot and trying to focus on my tricks. It's been helping me clear my head a bit. But school has been tough too. 
It's hard to concentrate on my studies when I have all this other stuff going on. I'm worried my grades are gonna suffer. And to make matters worse, my little brother has been acting out lately because of all the tension at home. It's been hard to deal with. Anyway, I just wanted to vent a bit. I know you're always a good listener. Let's try to hang out soon. That always helps me forget my problems for a little while.\", \"Yo dude! So, I met this girl at my job the other day, and we hit it off. Her name is Emily, and she's super cool. We've been texting back and forth nonstop ever since we met. I think I might be starting to like her, you know? I'm planning on going to the park together next weekend since the sun should be out. I'm trying to figure out if I should make a move and try to hold her hand or something tho. Do you think that would be too forward? I'm also trying to come up with a good gift to give her. I was thinking maybe I could get her a necklace or something, but I don't want to come on too strong. What do you think?\", \"Hey man, I need your help with something. I'm trying to build a gaming PC and I have no idea where to start. I know you're into that stuff, do you think you could give me some advice on what parts to buy?\", \"Bro, have you listened to the new Tyler, The Creator album yet? It's so fire. I love his style, he's always pushing boundaries and experimenting with new sounds. We should bump it in the car later and vibe out.\", \"Dude, I think I have a crush on Emma from math class. She's so cute and funny, and I love talking to her. I don't know if she feels the same way though. Should I try to ask her out, or just keep things friendly for now?\"]" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [], "source": [ "girl_dataset = [\"Hey girl, what's up? I'm feeling so stressed out right now. My parents are pressuring me to apply to all these colleges that I don't even want to go to. 
I know they just want what's best for me, but I feel like they're not really listening to what I want. I just want to go to a school where I can study art and design, but they keep pushing me towards more traditional majors. Ugh, it's so frustrating.\", \"Hey, did you see the latest episode of Riverdale? I can't believe what happened to Jughead! I was literally on the edge of my seat the whole time. I can't wait to see what happens next week. Are you caught up yet?\", \"So, I'm thinking about starting a blog. I love writing and I think it would be a great way to express myself and connect with other people who have similar interests. I'm just not sure what to write about yet. Maybe fashion or beauty, or maybe even something more personal like mental health. What do you think?\", \"Ugh, I just got my period and I feel like crap. I have cramps and I'm so bloated. I just want to curl up in bed and watch Netflix all day. Do you have any tips for feeling better?\", \"Hey girl, I need your advice. There's this guy in my math class who keeps flirting with me, but I'm not sure if I'm interested or not. He's cute and all, but I don't know if we have anything in common. Plus, I'm not really looking for a relationship right now. What should I do?\", \"So, my parents are making me go to my cousin's wedding this weekend, and I'm dreading it. I hate wearing dresses and I don't know anyone there. Plus, I have a huge test on Monday that I need to study for. I just want to stay home and watch Netflix\", \"I'm feeling really anxious lately. I don't know why, but I just can't seem to calm down. I keep worrying about everything, even things that don't matter. Do you ever feel like that? How do you cope?\", \"I just got back from the mall, and I went a little overboard with my shopping. I got some new clothes and makeup that I probably didn't need, but they were so cute. Now I'm broke though, lol. I need to start saving up for college.\", \"I'm so excited for summer break! 
I can't wait to hang out at the beach and go to concerts with my friends. We're planning a road trip to LA, and I'm hoping to meet some cute boys along the way. It's gonna be epic!\", \"Hey girl, I need your help. I'm having a crisis over what to wear to the school dance. I want to look cute, but I don't want to overdo it. Should I wear a dress or a skirt? Heels or flats? I'm so indecisive!\", \"OMG, did you hear what happened at the party last weekend? I can't believe Sarah hooked up with Jake! I thought she was into Mike. And then there was that drama with Emma and Abby. High school is so dramatic, lol.\", \"Hey, have you tried the new coffee shop that just opened up downtown? I went there with my mom last weekend and it was so cute. They had the best lattes and the vibe was so cozy. We should go there together sometime.\", \"Hey girl, I'm feeling so frustrated with my parents right now. They keep lecturing me about my grades, but I'm doing the best I can. I have so much on my plate with school and extracurriculars, and I feel like they don't understand how much pressure I'm under. Do you ever feel like your parents just don't get it?\", \"Hey girl, I need your opinion on something. I'm thinking about cutting my hair really short, like a pixie cut. Do you think it would look good on me? I'm so nervous to make such a big change to my appearance, but I feel like I need to mix things up.\", \"I'm so excited for prom next month! I already found my dress and I'm getting my hair and makeup done professionally. I hope my crush asks me to be his date, but even if he doesn't, I'm still going to have a blast with my friends. Do you have any ideas for after-party activities?\"]" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0\n", "0 Hey man, what's up? It feels like forever sinc...\n", "1 Sup bro, you good? I just got back from the sk...\n", "2 Yo bro, what's up? I just got done practicing ...\n", "3 Hey man, what's up? I'm so glad the weekend is...\n", "4 Hey man, sorry I haven't been in touch lately,..." ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_boys = pd.DataFrame(boy_dataset)\n", "df_boys.head()" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0\n", "0 Hey girl, what's up? I'm feeling so stressed o...\n", "1 Hey, did you see the latest episode of Riverda...\n", "2 So, I'm thinking about starting a blog. I love...\n", "3 Ugh, I just got my period and I feel like crap...\n", "4 Hey girl, I need your advice. There's this guy..." ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_girls = pd.DataFrame(girl_dataset)\n", "df_girls.head()" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [], "source": [ "import random\n", "\n", "combined = boy_dataset + girl_dataset\n", "random.shuffle(combined)" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " train\n", "17 Yo bro, what's up? I just got done practicing ...\n", "9 Hey man, sorry I haven't been in touch lately,...\n", "3 I'm so excited for prom next month! I already ...\n", "22 Hey girl, I need your help. I'm having a crisi...\n", "5 Hey girl, I'm feeling so frustrated with my pa..." ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame({'train': combined}, index=range(len(combined)))\n", "df.sample(5)" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [], "source": [ "df.to_csv('teenager_data.csv', index=False)" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloading and preparing dataset csv/default to C:/Users/mauqu/.cache/huggingface/datasets/csv/default-bda64afcfa1ea642/0.0.0/6b34fb8fcf56f7c8ba51dc895bfa2bfbe43546f190a60fcf74bb5e8afdcc2317...\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "230d953b89bc4188855685f92779d4f8", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Downloading data files: 0%| | 0/1 [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "" ], "text/plain": [ " text_data\n", "0 0: ugh, i just got my period and i feel like c...\n", "1 1: omg, did you hear what happened at the part...\n", "2 2: hey man, what's up? it feels like forever s...\n", "3 3: hey girl, what's up? i'm feeling so stresse...\n", "4 4: hey man, sorry i haven't been in touch late..." ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "check = pd.read_csv('teenager_dataset_final.csv')\n", "check.head()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 2 }