Spaces: datenwerkzeuge/CDL-Webscraping-Workshop-2025

bsenst committed on Dec 26, 2024

Commit d874f49

1 Parent(s): 79b5694

complete structure clean up

Files changed (15)

src/03_low_code/app_market_scraping.qmd +12 -0
src/03_low_code/catalogue.qmd +11 -0
src/03_low_code/video_transcripts.qmd +11 -0
src/03_low_code/video_transcripts/_0b1fd4bd-7f49-4655-bb32-462a52df7eba.jpeg +0 -0
src/03_low_code/video_transcripts/_1001328a-6814-4c32-9ce2-782aeef96791.jpeg +0 -0
src/03_low_code/video_transcripts/get_videos_for_youtube_channels.ipynb +0 -0
src/03_low_code/video_transcripts/youtube-transcript-extraction.ipynb +14 -0
src/04_use_case/forum/buergergeld_forum.ipynb +0 -0
src/04_use_case/jobs/Jobboerse_API.ipynb +0 -0
src/04_use_case/jobs/_f6a36d83-c0f2-4029-a621-0ccfc358b18a.jpeg +0 -0
src/04_use_case/laws/Gesetze_im_Internet_Aktualitätendienst.ipynb +14 -0
src/04_use_case/laws/_d38cd4e9-1da8-4d7e-9bae-f01370cd2049.jpeg +0 -0
src/_quarto.yml +64 -66
src/low_code.qmd +7 -15
src/use_case.qmd +22 -15

src/03_low_code/app_market_scraping.qmd ADDED

	@@ -0,0 +1,12 @@

+---
+title: "Analyzing the App Marketplace"
+description: "Retrieve and evaluate information on numerous apps."
+listing:
+  - id: app_market_scraping
+    contents: "app_market_scraping"
+    type: grid
+---
+::: {#app_market_scraping}
+:::

src/03_low_code/catalogue.qmd ADDED

	@@ -0,0 +1,11 @@

+---
+title: "Capturing Catalogues"
+description: "Extract targeted information from data structures."
+listing:
+  - id: catalogue
+    contents: "catalogue"
+    type: grid
+---
+::: {#catalogue}
+:::

src/03_low_code/video_transcripts.qmd ADDED

	@@ -0,0 +1,11 @@

+---
+title: "Video Transcripts"
+description: "Obtain and process transcripts of long video content."
+listing:
+  - id: video_transcripts
+    contents: "video_transcripts"
+    type: grid
+---
+::: {#video_transcripts}
+:::

src/03_low_code/video_transcripts/_0b1fd4bd-7f49-4655-bb32-462a52df7eba.jpeg ADDED

src/03_low_code/video_transcripts/_1001328a-6814-4c32-9ce2-782aeef96791.jpeg ADDED

src/03_low_code/video_transcripts/get_videos_for_youtube_channels.ipynb CHANGED

The diff for this file is too large to render.

src/03_low_code/video_transcripts/youtube-transcript-extraction.ipynb CHANGED

@@ -1,5 +1,19 @@
 {
  "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "title: \"Retrieving Videos for YouTube Channels\"\n",
+    "description: \"A tool for searching and listing the videos of a YouTube channel by channel name, including video details and direct links.\"\n",
+    "author: \"Benjamin\"\n",
+    "date: \"2024-12-16\"\n",
+    "date-modified: \"2024-12-16\"\n",
+    "image: _0b1fd4bd-7f49-4655-bb32-462a52df7eba.jpeg\n",
+    "---"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 1,

src/04_use_case/forum/buergergeld_forum.ipynb CHANGED

The diff for this file is too large to render.

src/04_use_case/jobs/Jobboerse_API.ipynb CHANGED

The diff for this file is too large to render.

src/04_use_case/jobs/_f6a36d83-c0f2-4029-a621-0ccfc358b18a.jpeg ADDED

src/04_use_case/laws/Gesetze_im_Internet_Aktualitätendienst.ipynb CHANGED

@@ -1,5 +1,19 @@
 {
   "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "---\n",
+        "title: \"RSS Feed Analysis: Gesetze im Internet Aktualitätendienst\"\n",
+        "description: \"A tool for extracting and analyzing the RSS feeds of the Gesetze im Internet Aktualitätendienst, including processing and visualizing the data.\"\n",
+        "author: \"Benjamin\"\n",
+        "date: \"2024-12-16\"\n",
+        "date-modified: \"2024-12-16\"\n",
+        "image: _d38cd4e9-1da8-4d7e-9bae-f01370cd2049.jpeg\n",
+        "---"
+      ]
+    },
     {
       "cell_type": "code",
       "execution_count": 36,

src/04_use_case/laws/_d38cd4e9-1da8-4d7e-9bae-f01370cd2049.jpeg ADDED

src/_quarto.yml CHANGED

@@ -21,86 +21,84 @@ website:
     - title: "Start"
       contents:
         - href: index.qmd
-          text: "Willkommen"
-        - href: 01_setup/agenda.qmd
-          text: "Agenda 📅"
-        - section: "Vorbereitung"
-          href: 01_setup/vorbereitung.qmd
           contents:
-          - href: 01_setup/erforderlich/google-konto.qmd
-            text: "Google Konto erstellen"
-          - href: 01_setup/erforderlich/colab.qmd
-            text: "Colab nutzen"
-          - href: 01_setup/erforderlich/huggingface.qmd
-            text: "Huggingface Ressourcen"
-        - section: "Optional"
-          href: 01_setup/vorbereitung.html#optional
-          contents:
-          - href: 01_setup/optional/colab-github.qmd
-            text: "Colab nach GitHub speichern"
-          - href: 01_setup/optional/quarto-lokal.qmd
-            text: "Quarto lokal"
     - title: "No Code"
       contents:
         - href: basics.qmd
-          text: "No Code Übersicht"
-        - section: "PDF"
-          href: 02_basics/pdf.qmd
-          contents:
-          - href: 02_basics/pdf/pdf-link-extractor.qmd
-            text: "PDF Link Extractor"
-          - href: 02_basics/pdf/pdf-grouping.qmd
-            text: "PDF Grouping"
-        - section: "App Marketplace"
-          href: 02_basics/app_market.qmd
           contents:
-          - href: 02_basics/app_market/google-play-search.qmd
-            text: "Google Play Search"
-        - section: "Webspider"
-          href: 02_basics/webspider.qmd
-          contents:
-          - href: 02_basics/webspider/website-url-extractor.qmd
-            text: "URL Extractor"
-          - href: 02_basics/webspider/webspider.qmd
-            text: "Webspider"
     - title: "Low Code"
       contents:
-        - href: low_code.qmd
-          text: "Low Code Übersicht"
-        - section: "Katalog"
-          contents:
-            - href: 03_low_code/catalogue/bookstoscrape.qmd
-              text: "Bücherliste scrapen"
-            - href: 03_low_code/catalogue/quotes_scraper.ipynb
-              text: "Zitate scrapen"
-        - section: "App Markt"
           contents:
-            - href: 03_low_code/app_market_scraping/app_market_scraping.qmd
-              text: "App Markt scrapen"
-        - section: "Video Transkripte"
-          contents:
-            - href: 03_low_code/video_transcripts/social-media.qmd
-              text: "Hinweise Scraping Social Media"
-            - href: 03_low_code/video_transcripts/get_videos_for_youtube_channels.ipynb
-              text: "YouTube Channel Videos"
-            - href: 03_low_code/video_transcripts/youtube-transcript-extraction.ipynb
-              text: "YouTube Video Transcripts"
     - title: "Use-Case"
       contents:
-        - href: use_case.qmd
-          text: "Anwendungsfall Übersicht"
-        - section: "Gesetze"
-          contents:
           - href: 04_use_case/laws/Gesetze_im_Internet_Aktualitätendienst.ipynb
             text: "Aktualitätendienst Gesetze"
-        - section: "Jobs"
-          contents:
           - href: 04_use_case/jobs/Jobboerse_API.ipynb
             text: "Jobbörse"
-        - section: "Forum"
-          contents:
-            - href: 04_use_case/forum/buergergeld_forum.ipynb
-              text: "Buergergeld Forum"
     - title: "Blog"
       contents:
         - href: blog.qmd

     - title: "Start"
       contents:
         - href: index.qmd
+          section: "Willkommen"
           contents:
+          - href: 01_setup/agenda.qmd
+            text: "Agenda 📅"
+          - section: "Vorbereitung"
+            href: 01_setup/vorbereitung.qmd
+            contents:
+            - href: 01_setup/erforderlich/google-konto.qmd
+              text: "Google Konto erstellen"
+            - href: 01_setup/erforderlich/colab.qmd
+              text: "Colab nutzen"
+            - href: 01_setup/erforderlich/huggingface.qmd
+              text: "Huggingface Ressourcen"
+          - section: "Optional"
+            href: 01_setup/vorbereitung.html#optional
+            contents:
+            - href: 01_setup/optional/colab-github.qmd
+              text: "Colab nach GitHub speichern"
+            - href: 01_setup/optional/quarto-lokal.qmd
+              text: "Quarto lokal"
     - title: "No Code"
       contents:
         - href: basics.qmd
+          section: "No Code Übersicht"
           contents:
+          - section: "PDF"
+            href: 02_basics/pdf.qmd
+            contents:
+            - href: 02_basics/pdf/pdf-link-extractor.qmd
+              text: "PDF Link Extractor"
+            - href: 02_basics/pdf/pdf-grouping.qmd
+              text: "PDF Grouping"
+          - section: "App Marketplace"
+            href: 02_basics/app_market.qmd
+            contents:
+            - href: 02_basics/app_market/google-play-search.qmd
+              text: "Google Play Search"
+          - section: "Webspider"
+            href: 02_basics/webspider.qmd
+            contents:
+            - href: 02_basics/webspider/website-url-extractor.qmd
+              text: "URL Extractor"
+            - href: 02_basics/webspider/webspider.qmd
+              text: "Webspider"
     - title: "Low Code"
       contents:
+        - section: "Low Code Übersicht"
+          href: low_code.qmd
           contents:
+          - section: "Kataloge erfassen"
+            href: 03_low_code/catalogue.qmd
+            contents:
+              - href: 03_low_code/catalogue/bookstoscrape.qmd
+                text: "Bücherliste scrapen"
+              - href: 03_low_code/catalogue/quotes_scraper.ipynb
+                text: "Zitate scrapen"
+          - href: 03_low_code/app_market_scraping/app_market_scraping.qmd
+            text: "App Markt analysieren"
+          - section: "Video Transkripte"
+            href: 03_low_code/video_transcripts.qmd
+            contents:
+              - href: 03_low_code/video_transcripts/social-media.qmd
+                text: "Hinweise Scraping Social Media"
+              - href: 03_low_code/video_transcripts/get_videos_for_youtube_channels.ipynb
+                text: "YouTube Channel Videos"
+              - href: 03_low_code/video_transcripts/youtube-transcript-extraction.ipynb
+                text: "YouTube Video Transcripts"
     - title: "Use-Case"
       contents:
+        - section: "Anwendungsfall Übersicht"
+          href: use_case.qmd
+          contents:
           - href: 04_use_case/laws/Gesetze_im_Internet_Aktualitätendienst.ipynb
             text: "Aktualitätendienst Gesetze"
           - href: 04_use_case/jobs/Jobboerse_API.ipynb
             text: "Jobbörse"
+          - href: 04_use_case/forum/buergergeld_forum.ipynb
+            text: "Buergergeld Forum"
     - title: "Blog"
       contents:
         - href: blog.qmd

src/low_code.qmd CHANGED

@@ -1,22 +1,14 @@
 ---
-title: "Low Code Übersicht"
 listing:
-  - id: catalogue
-    contents: "03_low_code/catalogue"
-    type: grid
-  - id: app_market_scraping
-    contents: "03_low_code/app_market_scraping"
-    type: grid
-  - id: video_transcripts
-    contents: "03_low_code/video_transcripts"
     type: grid
 ---
-::: {#catalogue}
-:::
-::: {#app_market_scraping}
-:::
-::: {#video_transcripts}
-:::

 ---
 listing:
+  - id: low_code
+    contents: ["03_low_code/catalogue.qmd","03_low_code/app_market_scraping.qmd","03_low_code/video_transcripts.qmd"]
     type: grid
 ---
+## Learning Objectives
+
+* **Extracting book data from the "Books to Scrape" website with Python and BeautifulSoup**: a hands-on web scraping exercise for understanding targeted extraction from data structures.
+* **Scraping app market data**: building an overview of the app marketplace to identify apps that can be useful for the work of non-profit organizations and civil-society actors.
+* **Extracting YouTube transcripts and saving them as PDF files**: learning how to extract transcripts from educational and informational videos so that this content becomes easier to access and reuse for educational work, advocacy, and awareness-raising.
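The first objective names Python and BeautifulSoup. The sketch below illustrates the same targeted extraction without third-party packages: it swaps in the standard library's `html.parser` for BeautifulSoup. The markup it expects (`article.product_pod`, a link whose `title` attribute holds the full book title, `p.price_color` for the price) is an assumption modeled on the Books to Scrape listing pages, and the two books in `SAMPLE_HTML` are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical snippet, modeled on (not copied from) the listing markup
# assumed for books.toscrape.com: one <article class="product_pod"> per book.
SAMPLE_HTML = """
<ol class="row">
  <li><article class="product_pod">
    <h3><a href="catalogue/example-a/index.html" title="Example Book A">Example Book A</a></h3>
    <div class="product_price"><p class="price_color">£10.00</p></div>
  </article></li>
  <li><article class="product_pod">
    <h3><a href="catalogue/example-b/index.html" title="Example Book B">Example Book B</a></h3>
    <div class="product_price"><p class="price_color">£23.50</p></div>
  </article></li>
</ol>
"""


class BookListParser(HTMLParser):
    """Collects title and price for every article.product_pod element."""

    def __init__(self):
        super().__init__()
        self.books = []         # list of {"title": ..., "price": ...}
        self._in_pod = False    # currently inside an article.product_pod?
        self._in_price = False  # currently inside a p.price_color?

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "article" and "product_pod" in classes:
            self._in_pod = True
            self.books.append({"title": None, "price": None})
        elif self._in_pod and tag == "a" and "title" in attrs:
            # the full title sits in the title attribute (link text may be shortened)
            self.books[-1]["title"] = attrs["title"]
        elif self._in_pod and tag == "p" and "price_color" in classes:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "article":
            self._in_pod = False
        elif tag == "p":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            # drop the currency sign and convert the price to a float
            self.books[-1]["price"] = float(data.strip().lstrip("£"))


parser = BookListParser()
parser.feed(SAMPLE_HTML)
for book in parser.books:
    print(book["title"], book["price"])
```

In BeautifulSoup the same selection is usually written with CSS selectors, for example `soup.select("article.product_pod")`, which keeps the extraction logic shorter than an event-driven parser.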

src/use_case.qmd CHANGED

@@ -1,22 +1,29 @@
 ---
-title: "Anwendungsfall Übersicht"
 listing:
-  - id: laws
-    contents: "04_use_case/laws"
-    type: grid
-  - id: jobs
-    contents: "04_use_case/jobs"
-    type: grid
-  - id: forum
-    contents: "04_use_case/forum"
     type: grid
 ---
-::: {#laws}
-:::
-::: {#jobs}
-:::
-::: {#forum}
-:::

 ---
 listing:
+  - id: use_case
+    contents: "04_use_case"
     type: grid
 ---
+## Learning Objectives
+
+**Web scraping forums**
+
+* Downloading and saving the HTML pages of a forum.
+* Extracting and analyzing forum posts and their metadata.
+* Processing and cleaning the extracted data with pandas.
+
+**Using the Jobbörse API**
+
+* Retrieving job offers via the Jobbörse API.
+* Processing and analyzing the retrieved data with pandas.
+* Visualizing the data and building frequency distributions.
+
+**RSS feed analysis**
+
+* Retrieving and parsing RSS feeds with feedparser.
+* Converting the feed data into a pandas DataFrame.
+* Analyzing and visualizing the feed data.
+::: {#use_case}
+:::
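The RSS feed objective names feedparser and pandas. As a dependency-free sketch under stated assumptions, the code below swaps in the standard library's `xml.etree.ElementTree` for feedparser, turns each `<item>` into a dict and counts entries per day as a simple frequency distribution. The sample feed, its URLs and the helper `parse_rss_items` are hypothetical; the real Aktualitätendienst feed may use other or additional elements.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from email.utils import parsedate_to_datetime

# Hypothetical RSS 2.0 sample; not taken from the real Aktualitätendienst feed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Example law A amended</title>
      <link>https://example.org/a</link>
      <pubDate>Mon, 16 Dec 2024 09:00:00 +0100</pubDate>
    </item>
    <item>
      <title>Example law B amended</title>
      <link>https://example.org/b</link>
      <pubDate>Mon, 16 Dec 2024 15:30:00 +0100</pubDate>
    </item>
    <item>
      <title>Example law C published</title>
      <link>https://example.org/c</link>
      <pubDate>Tue, 17 Dec 2024 10:00:00 +0100</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict per <item>: title, link and a timezone-aware datetime."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        records.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # RSS 2.0 uses RFC 822 dates, which email.utils can parse
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return records


records = parse_rss_items(SAMPLE_RSS)
# simple frequency distribution: number of entries per calendar day
per_day = Counter(r["published"].date().isoformat() for r in records)
print(per_day.most_common())
```

With pandas installed, `pandas.DataFrame(records)` turns the list of dicts into a DataFrame for further analysis; feedparser additionally smooths over differences between RSS and Atom feeds, which this sketch does not handle.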