<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="de">
	<id>https://exmediawiki.khm.de/index.php?action=history&amp;feed=atom&amp;title=How_to_get_your_trainigdata</id>
	<title>How to get your trainigdata - Versionsgeschichte</title>
	<link rel="self" type="application/atom+xml" href="https://exmediawiki.khm.de/index.php?action=history&amp;feed=atom&amp;title=How_to_get_your_trainigdata"/>
	<link rel="alternate" type="text/html" href="https://exmediawiki.khm.de/index.php?title=How_to_get_your_trainigdata&amp;action=history"/>
	<updated>2026-04-28T03:04:17Z</updated>
	<subtitle>Versionsgeschichte dieser Seite in exmediawiki</subtitle>
	<generator>MediaWiki 1.43.5</generator>
	<entry>
		<id>https://exmediawiki.khm.de/index.php?title=How_to_get_your_trainigdata&amp;diff=6470&amp;oldid=prev</id>
		<title>Mattis: /* Wikipedia */  wikipediaapi hinzugefügt</title>
		<link rel="alternate" type="text/html" href="https://exmediawiki.khm.de/index.php?title=How_to_get_your_trainigdata&amp;diff=6470&amp;oldid=prev"/>
		<updated>2021-01-20T14:11:00Z</updated>

		<summary type="html">&lt;p&gt;&lt;span class=&quot;autocomment&quot;&gt;Wikipedia: &lt;/span&gt;  wikipediaapi hinzugefügt&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;de&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Nächstältere Version&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Version vom 20. Januar 2021, 16:11 Uhr&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l18&quot;&gt;Zeile 18:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Zeile 18:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===Wikiextractor===&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===Wikiextractor===&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;https://github.com/attardi/wikiextractor&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;https://github.com/attardi/wikiextractor&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;===WikipediaAPI===&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;https://pypi.org/project/Wikipedia-API/&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Tweets scrapen==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Tweets scrapen==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Mattis</name></author>
	</entry>
	<entry>
		<id>https://exmediawiki.khm.de/index.php?title=How_to_get_your_trainigdata&amp;diff=6469&amp;oldid=prev</id>
		<title>C.heck: Die Seite wurde neu angelegt: „=How to get the trainigdata?= &lt;small&gt;&lt;code&gt;exMedia_Machines/Seminar_Einführung-in-die-Programmierung-KI/04_07-11_maschinelles-lesen/&#039;&#039;&#039;02_load_scrape-data.ipy…“</title>
		<link rel="alternate" type="text/html" href="https://exmediawiki.khm.de/index.php?title=How_to_get_your_trainigdata&amp;diff=6469&amp;oldid=prev"/>
		<updated>2021-01-18T21:08:16Z</updated>

		<summary type="html">&lt;p&gt;Die Seite wurde neu angelegt: „=How to get the trainigdata?= &amp;lt;small&amp;gt;&amp;lt;code&amp;gt;exMedia_Machines/Seminar_Einführung-in-die-Programmierung-KI/04_07-11_maschinelles-lesen/&amp;#039;&amp;#039;&amp;#039;02_load_scrape-data.ipy…“&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Neue Seite&lt;/b&gt;&lt;/p&gt;&lt;div&gt;=How to get the trainigdata?=&lt;br /&gt;
&amp;lt;small&amp;gt;&amp;lt;code&amp;gt;exMedia_Machines/Seminar_Einführung-in-die-Programmierung-KI/04_07-11_maschinelles-lesen/&amp;#039;&amp;#039;&amp;#039;02_load_scrape-data.ipynb&amp;#039;&amp;#039;&amp;#039;&amp;lt;/code&amp;gt;&amp;lt;/small&amp;gt;&lt;br /&gt;
&lt;br /&gt;
see more...: https://www.nltk.org/book/ch03.html&lt;br /&gt;
&lt;br /&gt;
==File aus eigener Datenbank einlesen==&lt;br /&gt;
 filename = &amp;#039;Dateipfad&amp;#039;&lt;br /&gt;
 file = open(filename, &amp;#039;rt&amp;#039;)&lt;br /&gt;
 amw1 = file.read()&lt;br /&gt;
 file.close()&lt;br /&gt;
&lt;br /&gt;
==vorbearbeitete Trainingsdatenbanken==&lt;br /&gt;
links hierein&lt;br /&gt;
&lt;br /&gt;
==Wikipedia==&lt;br /&gt;
===Wiki2Text===&lt;br /&gt;
Extrahieren eines Plain-Text-Korpus aus MediaWiki-XML-Dumps wie Wikipedia, siehe: https://github.com/rspeer/wiki2text&lt;br /&gt;
===Wikiextractor===&lt;br /&gt;
https://github.com/attardi/wikiextractor&lt;br /&gt;
&lt;br /&gt;
==Tweets scrapen==&lt;br /&gt;
* https://medium.com/@limavallantin/mining-twitter-for-sentiment-analysis-using-python-a74679b85546&lt;br /&gt;
* https://medium.com/better-programming/how-to-build-a-twitter-sentiments-analyzer-in-python-using-textblob-948e1e8aae14&lt;br /&gt;
* https://www.researchgate.net/post/How_to_download_the_hashtag_data_set_from_twitter_and_instagram&lt;br /&gt;
&lt;br /&gt;
Beispielcode von https://gist.github.com/sxshateri/540aead254bfa7810ee8bbb2d298363e:&lt;br /&gt;
 import tweepy&lt;br /&gt;
 import csv&lt;br /&gt;
 import pandas as pd&lt;br /&gt;
 import sys&lt;br /&gt;
 &lt;br /&gt;
 # API credentials here&lt;br /&gt;
 consumer_key = &amp;#039;INSERT CONSUMER KEY HERE&amp;#039;&lt;br /&gt;
 consumer_secret = &amp;#039;INSERT CONSUMER SECRET HERE&amp;#039;&lt;br /&gt;
 access_token = &amp;#039;INSERT ACCESS TOKEN HERE&amp;#039;&lt;br /&gt;
 access_token_secret = &amp;#039;INSERT ACCESS TOKEN SECRET HERE&amp;#039;&lt;br /&gt;
 &lt;br /&gt;
 auth = tweepy.OAuthHandler(consumer_key, consumer_secret)&lt;br /&gt;
 auth.set_access_token(access_token, access_token_secret)&lt;br /&gt;
 api = tweepy.API(auth,wait_on_rate_limit=True,wait_on_rate_limit_notify=True)&lt;br /&gt;
 &lt;br /&gt;
 # Search word/hashtag value &lt;br /&gt;
 HashValue = &amp;quot;&amp;quot;&lt;br /&gt;
 &lt;br /&gt;
 # search start date value. the search will start from this date to the current date.&lt;br /&gt;
 StartDate = &amp;quot;&amp;quot;&lt;br /&gt;
 &lt;br /&gt;
 # getting the search word/hashtag and date range from user&lt;br /&gt;
 HashValue = input(&amp;quot;Enter the hashtag you want the tweets to be downloaded for: &amp;quot;)&lt;br /&gt;
 StartDate = input(&amp;quot;Enter the start date in this format yyyy-mm-dd: &amp;quot;)&lt;br /&gt;
 &lt;br /&gt;
 # Open/Create a file to append data&lt;br /&gt;
 csvFile = open(HashValue+&amp;#039;.csv&amp;#039;, &amp;#039;a&amp;#039;)&lt;br /&gt;
 &lt;br /&gt;
 #Use csv Writer&lt;br /&gt;
 csvWriter = csv.writer(csvFile) &lt;br /&gt;
 &lt;br /&gt;
 for tweet in tweepy.Cursor(api.search,q=HashValue,count=20,lang=&amp;quot;en&amp;quot;,since=StartDate, tweet_mode=&amp;#039;extended&amp;#039;).items():&lt;br /&gt;
     print (tweet.created_at, tweet.full_text)&lt;br /&gt;
     csvWriter.writerow([tweet.created_at, tweet.full_text.encode(&amp;#039;utf-8&amp;#039;)])&lt;br /&gt;
 &lt;br /&gt;
 print (&amp;quot;Scraping finished and saved to &amp;quot;+HashValue+&amp;quot;.csv&amp;quot;)&lt;br /&gt;
 #sys.exit()&lt;br /&gt;
&lt;br /&gt;
==Webseiten downloaden==&lt;br /&gt;
im Html-Format:&lt;br /&gt;
 url = &amp;quot;https://theorieblog.attac.de/quo-vadis-homo-spiens/&amp;quot;&lt;br /&gt;
 html = request.urlopen(url).read().decode(&amp;#039;utf8&amp;#039;)&lt;br /&gt;
 print(html[:60])&lt;br /&gt;
&lt;br /&gt;
schon im Textformat (z.B. von Gutenberg):&lt;br /&gt;
 from urllib import request&lt;br /&gt;
 url = &amp;quot;http://www.gutenberg.org/files/2554/2554-0.txt&amp;quot;&lt;br /&gt;
 response = request.urlopen(url)&lt;br /&gt;
 raw = response.read().decode(&amp;#039;utf8&amp;#039;)&lt;br /&gt;
 print(raw[1000:1275])&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
----&lt;/div&gt;</summary>
		<author><name>C.heck</name></author>
	</entry>
</feed>