Spark and Python for Big Data with PySpark


Below are the top discussions from Reddit that mention this online Udemy course.

Learn how to use Spark with Python, including Spark Streaming, Machine Learning, Spark 2


Taught by
Jose Portilla

Reddit Posts and Comments

0 posts • 6 mentions • top 6 shown below

r/dataengineering • post
9 points • PhotographsWithFilm
Which Udemy PySpark course should I choose?

Hi Folks,

Enough with fooling around (& getting stumped) with PySpark. It's time to do some tutorials.

I notice that there are two reasonably well regarded courses on Udemy:

Taming Big Data With Apache Spark And Python Hands On

Spark and Python for Big Data with PySpark

I like that the second one has a bit more of an AWS slant, but it would mean having to find someone to run up Ubuntu.

Has anyone had any experience with either of the above?

Cheers

r/datascience • comment
1 point • Vast_Balls

Thank you for the response!

>PySpark has a bit of a learning curve, but you'll be fine. Databricks actually ported the pandas API to work on their runtime a month or two ago. You could get away with using the pandas API on Spark, but I think it's a good idea to properly learn Spark.

I have taken this course on PySpark: https://www.udemy.com/course/spark-and-python-for-big-data-with-pyspark/ and it was fairly easy, do you have any other good resources?

The quirk I remember the most was that all of your independent variables have to be stored in a single vector column. I definitely need to learn about MLflow; I am only a little familiar with it.

>On databricks you have different sizes of clusters with the smallest one being a single compute node. You can do your PoC on that, it's also very easy to swap out to bigger clusters if you need that.

This is a big question I had, I have no clue how many clusters I need and what size. I imagine I can just always go bigger as I need it.

r/apachespark • comment
1 point • no_condoments

I think you're doing something wrong on Udemy. Many of the courses have "list prices" of $200, but they are always on sale for $10 to $15. I'm not using hyperbole when I say "always"; I literally mean that the courses are on sale 100% of the time. The course below is currently $11:

https://www.udemy.com/course/spark-and-python-for-big-data-with-pyspark/

r/dataengineering • comment
2 points • maosama007

I just checked the course, and he has updated it for Spark 3.

I have taken a few courses on Udemy. These are my suggestions:

If you just want a general overview of Spark, Hadoop, Spark Streaming, MLlib, etc., then you can take the course you mentioned.

If you are going to be working more with Spark DataFrames and all kinds of transformations like join, group by, regex, etc., you can go with the following course:

https://www.udemy.com/course/spark-essentials/

The course is in Scala, but it is very similar to Python.

If you need streaming specifically, you can go with the Spark Streaming course by Frank Kane:

https://www.udemy.com/course/taming-big-data-with-spark-streaming-hands-on/

If you need Spark for ML, then you can go with Jose Portilla's course:

https://www.udemy.com/course/spark-and-python-for-big-data-with-pyspark/

r/dataengineering • comment
1 point • Omar_88

r/dataengineering • comment
1 point • SatoriSlu

Hi,

I'd be happy to. I'm new to the world of Big Data so this is my current plan of action over the next 10 days.

I am honestly surprised that this role (an SRE role, not Data Engineer) is asking me to review these concepts... but here I am!

1) Today: Go through the Linux Academy course Big Data Essentials, just to get a high-level overview of the field and what the different tools do.

- https://linuxacademy.com/course/big-data-essentials/

2) Fri - Sat: Brush up on my SQL using
- https://app.dataquest.io/course/sql-fundamentals

- https://app.dataquest.io/course/sql-joins-relations-de


3) Sun - Tues: Tackle Apache Spark using these resources:

- https://www.dataquest.io/course/spark-map-reduce/

- https://www.udacity.com/course/learn-spark-at-udacity--ud2002

- I MAY purchase this course (https://www.udemy.com/course/spark-and-python-for-big-data-with-pyspark/)


4) Wed - Thurs: Hive. I haven't found too much here, just the link below. I would appreciate any help from the community with finding resources/labs to learn this.

- https://www.guru99.com/hive-tutorials.html

5) Fri - Sun: Kafka

- https://linuxacademy.com/cp/modules/view/id/360?redirect_uri=https://app.linuxacademy.com/search?query=Kafka

Please let me know what you guys think!