Crio Projects - Web scraping Facebook bot

Objective

You will build an application that scrapes hot posts from a subreddit and periodically publishes them in a FB group/page.


Project Context

Web scraping, also known as web data extraction, is the process of collecting structured data from websites in an automated way. Businesses generally use web scraping to draw on the vast amount of publicly available information and make smarter decisions. In our project though, we are going to have some fun with it by scraping popular posts from a subreddit. If you don’t know what a subreddit is: subreddits are like groups on Reddit, one of the internet’s most popular websites!


Facebook is an all-time favourite social media platform and most of us are on it. In this project, we will automate the process of sharing a popular post from a subreddit in a dedicated FB group or page.


Disclaimer: Web scraping should be used only for learning purposes here. Any other use of the scraped data might result in legal action, or your IP address might get blocked.


Project Stages

The project consists of the following stages:

[Image: project stages]

High-Level Approach

  • Web scrape content which you want to post on FB, for example memes, from a subreddit.

  • Use Selenium web automation to automatically share hot posts from a subreddit in a dedicated Facebook group or page.

  • Come up with a script that performs the aforementioned tasks periodically.

  • Deploy your application on a cloud platform.

Pre-requisite skills

  • Python

Post Project Skills

  • Selenium

Web scraping popular posts from a subreddit
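
As one possible starting point for this milestone, the sketch below reads a subreddit’s hot listing through Reddit’s public JSON endpoint (`https://www.reddit.com/r/<name>/hot.json`) instead of parsing HTML. The helper names, the `User-Agent` value and the image-extension filter are assumptions made for illustration, and Reddit may rate-limit or reject anonymous requests, so treat this as a sketch rather than a finished scraper.

```python
import json
import urllib.request

# Assumption: only direct links with these extensions count as meme images.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def extract_image_posts(listing):
    """Return (title, url) pairs for posts in a Reddit listing that link directly to an image."""
    posts = []
    for child in listing.get("data", {}).get("children", []):
        post = child.get("data", {})
        url = post.get("url", "")
        # Skip pinned announcements and anything that is not a direct image link.
        if post.get("stickied") or not url.lower().endswith(IMAGE_EXTENSIONS):
            continue
        posts.append((post.get("title", ""), url))
    return posts


def fetch_hot_listing(subreddit, limit=10):
    """Download the hot listing of a subreddit as a dict (performs a network request)."""
    url = f"https://www.reddit.com/r/{subreddit}/hot.json?limit={limit}"
    # Requests without a descriptive User-Agent are often rejected; this value is a placeholder.
    request = urllib.request.Request(url, headers={"User-Agent": "learning-scraper/0.1"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Calling `extract_image_posts(fetch_hot_listing("memes"))` would then give the titles and image links to download; the subreddit name here is only an example.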

Automating the process of publishing posts in a FB group or page

Every time we see a nice post on Reddit, we want to share it with the world. We generally download the file or take a screenshot of the post and share the image. In the earlier milestone, we fetched the popular posts from a subreddit that we want to post on FB. In this milestone, we’ll share those posts on FB by running a script.


We’ll be using Selenium WebDriver, which drives the browser directly and relies on the browser’s built-in automation support to carry out the steps written in your script. You’ll be writing a script that locates web elements and interacts with them.

For example, suppose you need to log in to your FB account. To do so by hand, you fill in your username and password in the browser and press the login button or the Enter key on your keyboard. To achieve the same with Selenium WebDriver, you select the username and password text boxes, send each of them its keys (your username and password) and then send the Enter key.
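
The login steps just described could look roughly like the sketch below. It assumes the login inputs carry the ids `email` and `pass`; Facebook’s markup changes over time, so verify these in the browser’s inspector. The function name `log_in` is made up for this example.

```python
def log_in(driver, email, password):
    """Fill in the Facebook login form and submit it with the Enter key."""
    # Imported inside the function so the module still loads where Selenium is not installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    driver.get("https://www.facebook.com/")
    # Assumption: the username and password boxes have the ids "email" and "pass".
    driver.find_element(By.ID, "email").send_keys(email)
    password_box = driver.find_element(By.ID, "pass")
    password_box.send_keys(password)
    # Sending Enter to the password box submits the form, like pressing the key by hand.
    password_box.send_keys(Keys.ENTER)
```

A driver created with `webdriver.Chrome()` can be passed in as `driver`. Reading the credentials from environment variables instead of hard-coding them keeps them out of the repository.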

All of the necessary commands need to be written in a programming language that Selenium supports, such as Python.


[Note: The preferred way to build applications for Facebook is to create an app through their developer portal. However, once all your configuration is in place, you need to submit your app for review on the Facebook platform, which might take several days. Only after a successful review will you be able to use it for real. As a workaround, we can use web automation to do the job, since we only need to automate a simple feature for learning purposes.]


Requirements

  • Your Selenium based web automation script should be able to do the following:

    • Open facebook.com and log in to your account.

    • Open the URL of the Facebook group or page you’re interested in.

    • Upload a meme image file (which you obtained in the first milestone) to the group or page and post the same.
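
For the last two requirements, a rough sketch is shown here. Both locators are hypothetical placeholders, since Facebook’s markup changes often and differs between groups and pages; the only part that is general Selenium practice is sending an absolute file path to an `<input type="file">` element, which avoids the native file dialog. The name `post_image` is made up for this example.

```python
from pathlib import Path

# Hypothetical locators: inspect the page and replace them with what you actually find.
FILE_INPUT_CSS = "input[type='file']"
POST_BUTTON_XPATH = "//div[@aria-label='Post']"


def post_image(driver, target_url, image_path):
    """Open a group or page, attach a local image and press the post button."""
    # Imported inside the function so the module still loads where Selenium is not installed.
    from selenium.webdriver.common.by import By

    driver.get(target_url)
    # File inputs take the path as keystrokes; the browser needs an absolute path.
    absolute_path = str(Path(image_path).resolve())
    driver.find_element(By.CSS_SELECTOR, FILE_INPUT_CSS).send_keys(absolute_path)
    driver.find_element(By.XPATH, POST_BUTTON_XPATH).click()
```

On the real site the elements appear asynchronously, so you will most likely need explicit waits (Selenium’s `WebDriverWait`) before each lookup.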


Useful Tips

  • You may face some challenges when trying to fetch certain web elements. Kindly keep the following in mind:

    • Always try to find an element using its XPath first.

    • If you face issues with the XPath, try to find the element using its id.

    • If a web element has no id, it may still have a class. The catch with classes is that multiple web elements can share the same class, whereas ids are conventionally unique. In such a case, you can fetch all the elements of that class and check them one by one to find the right web element.

    • Another way to find a web element is to search for its text, if it has any.

    • When searching by class, you can narrow the search by combining the class with the text the web element contains, if it contains any.
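
The tips above can be folded into one small helper, sketched here. `find_first_match` is a hypothetical name, not part of Selenium; it only relies on the driver’s `find_elements(strategy, value)` method and each element’s `text` attribute. The strategies are the values of Selenium’s `By` constants, for example `By.XPATH`, `By.ID` and `By.CLASS_NAME`.

```python
def find_first_match(driver, locators, text=None):
    """Try each (strategy, value) locator in order and return the first matching element.

    When `text` is given, only elements whose visible text contains it are accepted,
    which narrows down class-based lookups that match many elements.
    Returns None when no locator produces a match.
    """
    for strategy, value in locators:
        # find_elements returns an empty list instead of raising when nothing matches.
        for element in driver.find_elements(strategy, value):
            if text is None or text in element.text:
                return element
    return None
```

A call could pass the locators in the order suggested above, such as `[(By.XPATH, "..."), (By.ID, "..."), (By.CLASS_NAME, "...")]`, with the actual values taken from the page you are automating.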


Bring it On!

  • Can you add a feature to your script which will be able to accept an image with a written description and post the same, using web automation?

Expected Outcome

By the end of this milestone, you should be able to publish an image post in a Facebook group or page by just running a script.


[Image: a sample post published by the script]


Automate periodic meme updates

We need to publish in our FB group/page periodically to keep it lively.


Requirements

  • You need to come up with a script which performs the following actions periodically:

    • Scrape the required data from Reddit.

    • Download the meme image files referenced in that data.

    • Publish the downloaded images in a group/page.
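
One simple way to run these three actions periodically is a loop that sleeps between cycles, sketched below with the standard library only. `run_periodically` and its parameters are assumptions for illustration; `job` stands for whatever function performs your scrape, download and publish cycle.

```python
import time


def run_periodically(job, interval_seconds, max_runs=None, sleep=time.sleep):
    """Call `job` every `interval_seconds`; stop after `max_runs` calls if given.

    Returns the number of runs that raised an exception.
    """
    failures = 0
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            job()
        except Exception as error:  # keep the loop alive if one cycle fails
            failures += 1
            print(f"Run {runs + 1} failed: {error}")
        runs += 1
        # Sleep only if another run will follow.
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return failures
```

With `max_runs` left as `None` the loop runs until the process is stopped. A cron job that starts the script at fixed times is a common alternative to keeping a process alive.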


Bring it On!

  • Can you come up with a script which fetches the links of text posts from a subreddit and shares them in the FB group/page as regular updates?

  • Can you come up with a script which takes a subreddit name and a Facebook group/page as inputs and provides regular updates of popular posts from the subreddit on the respective group/page in the form of text? With a few tweaks, you’ll be able to have an application in place which is easily configurable for any community requesting updates of posts from a subreddit in their FB group/page.
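
For the configurable variant, the inputs could be read from the command line, as in this sketch. The argument names and default values are assumptions chosen for the example.

```python
import argparse


def parse_args(argv=None):
    """Read the subreddit, the Facebook target and the update interval from the command line."""
    parser = argparse.ArgumentParser(
        description="Share hot subreddit posts on a Facebook group or page."
    )
    parser.add_argument("subreddit", help="subreddit name without the r/ prefix")
    parser.add_argument("target_url", help="URL of the Facebook group or page")
    parser.add_argument("--interval-hours", type=float, default=6.0,
                        help="hours between updates")
    parser.add_argument("--limit", type=int, default=5,
                        help="number of posts to fetch per update")
    # argv=None makes argparse read sys.argv; passing a list is handy for testing.
    return parser.parse_args(argv)
```

Keeping the subreddit and the target out of the code means the same script can serve any community just by changing how it is started.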


Expected Outcome

By the end of this milestone, you’ll have an application which will be able to provide periodic updates to FB groups/pages by fetching necessary information from a subreddit.


Publish to GitHub

Publish your project in a GitHub repository and have some green goodness!


[Note: Kindly go through this Byte if you’re unfamiliar with Git.]


Deploy the application on a cloud platform

Now that your application is complete, it’s ready to be deployed! Go on and deploy your application on the Google Cloud Platform in a Docker Container.


[Note: You are free to use any Selenium supported cloud provider.]


[PS: If you are new to cloud services, you can go through the QPrep - System Design micro-experience available on the platform before proceeding. Also, if you’re new to Docker, kindly go through the Docker Introduction and Docker Advanced Bytes.]


Requirements

  • Create a Docker container for your application. It will make the deployment easier.

  • Set up a cloud instance on GCP and activate it.

  • Upload your files to the platform. The simplest way is to use your GitHub repository here, since a single git clone on the instance does the job.

  • Run your application on the platform.
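
A container usually has no display, so the browser has to run headless there. The sketch below shows one way to configure Chrome for that, assuming Chrome and a matching driver are installed in the image; the flag selection and the name `build_container_driver` are assumptions to adapt to your setup.

```python
# Chrome flags commonly needed when no display is available, as in a container.
CONTAINER_CHROME_FLAGS = (
    "--headless=new",           # run without a visible window
    "--no-sandbox",             # Chrome's sandbox often cannot start as root inside Docker
    "--disable-dev-shm-usage",  # avoid the small /dev/shm that containers get by default
    "--window-size=1280,1024",  # give pages a desktop-sized viewport
)


def build_container_driver(flags=CONTAINER_CHROME_FLAGS):
    """Create a Chrome driver configured for a headless container."""
    # Imported inside the function so the module still loads where Selenium is not installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for flag in flags:
        options.add_argument(flag)
    return webdriver.Chrome(options=options)
```

Running the script locally with the same flags before building the image is a quick way to check that your locators still work in headless mode.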


Expected Outcome

You should be able to deploy the application on a cloud platform.