Crio Bytes - Linux Basics 2 | Crio.Do | Project-Based Learning Platform for Developers

Objective

Dive into the Linux terminal and learn the skills that are needed by a developer on a daily basis.

Background

Software developers don’t write new code all day. It’s just as important to troubleshoot issues and understand how the product behaves. This requires being able to monitor the product while it’s running, understand the capabilities of the system it’s running on, and have a working knowledge of how applications talk to other systems over the internet.

Primary goals

Use Linux commands to monitor processes
Use Linux commands to retrieve system info
Use Linux commands to understand basic networking
Perform useful tasks using shell scripts

Process, process, process…

A process is a program in execution: a set of instructions currently being run by the Operating System (OS). Each application we interact with, like the browser or the file explorer, runs as one or more processes.

Let’s start a process and explore its properties. We’ll use the Netcat utility to start a server process. A server is just a process running on a computer that’s open for communication with someone else. For example, Google’s server is ready to get our search keyword & return us results for it.

To start the server using Netcat, use nc -v -l 8081. We’ll see what the command means in Milestone 3. Now, we’ll check the properties of this newly created Netcat server process using some commands. Hit Ctrl + z to suspend the process, so that we’ll get the terminal prompt back for running other commands.

To identify each running process, the OS tags it with a unique identifier called the Process ID (PID). We use the ps command to get process-related info. Run without options, it prints the processes started from the current terminal.

In this example run, the Netcat server we started with the nc command has a PID of 16496. Yours will be different.
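As a minimal sketch of looking up one process by its PID, the snippet below uses a background sleep as a stand-in for the Netcat server (an assumption made so that it runs anywhere without opening a port). The variable names pid and info are illustrative only.

```shell
# Start a harmless placeholder process in the background
sleep 30 &
pid=$!          # $! holds the PID of the most recent background process
sleep 1         # give the process a moment to start

# Ask ps about that one process: PID and command name, without a header row
info=$(ps -p "$pid" -o pid= -o comm=)
echo "$info"

# Clean up the placeholder process
kill "$pid"
```

The -o option selects which columns ps prints, which is handy once you know which attributes you care about.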

Every process has a parent process, and ps reports the parent’s process ID in the PPID column. We’ll need a full-format listing of the ps output to see it. Can you find out how to do that? (Use the manual page for ps or Google)

Answer the first question provided at the end of the milestone.

Checking resource usage

Processes need system resources like CPU and physical memory to stay "alive". Monitoring the resource usage of processes helps us troubleshoot issues like the system being slow. The top command gives us a dynamic view of the current system resource utilization by processes.

Run top and find out from its output:

Amount of free memory available
Total number of processes
Time for which the system has been up & running
Actual physical memory consumed by the process named redis-server

It’d be hard to find our Netcat process in that long list. The top command lets us filter its output by PID. Use top -p <pid> to see the resource usage of the Netcat process (substitute <pid> with the PID of the Netcat process).
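When you want a one-off snapshot rather than the live view, top can also run in batch mode. The sketch below assumes the procps version of top found on most Linux distributions and again uses a background sleep as a stand-in for the Netcat process.

```shell
# Placeholder process to inspect
sleep 30 &
pid=$!
sleep 1         # give the process a moment to start

# -b: batch mode (plain text output), -n 1: a single iteration, -p: only this PID
snapshot=$(top -b -n 1 -p "$pid")

# The summary area comes first; the process table is at the end
echo "$snapshot" | tail -n 2

kill "$pid"
```

Batch mode is useful inside scripts, because the interactive view expects a terminal and keeps refreshing until you quit.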

To avoid wasting system resources, we shouldn’t leave a process running when it’s no longer needed. To stop a process, we can use the kill command. We suspended the Netcat process earlier, so it is still in memory and needs to be killed. Find out how to use the kill command, and verify that the process was killed by checking the ps output.
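One possible way to stop a process and confirm it is gone is sketched below. It uses a background sleep rather than the suspended Netcat server, so treat it as an illustration of the idea, not the answer for your exact situation. The variables status and state are hypothetical names.

```shell
# Placeholder process to stop
sleep 60 &
pid=$!

# Without options, kill sends the terminate signal (SIGTERM)
kill "$pid"

# Collect the exit status; 128 + signal number means "ended by a signal"
status=0
wait "$pid" 2>/dev/null || status=$?

# kill -0 sends no signal at all; it only checks whether the PID still exists
if kill -0 "$pid" 2>/dev/null; then
  state="running"
else
  state="gone"
fi
echo "exit status: $status, state: $state"
```

A process that was suspended with Ctrl + z is stopped, so it may only act on SIGTERM once it is resumed. That is worth keeping in mind when you try this on the Netcat server.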

Curious Cats

What if we need to list out all the processes running on the machine and not just a given terminal? How’d we do that using ps?
Where is the top command getting all this information from?
Hitting Ctrl + z suspends (stops) the process. How would you bring it back to the foreground (resume)?
How can you specify that a process should run in the background when you’re starting it up?

Can our system support GTA 5?

The minimum system requirements for running GTA 5 are:

CPU - 2500 MHz
RAM - 4 GB
Free disk space - 65 GB

Let’s check if our system can support GTA 5. Fingers crossed!

How would you check the clock speed of a processor? You wouldn’t happen to have carefully saved the cardboard box your PC/Laptop came in 3 years back, right?

Don’t worry, Linux has you covered. Linux stores CPU-related information in the /proc/cpuinfo file. Try printing its contents with cat. You’ll see the clock speed listed as the "cpu MHz" attribute. There can be more than one "cpu MHz" entry if the processor is multi-core.
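Since /proc/cpuinfo is plain text, the usual text tools work on it. As a small sketch, the snippet below counts the logical processors and shows the first clock-speed entry. It assumes a Linux machine. Some CPU architectures do not report a "cpu MHz" field, so that part is guarded.

```shell
# Each logical processor has its own "processor" entry in the file
cores=$(grep -c '^processor' /proc/cpuinfo)
echo "logical processors: $cores"

# Show the first clock-speed entry, if this architecture reports one
grep -m 1 'cpu MHz' /proc/cpuinfo || echo "no 'cpu MHz' field on this machine"
```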

Is there some Linux command that can fetch us the CPU info directly?

Similarly, we have memory related information inside another file in the /proc directory itself. Try listing out the contents of the /proc directory and find the file (Hint: grep output of ls for info). What’s the total memory of your system as shown in that file?

The free command gives us a quick summary of memory usage. Try it out!
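To pull a single number out of the free output, you can combine it with awk. This sketch assumes the procps version of free, whose memory row starts with "Mem:" in the C locale. The variable total_mib is an illustrative name.

```shell
# -m reports values in mebibytes; column 2 of the "Mem:" row is total memory
total_mib=$(LC_ALL=C free -m | awk '/^Mem:/ {print $2}')
echo "total memory: ${total_mib} MiB"
```

Comparing this number against the 4 GB requirement is then a simple arithmetic check.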

We’re down to only finding out the free disk space. Check the free disk space for the disk mounted at the root directory "/". Can you Google for some command that does this for us?

What happened? Does your system have the juice to run GTA 5?

Curious Cats

Print out the integer part of the "cpu MHz" value in /proc/cpuinfo using Linux commands, i.e., if the value is 2499.99, print only 2499. Limit the output to one integer on a multi-core machine. (Hint: use a combination of grep, awk etc.)
We looked at the files storing processor & memory info. Where does Linux store data related to the Linux distribution (Ubuntu, Fedora etc) & version?
What commands can you use to install new software on a Linux machine?

Knock, knock… Anyone there!

IP addresses are unique identifiers by which computers/machines are recognized in the networking world. This helps in passing data between your machine and other machines on the network. Ok, got it: IP addresses are for computers what Aadhaar is for Indians.

For your computer to communicate to the outside world, it needs an IP address. Find your machine’s IP address. Can you use the ifconfig command here?

We have multiple applications/processes running on our machines. When new requests are sent from an external browser to your web server, how does your OS know to pass it to the web server process?

The port number is how the OS knows which application (process) to send specific network traffic to. Together, an IP address and a port number identify a process listening on a computer.

Start the Netcat server process again using nc -v -l 8081. Verify using the netstat command that there’s indeed a process listening on port 8081.

Open a new terminal window. If you’re on Crio workspace, open a new terminal side-by-side to the current one by clicking on the icon next to the + icon to the top-right of the terminal.

From the new terminal, execute nc 0.0.0.0 8081. You’ll see some messages printed out on the server side. Try entering some text and hitting enter on either side.

You just created a ready-made client-server based chat application!

Hope you had a good time chatting with yourself :)

Let’s try to understand what just happened.

The server command, nc -v -l 8081, tells the Netcat utility to start listening (-l) for network traffic coming to the IP address 0.0.0.0 (any address of this machine) and port 8081, with verbose output (-v). Here Netcat acts as the server. This is what we connect to when nc 0.0.0.0 8081 is run, with Netcat acting as a client connecting to the listening server.

When we enter https://www.google.com in our browser, the hostname gets resolved to the IP address of a Google server, and the port number is inferred from the protocol (https), which uses port 443 by default. Find the IP address of www.google.com (you can use the ping command) and try entering the IP address and port in your browser’s URL bar as IPaddress:port, i.e., if the IP address is 10.0.2.15, visit 10.0.2.15:443. Where did it take you?

So similarly if you visit 0.0.0.0:8081 (where the Netcat server is listening) from your browser, are you able to see any messages on the server side? Why?

(Our current server is a bit shy & can only talk to one client. You’ll have to kill the Netcat server & restart to connect to a new client every time)

Public IP addresses are unique. You can find the public IP of your workspace on the top-right in the workspace tab.

How would you know for sure if the workspace is indeed reachable at this IP address from the outside?

ping is a network utility that checks whether a particular IP address is reachable. Ping your workspace’s public IP address from your local system’s command line client or terminal, while keeping sudo tcpdump -i any icmp running on the workspace. tcpdump is a command that captures packets. With the icmp filter, it prints out any ping (ICMP) packets your workspace receives. Did you see ping packets being received on your workspace as shown by tcpdump?

What could you make out from the data printed by tcpdump? Find out your local system’s IP address from here and do the reverse (run ping from workspace & tcpdump on your local machine). Use ifconfig on both sides to understand the IP addresses involved.

Curious Cats

How does the URL https://www.google.com get resolved to its corresponding IP address?
Start the Netcat server. What happens when you try to connect via your browser? (Hint: visit IP:port, e.g. 3.7.32.15:8081, where 3.7.32.15 is the IP address and 8081 is the listening port)
Use tcpdump to secretly capture the data being exchanged in the chat between the Netcat server & client.
Can you make the Netcat server not stop listening when a client closes its connection?

Getting started with shell scripting

I was asked by a client to do some basic analysis of a Python file, user_preference.py. The requirements are to:

Print out the total number of lines in the file
Print out the number of comment lines in the file (starting with #)

Save the file to the ~/workspace/bytes/me_linux2 directory using these commands:

mkdir -p ~/workspace/bytes/me_linux2/
cd ~/workspace/bytes/me_linux2/
curl -O https://gitlab.crio.do/crio_bytes/me_encapsulation/-/raw/9c845bb7364b64fada328e44b75def128388ffe5/user_preference.py

Can you help me with commands to do these? You can answer the first two questions provided below, I’ll fetch it from there :)

Thanks! The client was impressed and will be sending all of his Python files to us for the analysis. This time we are asked to find more metrics apart from the two we did earlier.

That’s great news, but there are a couple of issues.

1. We’ll have one command for each metric (total number of lines, number of comment lines and so on). Every time we do an analysis, we’ll need to remember them and type them out one by one.
2. We’ve hardcoded the filename, user_preference.py, in all our commands. Every time we analyse a different file, the filename has to be changed in multiple places.

Anytime you come across scenarios asking to perform seemingly similar tasks repetitively, think about automation.

Luckily for us, the terminal supports running multiple commands in sequence by reading them from a file. These files are called shell scripts as they store shell (or terminal) commands. So, if we store all our commands inside a file, issue no. 1 will be resolved. Create a new file script.sh inside the ~/workspace/bytes/me_linux2 directory. Add your commands for finding total number of lines & number of comment lines into it, each on a new line. Run the script.sh file & you’ll see the output printed out.
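To illustrate the mechanics without giving away the two metrics asked for above, here is a sketch of a script that computes two different, made-up metrics (blank lines and function definitions) on a tiny sample file. The names sample.py and demo.sh and the temporary directory are assumptions for the demo only.

```shell
# Work in a throwaway directory with a tiny sample Python file
workdir=$(mktemp -d)
printf 'import os\n\ndef main():\n    pass\n' > "$workdir/sample.py"

# The script is just a file with one command per line
cat > "$workdir/demo.sh" <<'EOF'
# Commands run top to bottom, exactly as if typed in the terminal
grep -c '^$' sample.py       # number of blank lines
grep -c '^def ' sample.py    # number of function definitions
EOF

# Run the script from the directory that holds sample.py
output=$(cd "$workdir" && bash demo.sh)
echo "$output"
```

Passing the file to bash works without any extra setup. To run it as ./demo.sh, the file also needs execute permission (chmod +x demo.sh).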

1 down, 1 more issue to solve!

One way to solve the hardcoded filename issue is to store the filename in a variable inside the script.sh file, like we do in programming languages, and use the variable in the commands instead of the literal filename. That way, we’d just need to change the filename in one place.
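A minimal sketch of the variable approach is shown below, again with the made-up metrics and a hypothetical sample.py so that your own script.sh is left for you to write. The variable name target is an arbitrary choice.

```shell
# Tiny sample file to analyse
workdir=$(mktemp -d)
printf 'import os\n\ndef main():\n    pass\n' > "$workdir/sample.py"

# The filename is written once; no spaces are allowed around the = sign
target="$workdir/sample.py"

# Every command refers to the variable, quoted to survive spaces in paths
blank=$(grep -c '^$' "$target")
defs=$(grep -c '^def ' "$target")
echo "blank lines: $blank"
echo "function definitions: $defs"
```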

Rewrite your script.sh file so that it stores the filename in a variable and then uses the variable name instead of hardcoding the filename in all the commands. Answer the third question given.

This solves our problem. But, we still need to open our script.sh file using some editor, remove the earlier filename & type in the new filename. Can we do better?

Shell scripts can take parameters from the command line. This means we can run ./script.sh user_preference.py and use the value passed as a command line parameter, i.e. user_preference.py, from within the script. Inside the script, $1 refers to the first parameter passed to it, which in our case is the filename (the first and only parameter).
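The sketch below shows one way a script might read its first parameter, including a small guard for the case where no filename is given. The script name demo.sh, the sample file and the blank-line metric are all illustrative assumptions.

```shell
# Tiny sample file to analyse
workdir=$(mktemp -d)
printf 'import os\n\ndef main():\n    pass\n' > "$workdir/sample.py"

cat > "$workdir/demo.sh" <<'EOF'
# $1 is the first parameter given on the command line
if [ -z "$1" ]; then
  echo "usage: demo.sh <filename>" >&2
  exit 1
fi
echo "blank lines: $(grep -c '^$' "$1")"
EOF

# The filename is supplied when the script is run, not stored inside it
result=$(bash "$workdir/demo.sh" "$workdir/sample.py")
echo "$result"
```

With this shape, analysing another file only means changing the argument on the command line.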

Rewrite your script.sh file so that it prints out analysis of any Python file when run as ./script.sh <filename>

Curious Cats

Additional requirements from the client came in by mail this morning. Can you add functionality to (1) print out the ratio of comment lines to total number of lines, and (2) print "Well commented code" if the ratio > 0.25 or "Poorly commented code" otherwise?
We used $1 to denote the first parameter passed when executing the shell script. Print out the name of the shell script, i.e. script.sh, when it’s run. (Nope, hardcoding not allowed!)
What if we needed to delete all the commented lines from the Python file, how would we do that?

Environment variables

I often visit a directory with a long absolute path, /home/crio-user/Downloads/videos/series/english/series1/season100/episode1. It’s quite tedious typing this out every time I need to ls the directory or cd into it. If there’s one thing to keep in mind as a developer, it’s that there’s always an easier way to do things, and if there isn’t, you have a potential pet project to work on :)

Ok, Linux has environment variables that store important information as named values. We can use the printenv command to print those out. See if you can find an environment variable named PWD.

Does the PWD environment variable value change if we move to a different directory?

We can print out environment variable values using echo $<variable-name>. Try to print out the value of the HOME environment variable.

Cool stuff. But, why did we get started with environment variables in the first place? Ah, I remembered! We were trying to find an easier way to type in our long directory name. Can you store the directory name above as an environment variable named SERIES1S100E1?

Are you able to see SERIES1S100E1 in printenv output now?
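One common way to create an environment variable is the export built-in. The sketch below contrasts it with a plain shell variable, using the hypothetical names DEMO_DIR and scratch and the /tmp directory instead of the long path above.

```shell
# A plain shell variable is visible only inside the current shell
scratch="/tmp"

# export makes the variable part of the environment that child processes inherit
export DEMO_DIR="/tmp"

# printenv is a child process, so it only sees exported variables
in_env=$(printenv DEMO_DIR)
not_in_env=$(printenv scratch || echo "absent")
echo "DEMO_DIR=$in_env, scratch=$not_in_env"

# The variable can now stand in for the path in any command
ls -d "$DEMO_DIR"
```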

PATH

Quick question - How does Linux know ls should print out the contents of the current directory but shows a command not found error when we execute some random command like jkdfjsdjkjdk? Where did Linux find the ls command?

PATH is an environment variable containing a list of directories. This tells the shell where to search for executables when the user inputs a command. See if you can find the ls command file in any of the directories in the PATH variable.
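The search the shell performs can be imitated by hand. As a sketch, the loop below walks the directories in PATH in order and stops at the first one that contains an executable called cat. A different command is used on purpose, so the ls exercise is still yours. The variable found is an illustrative name.

```shell
found=""
old_ifs=$IFS
IFS=':'                       # PATH entries are separated by colons
for dir in $PATH; do
  if [ -x "$dir/cat" ]; then  # first match wins, just like the shell's lookup
    found="$dir/cat"
    break
  fi
done
IFS=$old_ifs
echo "cat was found at: $found"
```

Because the first match wins, the order of directories in PATH matters when two directories hold a command with the same name.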

Remember how we executed our script.sh file earlier? We used ./script.sh, right? What happens if you execute just script.sh, does it run?

Add the directory in which script.sh file is present to the PATH variable. Try to execute script.sh now.
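A possible way to extend PATH is sketched below with a throwaway directory and a hypothetical command named hello-demo, rather than your actual script.sh. It assumes the temporary directory allows executing files.

```shell
# Create a throwaway directory holding one executable script
bindir=$(mktemp -d)
cat > "$bindir/hello-demo" <<'EOF'
#!/bin/sh
echo "hello from PATH"
EOF
chmod +x "$bindir/hello-demo"

# Append the directory to PATH; existing entries keep their priority
PATH="$PATH:$bindir"

# The script can now be run by name, without a ./ prefix
greeting=$(hello-demo)
echo "$greeting"
```

The change applies to the current shell session and the processes started from it. A new terminal starts with the original PATH.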

Curious Cats

Where does the printenv command get its data from? Is it stored in a single file like /proc/cpuinfo?
The environment variable you created isn’t persisted across system restarts. How can you solve this problem?
Your friend said that using the alias command instead of creating an environment variable is better, would you agree?

Additional Information

You will observe that there are 2 processes for nc when it is started with sudo. This is because sudo runs the command as a child process of its own.
With a full-format listing (ps -f), the ps command displays the process ID (PID) and parent process ID (PPID) of each process.
Ctrl + C sends an interrupt signal (SIGINT) to the process running in the foreground, which terminates it by default.
Using & at the end of a command runs the process in the background.
kill <pid> sends the terminate signal (SIGTERM) to the process with that PID. kill -9 <pid> sends SIGKILL, which the process cannot catch or ignore.
In the top output, %CPU for a process can exceed 100%. This is because each CPU core counts as 100% and the machine may have more than one core.
The process with ID 1 (init, or systemd on many modern distributions) is the ancestor of all other user-space processes in Linux.
cd - is like the back button on a TV remote. It takes you back to the previous directory you were working in.
alias creates shortcuts for long and frequently used commands.

Newfound Superpowers

Practical know-how of Linux details that developers need on a regular basis
Ability to utilize Linux terminal to accomplish various tasks

Now you can

Monitor running processes
Explain system resources and how to inspect them on Linux
Comprehend how computers talk to each other in a better way
Use shell scripts to perform complex tasks
