Task 1 (22nd Sep 2023)
The following subtasks will help you work through the Python essentials needed to get started with AI and ML.
Q1
File handling is one of the basic, important tasks in building machine learning models or neural networks. Building a good model always starts with finding datasets and processing them, and file handling is the stepping stone for that.
Write a Python program that reads the contents of the given file ‘onelinefile.txt’. The file contains a single line in which the pattern (int)(string)(float)(string) repeats with no separators. For example:
1Aaa3.5Maths2Bbb4.2Physics3Ccc7.62Chemistry
Your main task is to split the contents of the given file according to this format and write them to a .csv file, say ‘Filename2.csv’, so that each record occupies one row. For example, the first record of the line above becomes:
1,Aaa,3.5,Maths
Contents of ‘onelinefile.txt’:
1Aaa3.5Maths2Bbb4.2Physics3Ccc7.62Chemistry4Ddd9.55Biology5Eee4.0Social6Fff7.6English7Ggg3.111Maths8Hhh9.99Physics9Iii1.23Civics
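One possible approach to Q1, offered as a minimal sketch rather than the expected solution, is to match each record with a regular expression. It assumes the strings contain only ASCII letters and that every float has a decimal point, which holds for the sample line. The names RECORD, split_records and to_csv are illustrative, not part of the task.

```python
import csv
import re

# One record = digits, letters, a decimal number, letters (e.g. "1Aaa3.5Maths").
# Assumption: names and subjects contain ASCII letters only.
RECORD = re.compile(r"(\d+)([A-Za-z]+)(\d+\.\d+)([A-Za-z]+)")


def split_records(line):
    """Return one (int, string, float, string) tuple per record.

    The fields are kept as text so that values such as "4.0" are written
    to the CSV exactly as they appear in the source file.
    """
    return RECORD.findall(line)


def to_csv(src_path, dst_path):
    """Read the single-line text file and write one CSV row per record."""
    with open(src_path, encoding="utf-8") as src:
        rows = split_records(src.read().strip())
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(rows)
    return rows


sample = "1Aaa3.5Maths2Bbb4.2Physics3Ccc7.62Chemistry"
for row in split_records(sample):
    print(",".join(row))
```

With both files in the working directory, calling to_csv("onelinefile.txt", "Filename2.csv") would produce the CSV file. Text that does not fit the pattern is silently skipped by findall, so a stricter version might check that the matches cover the whole line.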
Q2
Data formatting
Python libraries represent missing numbers as NaN, which is short for “not a number”. Most libraries (including most scikit-learn estimators) will raise an error if you try to build a model using data with missing values. A common way around this issue is to impute the missing value, that is, to fill it in with a number or value of the same format. In the given dataset, find the missing values (NaN/NA/-/Nil) and replace them with an appropriate number.
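As a rough illustration of imputation, the sketch below fills the missing cells of one numeric column with the mean of the values that are present. Mean imputation is only one choice among several (median or a constant are others), and the task does not prescribe it. The column marks is made up for the example and is not taken from the task's dataset; MISSING, is_missing and impute_mean are hypothetical names.

```python
from statistics import mean

# Markers treated as missing, compared case-insensitively.
# "nan", "na", "-" and "nil" come from the task text; "" is an added assumption.
MISSING = {"nan", "na", "-", "nil", ""}


def is_missing(cell):
    """True if the cell is None or holds one of the missing-value markers."""
    return cell is None or str(cell).strip().lower() in MISSING


def impute_mean(column):
    """Return the column as floats, with missing cells replaced by the mean."""
    present = [float(c) for c in column if not is_missing(c)]
    if not present:
        raise ValueError("column has no usable values to average")
    fill = mean(present)
    return [fill if is_missing(c) else float(c) for c in column]


# Hypothetical column, for illustration only.
marks = ["7.5", "NaN", "9.0", "-", "Nil", "4.5", "NA"]
print(impute_mean(marks))
```

A float NaN is also caught, because str(float("nan")) is "nan". Negative numbers such as "-3" are not mistaken for the "-" marker, since the whole cell is compared.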
Q3
Read the file ‘about.txt’ and find the words with at least 6 letters, as well as the most frequently used word.
Contents of the file ‘about.txt’:
Python has tools for almost every aspect of scientific computing. The Bank of America uses Python to crunch its financial data and Facebook looks upon the Python library Pandas for its data analysis. While there are many libraries available to perform data analysis in Python, here are a few: NumPy, SciPy, Pandas and Matplotlib.
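A minimal sketch for Q3 follows, assuming that a word is a run of ASCII letters and that frequency is counted case-insensitively (so "The" and "the" are the same word). Both are interpretation choices that the task leaves open. The function name analyse and the short sample sentence are illustrative only.

```python
import re
from collections import Counter


def analyse(text, min_len=6):
    """Return (long_words, most_frequent) for the given text.

    long_words    : unique words with at least min_len letters, in order
                    of first appearance, with original capitalisation.
    most_frequent : (word, count) for the most common lower-cased word,
                    or None if the text contains no words.
    """
    words = re.findall(r"[A-Za-z]+", text)

    long_words = []
    for word in words:
        if len(word) >= min_len and word not in long_words:
            long_words.append(word)

    counts = Counter(word.lower() for word in words)
    top = counts.most_common(1)
    return long_words, (top[0] if top else None)


# Short illustrative sentence; for the task, pass the contents of about.txt,
# e.g. analyse(open("about.txt", encoding="utf-8").read()).
long_words, most_frequent = analyse("Python has tools. Python uses Pandas.")
print(long_words)
print(most_frequent)
```

When several words share the highest count, Counter.most_common returns the one that appeared first in the text.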