Theory Questions
What is the purpose of data loading in the context of data analysis?
Explain the difference between structured and unstructured data.
What factors should you consider when choosing a data storage solution for a project?
What are the advantages of using a columnar storage format like Parquet over a row-based format like CSV?
Name a scenario where using JSON as a file format might be advantageous.
How does the schema evolve in a schema-on-read system compared to a schema-on-write system?
Describe the purpose of data serialization and provide an example format used for serialization.
Why might you choose SQLite as a data storage solution for a small-scale application?
What is the purpose of an ETL (Extract, Transform, Load) process in data engineering?
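A toy pipeline can make the three ETL stages concrete. This is an illustrative sketch under stated assumptions, not a standard implementation. It extracts rows from a CSV string, transforms them by dropping incomplete rows and converting types, and loads the result into an in-memory SQLite table. The table name `sales`, its columns, and the sample data are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a file or an API.
RAW_CSV = """region,amount
north,120.5
south,
east,80
"""

def extract(text):
    """Extract: parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing amount, normalise region, cast amount."""
    cleaned = []
    for row in rows:
        if row["amount"].strip():
            cleaned.append((row["region"].upper(), float(row["amount"])))
    return cleaned

def load(records, conn):
    """Load: insert the cleaned records into a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT region, amount FROM sales").fetchall())
```

Keeping each stage in its own function lets it be tested or replaced independently, for example swapping the SQLite load for a data warehouse.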
In a distributed file system, what are the benefits of data partitioning and replication?
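Both ideas can be illustrated with a toy placement scheme. This is a simplified sketch, not how any particular system (such as HDFS) actually places data. Each key is partitioned onto a primary node by a stable hash, and copies are replicated onto the next `replication_factor - 1` nodes in a ring. The node names and the replication factor of 3 are hypothetical.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster

def primary_index(key, num_nodes):
    """Partitioning: map a key to a node index with a stable (non-randomised) hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

def placement(key, nodes=NODES, replication_factor=3):
    """Replication: the primary node plus the next nodes in the ring each hold a copy."""
    start = primary_index(key, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]

for key in ["user:1", "user:2", "user:3"]:
    print(key, "->", placement(key))
```

Partitioning spreads data and query load across nodes. Replication means that if one node fails, every key it held still has copies on other nodes.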
Programming Questions
How can you read data from a CSV file in Python?
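One option uses only the standard library's `csv` module, as in the minimal sketch below. `csv.DictReader` turns each row into a dict keyed by the header row. The UTF-8 encoding is an assumption about the file. pandas' `read_csv` is the usual alternative when a DataFrame is wanted.

```python
import csv

def read_csv_rows(path="data.csv"):
    """Return the rows of a CSV file as a list of dicts keyed by the header row."""
    # newline="" is what the csv module documentation recommends for CSV files
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Usage: `for row in read_csv_rows("data.csv"): print(row)`. Note that every value comes back as a string, so convert numeric fields explicitly.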
What Python library is commonly used to work with JSON data?
How can you store data in a Pandas DataFrame?
Explain how you can write data to a JSON file using Python.
What is Pickle in Python, and how can you use it to serialize objects?
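Pickle is Python's built-in module for serializing most Python objects to a byte stream and back. A minimal in-memory round trip might look like the sketch below. The sample object is hypothetical, and instances of your own classes also work as long as the class can be imported when the data is loaded. Only unpickle data from trusted sources, because loading a pickle can execute arbitrary code.

```python
import pickle

# Hypothetical object mixing several built-in types
original = {"name": "sensor-1", "readings": [1.5, 2.0], "tags": {"lab", "roof"}, "pos": (3, 4)}

# Serialize the object to bytes...
payload = pickle.dumps(original)

# ...and rebuild an equal (but distinct) object from those bytes.
restored = pickle.loads(payload)
print(restored)
```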
How can you read data from an SQLite database in Python?
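The standard-library `sqlite3` module is enough for this. The sketch below uses an in-memory database so it runs as-is. With a real file, you would pass its path, for example a hypothetical `app.db`, to `sqlite3.connect`. The `users` table and its rows are made up for illustration.

```python
import sqlite3

def fetch_users(conn):
    """Read all rows from the (hypothetical) users table, ordered by id."""
    cur = conn.execute("SELECT id, name FROM users ORDER BY id")
    return cur.fetchall()

# ":memory:" creates a temporary database; use a file path for a persistent one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
# ? placeholders let sqlite3 bind values safely instead of formatting SQL strings
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Ann",), ("Bob",)])
conn.commit()

for row in fetch_users(conn):
    print(row)
```

Call `conn.close()` when finished with the connection.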
How can you read Avro data in Python using the fastavro library?
How can you store a NumPy array in an HDF5 file using the h5py library?
How can you use the requests library to fetch data from a RESTful API?
How can you handle data compression and decompression using the gzip module in Python?
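For data already in memory, `gzip.compress` and `gzip.decompress` work directly on bytes, and `gzip.open` offers a file-like interface for `.gz` files. A small round-trip sketch follows. The sample text is arbitrary and chosen to be repetitive so it compresses well.

```python
import gzip

text = "data loading " * 1000          # repetitive sample text
raw = text.encode("utf-8")             # gzip works on bytes, not str

compressed = gzip.compress(raw)        # bytes in gzip format
restored = gzip.decompress(compressed).decode("utf-8")

print(len(raw), "->", len(compressed), "bytes")
```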
Output Questions
Write a Python program to read the contents of a text file named “data.txt” and print them to the console.
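A minimal sketch, assuming the file is UTF-8 text. The `with` block closes the file even if reading fails.

```python
def print_file(path="data.txt"):
    """Print the whole contents of a text file to the console."""
    with open(path, encoding="utf-8") as f:
        # end="" avoids an extra blank line when the file already ends with a newline
        print(f.read(), end="")
```

Usage: `print_file()`. For very large files, iterating over `f` line by line avoids loading everything at once.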
Create a Python program that takes user input and writes it to a new text file named “output.txt”.
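One possible sketch separates prompting from writing so the file-writing part can be reused and tested. The function names are hypothetical. Mode `"w"` creates the file, or overwrites it if it already exists. Use `"x"` instead if an existing file should cause an error.

```python
def save_text(text, path="output.txt"):
    """Write the given text to a file, creating or overwriting it."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text + "\n")

def main():
    """Prompt the user and save their answer to output.txt."""
    user_text = input("Enter some text: ")
    save_text(user_text)
    print("Saved to output.txt")
```

Run it by calling `main()`, typically under an `if __name__ == "__main__":` guard.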
Develop a program in Python that reads a CSV file named “data.csv” and calculates the sum of numbers in a specific column.
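A sketch using the `csv` module. It assumes the file has a header row and that the column is identified by its header name. The default name `amount` is a placeholder. Blank cells are skipped, and a missing column raises an error rather than silently summing to zero.

```python
import csv

def sum_column(path="data.csv", column="amount"):
    """Sum the numeric values in one column of a CSV file with a header row.

    "amount" is a placeholder default; pass the real header name.
    Blank cells are skipped; other non-numeric values raise ValueError.
    """
    total = 0.0
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found")
        for row in reader:
            cell = (row[column] or "").strip()  # short rows yield None for missing cells
            if cell:
                total += float(cell)
    return total
```

Usage: `print(sum_column("data.csv", "price"))`, where `price` stands in for whatever column the file actually has.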
Write a Python function that takes a dictionary and saves it as a JSON file named “data.json”.
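The standard-library `json` module handles this with `json.dump`, which writes directly to an open file. In the minimal sketch below, values must be JSON-compatible (strings, numbers, booleans, None, lists, dicts). Non-string keys such as integers are converted to strings.

```python
import json

def save_dict_as_json(data, path="data.json"):
    """Serialize a dictionary to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        # indent makes the file readable; ensure_ascii=False keeps non-ASCII text as-is
        json.dump(data, f, indent=2, ensure_ascii=False)
```

Usage: `save_dict_as_json({"name": "Ann", "scores": [90, 85]})`.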
Create a Python program that reads a JSON file named “data.json” and prints the values of a specific key.
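Because the shape of `data.json` is not specified, the sketch below assumes the file holds either a single JSON object or a list of objects. Objects without the key are skipped. The default key `name` is hypothetical.

```python
import json

def values_for_key(path="data.json", key="name"):
    """Return the value(s) stored under `key` in a JSON file.

    Assumes the file holds one JSON object or a list of objects;
    entries that lack the key are skipped.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    records = data if isinstance(data, list) else [data]
    return [rec[key] for rec in records if isinstance(rec, dict) and key in rec]

def print_values(path="data.json", key="name"):
    """Print each value found under `key`, one per line."""
    for value in values_for_key(path, key):
        print(value)
```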
Implement a Python function that serializes an object using the pickle module and saves it to a file named “object.pkl”.
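A minimal sketch with a matching loader for checking the result. Pickle data is binary, so the file is opened in `"wb"`/`"rb"` mode. As with any pickle, only load files you trust.

```python
import pickle

def save_object(obj, path="object.pkl"):
    """Serialize any picklable object to a file."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_object(path="object.pkl"):
    """Read the object back; only do this with files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Usage: `save_object({"model": "v1", "weights": [0.1, 0.2]})`.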
Write a Python program that reads a binary file named “binary_data.bin” and prints the hexadecimal representation of the first 16 bytes.
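Opening the file in `"rb"` mode returns raw bytes, and `bytes.hex` (with a separator argument, available since Python 3.8) formats them. A short sketch follows. If the file is shorter than 16 bytes, it simply prints what is there.

```python
def first_bytes_hex(path="binary_data.bin", count=16):
    """Return the first `count` bytes of a file as a space-separated hex string."""
    with open(path, "rb") as f:
        chunk = f.read(count)  # may be shorter if the file has fewer bytes
    return chunk.hex(" ")
```

Usage: `print(first_bytes_hex())`.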
Develop a Python script that reads an XML file named “data.xml” and prints the text content of specific elements.
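A sketch using the standard-library `xml.etree.ElementTree`. The element name `title` is a hypothetical stand-in for whichever elements you need. The Python documentation notes that this module is not secure against maliciously constructed XML, so treat untrusted input with care.

```python
import xml.etree.ElementTree as ET

def element_texts(path="data.xml", tag="title"):
    """Return the text of every element with the given tag, anywhere in the document."""
    tree = ET.parse(path)
    # iter() walks the whole tree; itertext() also gathers text from nested children
    return ["".join(elem.itertext()).strip() for elem in tree.getroot().iter(tag)]
```

Usage: `for text in element_texts("data.xml", "title"): print(text)`.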
Create a Python program that compresses a text file named “large_file.txt” using the gzip module.
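For a large file, streaming the data is preferable to reading it all into memory. The sketch below copies the file in chunks into a `.gz` archive with `shutil.copyfileobj`. Writing the output next to the input as `large_file.txt.gz` is an assumption about the desired name.

```python
import gzip
import shutil

def gzip_file(src="large_file.txt", dst=None):
    """Compress src into dst (default: src + '.gz') and return the output path."""
    dst = dst or src + ".gz"
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        # copyfileobj streams the data in chunks, so memory use stays small
        shutil.copyfileobj(fin, fout)
    return dst
```

The original file is left in place. Reading the result back with `gzip.open(path, "rt")` is a quick way to check it.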
Write a Python function that accepts a list of dictionaries and saves it to an Excel file named “data.xlsx” using the openpyxl library.