TDM 20200: Project 11 — 2024
Motivation: Machine learning and AI are huge buzzwords in industry. In this project, we will continue to learn more TensorFlow features.
Context: The purpose of these projects is to give you exposure to machine learning tools, some basic functionality, and a conceptual workflow to create and use a model without needing any special math or statistics background.
Scope: Python, tensorflow, scikit-learn, numpy
Readings and Resources
You need to use 2 cores for your Jupyter Lab session for Project 11 this week.
You can use |
We added a video (below) to help you with Project 11. However, the example video uses a data set of beer reviews. You need to work instead on the flight data given here:
Questions
Question 1 (2 points)
In the previous project, you created a TensorFlow model with limited data. Because a meaningful TensorFlow model needs a large amount of data, your model may not work well! Nonetheless, we can still learn how to create (and use!) the model, and how to check its performance.
- First update your program from Project 10, building the model with more data. Please use nrows = 100000 from the data set 2014.csv. The test/training split should again be defined using:
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=42)
and using epochs=10 when training the model.
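The steps above can be sketched roughly as follows. Only nrows = 100000, the train_test_split call, and epochs=10 come from the instructions; the file path, the feature and label column names, the helper function names, and the layer sizes are assumptions for illustration, so substitute whatever you used in Project 10.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical names: replace with the path and columns from your Project 10.
DATA_PATH = "2014.csv"                   # assumed location of the 2014 flight data
FEATURE_COLS = ["DepDelay", "Distance"]  # assumed feature columns
LABEL_COL = "ArrDelay"                   # assumed label column


def load_features_labels(path, nrows, feature_cols=FEATURE_COLS, label_col=LABEL_COL):
    """Read the first `nrows` rows and drop any row with a missing value."""
    df = pd.read_csv(path, nrows=nrows, usecols=feature_cols + [label_col])
    df = df.dropna()
    features = df[feature_cols].to_numpy(dtype="float32")
    labels = df[label_col].to_numpy(dtype="float32")
    return features, labels


def build_model(n_features):
    """A small regression network; the layer sizes are an arbitrary choice."""
    import tensorflow as tf  # imported here so data loading alone does not need TensorFlow

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1),  # one output: the predicted arrival delay
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model


def train_on_2014(path=DATA_PATH, nrows=100000):
    """Load, split, train for 10 epochs, and report the held-out MAE."""
    features, labels = load_features_labels(path, nrows)
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=42
    )
    model = build_model(X_train.shape[1])
    model.fit(X_train, y_train, epochs=10, verbose=0)
    test_loss, test_mae = model.evaluate(X_test, y_test, verbose=0)
    return model, test_mae
```

Calling train_on_2014() would return the trained model together with its mean absolute error on the 20% held-out split, which is one simple way to check performance.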
Question 2 (2 points)
- Read in 100000 lines of data from the 2019.csv file.
- Save the predicted arrival delays as predicted_arrival_delays_100k_2019 (or something similar)
- Save the actual arrival delays as actual_arrival_delays_100k_2019 (or something similar)
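One possible way to produce the two arrays is sketched below. It assumes that `model` is the Keras model trained in Question 1 and that the 2019 file has the same columns as the 2014 file; the column names and the function name are placeholders, not part of the assignment.

```python
import numpy as np
import pandas as pd

# Hypothetical column names; use the same ones your model was trained on.
FEATURE_COLS = ["DepDelay", "Distance"]
LABEL_COL = "ArrDelay"


def predict_arrival_delays(model, df, feature_cols=FEATURE_COLS, label_col=LABEL_COL):
    """Return (predicted, actual) arrival delays as 1-D arrays of equal length."""
    # Drop incomplete rows so predictions and actual values stay aligned.
    clean = df[feature_cols + [label_col]].dropna()
    features = clean[feature_cols].to_numpy(dtype="float32")
    actual = clean[label_col].to_numpy(dtype="float32")
    # A single-output Keras model returns shape (n, 1); flatten it to (n,).
    predicted = np.asarray(model.predict(features)).reshape(-1)
    return predicted, actual


# Example call (assumes a trained `model` and a readable 2019.csv):
# df_2019 = pd.read_csv("2019.csv", nrows=100000)
# predicted_arrival_delays_100k_2019, actual_arrival_delays_100k_2019 = (
#     predict_arrival_delays(model, df_2019)
# )
```

Dropping incomplete rows before predicting means the saved arrays may be slightly shorter than 100000 entries, but they will always line up row for row.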
Question 3 (2 points)
Solve questions 1 and 2 again, this time using 500000 rows from the 2014 data and 500000 rows from the 2019 data. Be sure to change all of your variable names accordingly.
Question 4 (2 points)
Use the data from question 2 (with 100000 rows of data) to study the predicted arrival delays for 2019 versus the actual arrival delays for 2019. Please comment on what you find.
Please be sure to provide some explanation of what you learn, and (likely) some visualizations to justify your work.
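A few summary numbers can anchor the discussion before any plotting. The sketch below assumes the predicted and actual arrays from Question 2 are 1-D and of equal length; the function name and the choice of statistics are illustrative, not required by the assignment.

```python
import numpy as np


def summarize_errors(predicted, actual):
    """Summarize how far predicted delays fall from actual delays."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    errors = predicted - actual
    return {
        "mae": float(np.mean(np.abs(errors))),         # typical size of a miss
        "rmse": float(np.sqrt(np.mean(errors ** 2))),  # weights large misses more heavily
        "bias": float(np.mean(errors)),                # > 0 means over-prediction on average
        "corr": float(np.corrcoef(predicted, actual)[0, 1]),  # linear agreement
    }
```

A scatter plot of actual against predicted delays, with the line y = x drawn for reference, pairs well with these numbers.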
Question 5 (2 points)
Now use the data from question 3 (with 500000 rows of data) to study the predicted arrival delays for 2019 versus the actual arrival delays for 2019. Please comment on what you find. Be sure to compare the effectiveness of using 100000 rows of data versus using 500000 rows of data.
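To put the two runs side by side, the mean absolute error of each can be tabulated. This is a minimal sketch that assumes the predicted and actual arrays from Questions 2 and 3 already exist; the function name and the dictionary labels are hypothetical.

```python
import numpy as np


def compare_runs(runs):
    """Print and return the MAE per run; `runs` maps a label to (predicted, actual)."""
    results = {}
    for label, (predicted, actual) in runs.items():
        diff = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
        results[label] = float(np.mean(np.abs(diff)))
    for label, mae in results.items():
        print(f"{label:>10}  MAE = {mae:.2f}")
    return results


# Example call (assumes the arrays from Questions 2 and 3):
# compare_runs({
#     "100k": (predicted_arrival_delays_100k_2019, actual_arrival_delays_100k_2019),
#     "500k": (predicted_arrival_delays_500k_2019, actual_arrival_delays_500k_2019),
# })
```

Keep in mind that the two runs are evaluated on different numbers of 2019 rows, so the comparison is not made on an identical set of flights.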
Project 11 Assignment Checklist
- Jupyter Lab notebook with your code, comments and outputs for the assignment
  - firstname-lastname-project11.ipynb
- Python file with code and comments for the assignment
  - firstname-lastname-project11.py
- Submit files through Gradescope
Please double check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure that what you think you submitted is what you actually submitted. In addition, please review our submission guidelines before submitting your project.