When teaching data analysis in Python, I advise my students to start each session with a standard setup like this one:
```python
# Clear memory
%reset -f

# Set working directory
import os
os.chdir('<insert your working directory path here>')

# Set random seed
import numpy as np
np.random.seed(<insert a random seed number here>)
```
Clear the Memory
Example:
```python
%reset -f
```
This IPython magic command, which works in Jupyter notebooks, clears the memory: it deletes all variables, functions, and imported module names from the interactive namespace. The -f flag forces the reset without asking for confirmation. This step ensures you work from a clean Python session, so that your script does not create or rely on leftover data or definitions from previous sessions.
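One caveat: %reset is IPython syntax, so a plain .py file run with python will stop with a SyntaxError on that line. If the same setup might also run as a script, one hedged option is to call the magic through IPython's API only when an IPython shell is active. The helper name reset_if_interactive is hypothetical and not part of any library; get_ipython() and run_line_magic() are IPython's own functions.

```python
def reset_if_interactive():
    """Hypothetical helper: run %reset -f only inside an IPython/Jupyter shell."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed, so this cannot be a Jupyter session
        return False
    shell = get_ipython()
    if shell is None:
        # Plain Python script: it already starts with a clean namespace
        return False
    # Equivalent to typing %reset -f in a notebook cell
    shell.run_line_magic('reset', '-f')
    return True


reset_if_interactive()
```

In a notebook this behaves like the one-line magic; run as a script, it simply does nothing.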
Set the Working Directory
Example:
```python
import os
os.chdir('C:\\Users\\JCohen\\Documents\\Research\\Household Finance Paper')
```
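Note that the path uses double backslashes, whereas Windows normally displays a single backslash between folders. In a Python string the backslash is an escape character, so each one must be doubled. Alternatively, you can write a raw string, r'C:\Users\JCohen\Documents\Research\Household Finance Paper', or use forward slashes, which Python also accepts on Windows.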
Here, I am setting the directory to the “Household Finance Paper” folder, contained in my “Research” folder, on my Windows device. This is where Python will look for any data files and scripts you reference with relative paths, and where it will save any files created during your session. Setting it explicitly reduces the risk of path-related errors in your script.
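To see what the working directory controls, here is a small sketch. It assumes a temporary folder (created with tempfile) as a stand-in for a real project folder, and the file name results.csv is hypothetical. os.getcwd() reports the current working directory, and a file opened with a relative path lands inside it.

```python
import os
import tempfile

# Temporary folder standing in for a project folder such as "Household Finance Paper"
project_dir = tempfile.mkdtemp()
os.chdir(project_dir)

# Confirm where Python is now working
print(os.getcwd())

# A relative path is resolved against the working directory
with open('results.csv', 'w') as f:
    f.write('x,y\n1,2\n')

print(os.path.exists(os.path.join(project_dir, 'results.csv')))  # True
```

Printing os.getcwd() right after os.chdir() is a cheap way to confirm the setup ran as intended.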
Load Essential Libraries
Although it is not always necessary, I encourage students to start out by loading some workhorse Python libraries used for data analysis:
- NumPy (numpy): Essential for numerical computations, especially those involving arrays and matrices. It’s often used in the background by other libraries to perform efficient numerical operations.
- Pandas (pandas): Indispensable for data manipulation and cleaning. It introduces DataFrame and Series data structures that are ideal for handling and analyzing structured data.
- Matplotlib (matplotlib): A foundational plotting library, useful for creating a wide range of static, animated, and interactive visualizations.
- Seaborn (seaborn): Builds on Matplotlib and provides a high-level interface for drawing attractive and informative statistical graphics.
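To install them, run the following from a terminal or command prompt (inside a Jupyter notebook, use %pip install instead):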
```shell
pip install numpy
pip install pandas
pip install matplotlib
pip install seaborn
```
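To confirm the installation worked, you can ask Python whether it can find each library without importing it. This sketch uses importlib.util.find_spec from the standard library, which returns None for a module that is not installed; the list of names simply mirrors the four libraries above.

```python
import importlib.util

libraries = ['numpy', 'pandas', 'matplotlib', 'seaborn']

# find_spec returns None when a top-level module cannot be found
missing = [name for name in libraries if importlib.util.find_spec(name) is None]

print('Missing libraries:', missing or 'none')
```

Any name printed as missing needs to be installed before the imports below will work.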
To call up the libraries in your session:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```
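As a quick illustration of how NumPy and pandas divide the work, the sketch below generates an array with NumPy and wraps it in a pandas DataFrame. The income figures are simulated, hypothetical data, and the column names are made up for the example.

```python
import numpy as np
import pandas as pd

np.random.seed(55)

# NumPy: an array of 100 simulated household incomes (hypothetical data)
income = np.random.normal(loc=50_000, scale=10_000, size=100)

# pandas: wrap the array in a labeled, tabular DataFrame
df = pd.DataFrame({'household_id': np.arange(1, 101), 'income': income})

print(df.head())
print(df['income'].describe())
```

NumPy handles the fast numerical generation, while pandas adds labels and convenient summaries such as describe().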
Set Random Seed
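Many statistical operations, such as simulations and bootstrapping, rely on randomly generated numbers. This makes replication difficult, because each run draws different numbers and can therefore produce different results. Computers actually generate pseudo-random numbers: a deterministic algorithm produces the sequence from a starting value, the seed. When you set a random seed, you are effectively calling up a predetermined list of random numbers associated with that seed number. If two analysts set the same seed and run the same code, their analyses will use the same random numbers, so the results should replicate.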
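The idea does not depend on NumPy. Here is a minimal sketch using Python's built-in random module: two generators created with the same seed produce the same sequence, while a different seed produces a different one. The seeds 55 and 56 are arbitrary choices.

```python
import random

# Two independent generators started from the same seed
gen_a = random.Random(55)
gen_b = random.Random(55)
draws_a = [gen_a.random() for _ in range(3)]
draws_b = [gen_b.random() for _ in range(3)]
print(draws_a == draws_b)  # True

# A generator started from a different seed
gen_c = random.Random(56)
draws_c = [gen_c.random() for _ in range(3)]
print(draws_a == draws_c)
```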
To set a random seed in NumPy, pass a number to np.random.seed(). This example sets the seed at 55, but you can choose any integer from 0 to 2**32 - 1:
```python
np.random.seed(55)
```
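A quick way to check that seeding works is to draw numbers, re-seed with the same value, and draw again. The sketch also notes two points worth knowing: np.random.seed() does not seed Python's built-in random module, and NumPy's documentation recommends the newer Generator API (np.random.default_rng) for new code, where the seed is passed when the generator is created.

```python
import random

import numpy as np

# Re-seeding with the same value reproduces the same draws
np.random.seed(55)
first = np.random.rand(3)
np.random.seed(55)
second = np.random.rand(3)
print(np.array_equal(first, second))  # True

# np.random.seed() does not affect Python's built-in random module,
# so code that also uses `random` needs its own seed
random.seed(55)

# Newer Generator API: the seed is supplied when the generator is created
rng_1 = np.random.default_rng(55)
rng_2 = np.random.default_rng(55)
print(np.array_equal(rng_1.random(3), rng_2.random(3)))  # True
```

If your analysis mixes several sources of randomness, seed each one explicitly.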