- Introduction: data analysis applications in financial research (index
funds), marketing research (market segmentation), medical studies (testing
of new drugs), database mining, and data warehousing.
- Review of Statistics: Random variables, Mean, Variance, Standard
Deviation, Pearson correlation.
- Introducing Excel and SPSS: Excel -> Tools -> Data Analysis and
- Review Chapter 1 and Chapter 2:
- Linear combination of random variables: its mean and standard deviation.
- First Team Project: Find the weekly stock quotes for the last five years
of the 30 Dow component companies. Create a mutual fund with no more than
five stocks from the Dow components that mimics the performance of the Dow
Jones Index. You can assume that you have $1,000,000 at the beginning
and that no trades are made once the portfolio is created. After
the initial stage is completed, you can add a criterion for when to
trade and develop a formula for updating your portfolio. Use the first
three years of data to create the portfolio. After the portfolio is
created, test whether its performance is still close to the Dow's
performance over the last two years of data. Every team member should
prepare a time sheet recording what he or she contributed to the project and
when the work was completed. Team members will validate the accuracy of
the time sheets. A team may decide to fire a non-performing member at
an early stage. A final peer evaluation should be handed in with the
project. [Rate the contribution of each team member on a scale from 1 to 5,
with 5 being the most valuable player.] Check http://finance.yahoo.com for
historical stock quotes.
- A few Web Board Conferences have been created for the course.
- A study comparing the GPAs and SAT scores of five college seniors
shows the following:
Compute the mean and standard deviation of GPA and SAT. Find
the covariance and correlation between GPA and SAT. There
will be a quiz on this subject next week.
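The computations asked for above can be sketched in plain Python. This is a minimal sketch, not the course's prescribed method: the helper names (`mean`, `sample_sd`, `sample_cov`, `pearson_r`) are hypothetical, and the five GPA/SAT pairs are made-up values rather than the study's data, so substitute the actual table.

```python
from math import sqrt

# Illustrative helpers (hypothetical names), standard library only.


def mean(xs):
    return sum(xs) / len(xs)


def sample_sd(xs):
    """Sample standard deviation, dividing by d.f. = n - 1."""
    m = mean(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def sample_cov(xs, ys):
    """Sample covariance: sum of cross products of deviations / (n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return sample_cov(xs, ys) / (sample_sd(xs) * sample_sd(ys))


# Made-up values for five students (NOT the data from the study).
gpa = [3.0, 3.5, 2.5, 4.0, 3.0]
sat = [1100, 1300, 1000, 1400, 1200]

print("GPA mean/sd:", mean(gpa), round(sample_sd(gpa), 4))
print("SAT mean/sd:", mean(sat), round(sample_sd(sat), 4))
print("cov:", round(sample_cov(gpa, sat), 4), "r:", round(pearson_r(gpa, sat), 4))
```

The n - 1 divisor matches Excel's STDEV function; dividing by n instead gives the population versions. The correlation is the same under either convention because the divisors cancel.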
Why Learn Data Analysis?
Covariance Matrix and Multivariate Analysis
Applications: Principal Component Analysis, Factor Analysis, Discriminant
Analysis, Cluster Analysis
Chapter 3: Fundamentals of Data Manipulation
- Degrees of freedom
- Mean-corrected data: mean = 0, d.f. = n - 1
- Standardized data: mean = 0, standard deviation = 1
- Sum of Squares (SS), Sum of Cross Products (SCP),
Sum of Squares and Cross Products (SSCP)
- Correlation Coefficient (Pearson Product Moment Correlation)
- Covariance Matrix (Variance-Covariance Matrix), Correlation Matrix, and
Pooled within-group SSCP (the weighted SSCP from all groups, representing
the averaged group SSCP): all are symmetric matrices.
- Within-Group and Between-Group Analyses
- Example: We will use Excel to carry out all of the analyses listed above
and demonstrate the same procedures in SPSS. The Excel file, SSCP.xls,
is available in my lab folder p:\tsai\quant\ (the P drive is also known as
the faculty drive).
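As a rough companion to the Excel file, the Chapter 3 quantities can be computed from first principles. This is a minimal sketch in plain Python on a small made-up data set (two variables, two groups of three observations); the function names are hypothetical, and the pooled within-group SSCP is taken here to be the sum of each group's own SSCP, which weights each group's covariance matrix by its degrees of freedom.

```python
from math import sqrt

# Illustrative helpers (hypothetical names), standard library only.


def column_means(data):
    """Mean of each column (variable) in a list of rows."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]


def mean_corrected(data):
    """Subtract each column's mean so every column has mean 0."""
    means = column_means(data)
    return [[x - m for x, m in zip(row, means)] for row in data]


def sscp(data):
    """SSCP matrix of the mean-corrected data: SS on the diagonal, SCP off it."""
    dev = mean_corrected(data)
    p = len(data[0])
    return [[sum(r[i] * r[j] for r in dev) for j in range(p)] for i in range(p)]


def covariance_matrix(data):
    """Variance-covariance matrix: SSCP divided by d.f. = n - 1."""
    n = len(data)
    return [[v / (n - 1) for v in row] for row in sscp(data)]


def correlation_matrix(data):
    """Pearson correlations: r_ij = s_ij / sqrt(s_ii * s_jj)."""
    s = covariance_matrix(data)
    p = len(s)
    return [[s[i][j] / sqrt(s[i][i] * s[j][j]) for j in range(p)] for i in range(p)]


def standardized(data):
    """Mean-corrected data divided by each column's standard deviation."""
    s = covariance_matrix(data)
    sds = [sqrt(s[j][j]) for j in range(len(s))]
    return [[x / sd for x, sd in zip(row, sds)] for row in mean_corrected(data)]


def pooled_within_sscp(groups):
    """Sum of the groups' SSCP matrices, each using its own group means."""
    p = len(groups[0][0])
    total = [[0.0] * p for _ in range(p)]
    for g in groups:
        w = sscp(g)
        for i in range(p):
            for j in range(p):
                total[i][j] += w[i][j]
    return total


# Made-up data: two variables (columns), two groups of three observations.
group1 = [[1, 2], [2, 3], [3, 7]]
group2 = [[6, 5], [7, 8], [8, 8]]
data = group1 + group2

print(sscp(group1))                          # [[2.0, 5.0], [5.0, 14.0]]
print(pooled_within_sscp([group1, group2]))  # [[4.0, 8.0], [8.0, 20.0]]
print(correlation_matrix(data))
```

Dividing the pooled within-group SSCP by N - g (total observations minus the number of groups) gives the pooled within-group covariance matrix.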
Homework: The file dji.xls in p:\tsai\quant\ contains historical data for
the Dow Jones Index, American Express, and Wal-Mart over the past year:
52 weekly closing prices for the index and for each of the two stocks.
- Convert them into 51 weekly percentage changes (rates of return per week).
- Compute the expected rate of return and risk (standard deviation of the
weekly returns) for the three.
- Compute the covariance matrix and correlation matrix for the three.
- If you invest $100 in the two stocks at the beginning of the one-year
period, 30% in American Express and 70% in Wal-Mart, compute the expected
rate of return and the expected risk of your portfolio.
- Compare the return and risk of the portfolio with those of the individual
stocks.
- Create another column that records the weekly rate of return of your
portfolio, and compute the correlation between the return of your portfolio
and the return of the index.
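The homework steps can be traced end to end on a toy series. This is a minimal sketch, assuming six made-up weekly closes per series (the real dji.xls has 52) and hypothetical helper names; it also shows the linear-combination rules from the Chapter 1 and 2 review, since the portfolio return is 0.3 x (AXP return) + 0.7 x (WMT return).

```python
from math import sqrt

# Illustrative helpers (hypothetical names), standard library only.


def pct_changes(prices):
    """Weekly rate of return in percent: 100 * (P_t - P_{t-1}) / P_{t-1}."""
    return [100 * (cur - prev) / prev for prev, cur in zip(prices, prices[1:])]


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance (divisor n - 1); cov(xs, xs) is the sample variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Made-up weekly closes (6 weeks -> 5 returns), NOT the data in dji.xls.
dji = [100.0, 102.0, 101.0, 104.0, 103.0, 106.0]
axp = [50.0, 51.0, 50.0, 52.0, 53.0, 54.0]
wmt = [40.0, 40.5, 40.0, 41.0, 41.5, 42.0]

r_dji = pct_changes(dji)
r_axp = pct_changes(axp)
r_wmt = pct_changes(wmt)

# Covariance and correlation matrices for the three series
series = [r_dji, r_axp, r_wmt]
cov3 = [[cov(a, b) for b in series] for a in series]
corr3 = [[cov3[i][j] / sqrt(cov3[i][i] * cov3[j][j]) for j in range(3)]
         for i in range(3)]

# Expected return (mean) and risk (standard deviation) of each series
for name, r in (("DJI", r_dji), ("AXP", r_axp), ("WMT", r_wmt)):
    print(name, round(mean(r), 3), round(sqrt(cov(r, r)), 3))

# 30% American Express, 70% Wal-Mart
w = [0.3, 0.7]
mu = [mean(r_axp), mean(r_wmt)]
S = [[cov3[1][1], cov3[1][2]], [cov3[2][1], cov3[2][2]]]  # AXP/WMT block

# Mean of a linear combination: w1*mu1 + w2*mu2
port_mean = sum(wi * mi for wi, mi in zip(w, mu))
# Variance of a linear combination (quadratic form w'Sw):
# w1^2*s11 + w2^2*s22 + 2*w1*w2*s12
port_var = sum(w[i] * w[j] * S[i][j] for i in range(2) for j in range(2))
port_sd = sqrt(port_var)

# Portfolio return column, assuming the 30/70 split holds every week
r_port = [w[0] * a + w[1] * b for a, b in zip(r_axp, r_wmt)]
corr_with_index = cov(r_port, r_dji) / sqrt(cov(r_port, r_port) * cov(r_dji, r_dji))
print(round(port_mean, 3), round(port_sd, 3), round(corr_with_index, 3))
```

Because the weekly column uses fixed weights, its sample mean and variance reproduce `port_mean` and `port_var` up to rounding. A buy-and-hold $100 position lets the weights drift as prices move, so its realized weekly returns can differ slightly from this column.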
Review of Linear Algebra: Matrix multiplication, quadratic form, eigenvalue,
eigenvector.
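The four linear-algebra ideas can be checked on a 2 x 2 symmetric matrix. This is a minimal sketch with hypothetical function names; the eigen solver uses the closed-form roots of the characteristic equation and is valid only for symmetric 2 x 2 matrices.

```python
from math import sqrt

# Illustrative helpers (hypothetical names), standard library only.


def matmul(A, B):
    """Matrix product: (AB)_ij = sum_k A_ik * B_kj."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def quadratic_form(x, A):
    """x'Ax = sum_i sum_j x_i * a_ij * x_j."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def eig_sym_2x2(A):
    """Eigenvalues (largest first) and unit eigenvectors of [[a, b], [b, d]].

    Roots of lambda^2 - (a + d) * lambda + (a*d - b^2) = 0.
    """
    a, b, d = A[0][0], A[0][1], A[1][1]
    if b == 0:  # already diagonal
        if a >= d:
            return [a, d], [[1.0, 0.0], [0.0, 1.0]]
        return [d, a], [[0.0, 1.0], [1.0, 0.0]]
    half_trace = (a + d) / 2
    disc = sqrt(((a - d) / 2) ** 2 + b ** 2)
    lams = [half_trace + disc, half_trace - disc]
    vecs = []
    for lam in lams:
        v = [b, lam - a]  # solves the first row of (A - lam*I)v = 0
        norm = sqrt(v[0] ** 2 + v[1] ** 2)
        vecs.append([v[0] / norm, v[1] / norm])
    return lams, vecs


A = [[2.0, 1.0], [1.0, 2.0]]
print(quadratic_form([1.0, 2.0], A))  # 14.0
lams, vecs = eig_sym_2x2(A)
print(lams)  # [3.0, 1.0]
print(vecs)
```

Applied to a covariance matrix, the eigenvectors give the directions used in principal component analysis and the eigenvalues the variances along them; for larger matrices a library routine such as NumPy's `numpy.linalg.eigh` is the practical choice.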
Review of Statistics: Linear Combinations of random variables
- Statistical Distance: independent variables (3.2.1)
- Mahalanobis Distance: general statistical distance in quadratic form
(show the slide for the bivariate normal distribution) (3.2.2)
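A small sketch of both distances for two variables, in plain Python. The function names are hypothetical, and the 2 x 2 inverse is written out by hand to keep the example dependency-free; with more variables a matrix library would do the inversion.

```python
from math import sqrt

# Illustrative helpers (hypothetical names), standard library only.


def statistical_distance(x, y, variances):
    """Independent variables: each squared difference scaled by its variance."""
    return sqrt(sum((xi - yi) ** 2 / v for xi, yi, v in zip(x, y, variances)))


def inverse_2x2(S):
    """Inverse of a 2 x 2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = S
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def mahalanobis(x, y, S):
    """Square root of the quadratic form (x - y)' S^{-1} (x - y)."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    S_inv = inverse_2x2(S)
    q = sum(diff[i] * S_inv[i][j] * diff[j] for i in range(2) for j in range(2))
    return sqrt(q)


# Unit variances with correlation 0.5 (made-up covariance matrix)
S = [[1.0, 0.5], [0.5, 1.0]]
center = [0.0, 0.0]
print(mahalanobis([1.0, 1.0], center, S))
print(mahalanobis([1.0, -1.0], center, S))
```

Both points are sqrt(2) from the center in Euclidean terms, but the Mahalanobis distance is about 1.15 for (1, 1), which lies along the positive correlation, and 2 for (1, -1), which runs against it. With a diagonal covariance matrix, `mahalanobis` reduces to `statistical_distance`.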