

  1. Introduction: applications of data analysis in financial research (index funds), marketing research (market segmentation), medical studies (tests of new drugs), database mining, and data warehousing.
  2. Review of Statistics:  Random variables, Mean, Variance, Standard Deviation, Pearson correlation.
  3. Introducing Excel and SPSS: Excel (Tools -> Data Analysis) and formulas.
  4. Review of Chapter 1 and Chapter 2.
  5. Linear combination of random variables: Its mean and standard deviation.
  6. First Team Project: Collect weekly stock quotes for the 30 Dow component companies over the last five years.  Create a mutual fund of no more than five stocks from the Dow components that mimics the performance of the Dow Jones Index.  Assume that you start with $1,000,000 and that no trades take place once the portfolio is created.  After this initial stage is completed, you can add a criterion for when to trade and develop a formula for updating your portfolio.  Use the first three years of data to create the portfolio.  Then use the last two years of data to test whether the portfolio's performance is still close to the Dow's.  Every team member should prepare a time sheet recording what he or she contributed to the project and when the work was completed.  Team members will validate the accuracy of the time sheets.  A team may decide to fire a non-performing member in the early stage.  A final evaluation should be handed in with the project.  [Rate the contribution of each team member on a scale from 1 to 5, with 5 being the most valuable player.]  Check http://finance.yahoo.com
  7. A few Web Board Conferences have been created for the course.
  8. Assignment:
  • A study of the GPA and SAT scores of five senior college students shows the following:

Student   1     2     3     4     5
GPA       2.8   3.5   3.6   3.3   3.0
SAT       450   555   710   615   575

Compute the mean and standard deviation of GPA and SAT.  Find the covariance and correlation between GPA and SAT.  There will be a quiz on this subject next week.
  • Read Chapter 3
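
For the GPA and SAT exercise above, a minimal sketch of the calculation in Python could look like the following. It assumes the sample versions of the statistics (divisor n-1); if the course uses the population versions (divisor n), the variance, standard deviation, and covariance change accordingly, while the correlation stays the same. The function names are illustrative and do not come from Excel or SPSS.

```python
from math import sqrt

# Data from the assignment table
gpa = [2.8, 3.5, 3.6, 3.3, 3.0]
sat = [450, 555, 710, 615, 575]


def mean(xs):
    """Arithmetic mean."""
    return sum(xs) / len(xs)


def sample_cov(xs, ys):
    """Sample covariance (assumed divisor: n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def sample_sd(xs):
    """Sample standard deviation: square root of the sample variance."""
    return sqrt(sample_cov(xs, xs))


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the SDs."""
    return sample_cov(xs, ys) / (sample_sd(xs) * sample_sd(ys))


print("mean GPA:", mean(gpa), " SD GPA:", sample_sd(gpa))
print("mean SAT:", mean(sat), " SD SAT:", sample_sd(sat))
print("covariance:", sample_cov(gpa, sat))
print("correlation:", pearson(gpa, sat))
```

The correlation is unit-free, which is why it is easier to interpret than the covariance when GPA and SAT are measured on such different scales.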



Why learn Data Analysis?

Covariance Matrix and Multivariate Analysis

Application: Principal Component Analysis, Factor Analysis, Discriminant Analysis, Cluster Analysis

Chapter 3: Fundamentals of Data Manipulation

  1. Degrees of freedom
  2. Mean-corrected data: mean = 0, d.f. = n-1
  3. Standardized data: mean = 0, standard deviation = 1
  4. Sum of Squares (SS), Sum of Cross Products (SCP), Sum of Squares and Cross Products (SSCP)
  5. Correlation Coefficient (Pearson Product Moment Correlation)
  6. Covariance Matrix (Variance-Covariance Matrix), Correlation Matrix, and Pooled within-group SSCP (weighted SSCP from all groups, representing the averaged group SSCP).  All of these are symmetric matrices.
  7. Within Group, Between Group Analyses
  8. Example: We will use Excel to carry out all of the calculations listed above and also demonstrate the same procedures with SPSS.  The Excel file is named SSCP.xls and is available in my lab folder p:\tsai\quant\ (the P drive is also known as the faculty drive).
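
The quantities in the list above can also be sketched outside Excel and SPSS. The following Python fragment is only an illustration, not a copy of SSCP.xls: it reuses the GPA/SAT data from the earlier assignment and assumes d.f. = n-1 when turning the SSCP matrix into a covariance matrix. All variable names are made up for this sketch.

```python
from math import sqrt

# Columns: GPA, SAT (the five students from the earlier assignment)
data = [
    [2.8, 450.0],
    [3.5, 555.0],
    [3.6, 710.0],
    [3.3, 615.0],
    [3.0, 575.0],
]
n = len(data)      # number of observations
p = len(data[0])   # number of variables

# Column means
col_means = [sum(row[j] for row in data) / n for j in range(p)]

# Mean-corrected data: subtract each column's mean, so every column has mean 0
mean_corrected = [[row[j] - col_means[j] for j in range(p)] for row in data]

# SSCP matrix: SS on the diagonal, SCP off the diagonal
sscp = [
    [sum(r[i] * r[j] for r in mean_corrected) for j in range(p)]
    for i in range(p)
]

# Covariance matrix: SSCP divided by the degrees of freedom (assumed n - 1)
cov = [[sscp[i][j] / (n - 1) for j in range(p)] for i in range(p)]

# Standardized data: mean-corrected values divided by the standard deviation
sd = [sqrt(cov[j][j]) for j in range(p)]
standardized = [[r[j] / sd[j] for j in range(p)] for r in mean_corrected]

# Correlation matrix: covariance rescaled by the standard deviations
corr = [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]

print("SSCP:", sscp)
print("Covariance matrix:", cov)
print("Correlation matrix:", corr)
```

Each matrix is symmetric because entry (i, j) and entry (j, i) are built from the same products.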

Homework: The file dji.xls in p:\tsai\quant\ contains historical data for the Dow Jones index, American Express, and Wal-Mart from the past year.  There are 52 weekly closing prices for each of the three stocks/index.

  1. Convert them into 51 weekly percentage changes (rates of return per week).
  2. Compute the expected rate of return and risk for each of the three stocks/index.
  3. Compute the covariance matrix and correlation matrix for the three.
  4. If you invest $100 at the beginning of the one-year period in the two stocks, 30% in American Express and 70% in Wal-Mart, compute the expected rate of return and expected risk of your portfolio.
  5. Compare the return and risk of the portfolio with those of the individual stocks.
  6. Create another column that records the rate of return of your portfolio, and compute the correlation between the return of your portfolio and the return of the index.
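
The homework steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the prices are hypothetical (they are not the contents of dji.xls), "risk" is taken to be the sample standard deviation of the weekly returns, and the 30%/70% weights are treated as constant every week, which is what the linear-combination formula for the portfolio variance requires.

```python
from math import sqrt

# Hypothetical weekly closing prices, for illustration only (NOT from dji.xls)
axp = [50.0, 51.0, 49.5, 52.0, 53.0, 52.5]
wmt = [60.0, 59.0, 61.0, 62.5, 61.5, 63.0]


def pct_changes(prices):
    """Weekly rates of return: n prices give n - 1 returns."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance (assumed divisor: n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


r_axp = pct_changes(axp)
r_wmt = pct_changes(wmt)
w_axp, w_wmt = 0.3, 0.7  # portfolio weights from the homework

# Expected return of a linear combination: weighted sum of the means
exp_ret = w_axp * mean(r_axp) + w_wmt * mean(r_wmt)

# Variance of a linear combination: w1^2 var1 + w2^2 var2 + 2 w1 w2 cov12
var_p = (w_axp ** 2 * cov(r_axp, r_axp)
         + w_wmt ** 2 * cov(r_wmt, r_wmt)
         + 2 * w_axp * w_wmt * cov(r_axp, r_wmt))
risk = sqrt(var_p)

# Step 6: the portfolio's own return column, built week by week
r_port = [w_axp * a + w_wmt * b for a, b in zip(r_axp, r_wmt)]

print("expected weekly return:", exp_ret)
print("weekly risk (SD):", risk)
print("SD computed from the portfolio column:", sqrt(cov(r_port, r_port)))
```

The last two printed values agree, which is a useful check that the formula and the extra spreadsheet column describe the same portfolio.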

Review of Linear Algebra: Matrix multiplication, quadratic form, eigenvalue, eigenvector.

Review of Statistics: Linear Combinations of random variables

  1. Statistical Distance: independent variables (3.2.1)
  2. Mahalanobis Distance: general statistical distance in quadratic form (show the slide for the bivariate normal distribution) (3.2.2)
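
The two distances above can be compared with a small sketch. It assumes the usual definitions: the squared statistical distance divides each coordinate difference by that variable's standard deviation, and the squared Mahalanobis distance is the quadratic form d' S^(-1) d, where d is the difference vector and S the covariance matrix. The function names and the example numbers are hypothetical, and the code handles two variables only.

```python
def statistical_distance_sq(x, y, sds):
    """Squared statistical distance for uncorrelated variables."""
    return sum(((a - b) / s) ** 2 for a, b, s in zip(x, y, sds))


def mahalanobis_sq_2d(x, y, cov):
    """Squared Mahalanobis distance d' S^(-1) d for two variables."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Inverse of a 2x2 matrix
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [x[0] - y[0], x[1] - y[1]]
    # Quadratic form: sum over i, j of diff_i * inv_ij * diff_j
    return sum(diff[i] * inv[i][j] * diff[j]
               for i in range(2) for j in range(2))


# Uncorrelated case (hypothetical numbers): both distances coincide
print(statistical_distance_sq((2.0, 3.0), (0.0, 0.0), (2.0, 3.0)))
print(mahalanobis_sq_2d((2.0, 3.0), (0.0, 0.0), [[4.0, 0.0], [0.0, 9.0]]))

# Correlated case (hypothetical numbers): the covariance term changes the result
print(mahalanobis_sq_2d((1.0, 1.0), (0.0, 0.0), [[1.0, 0.5], [0.5, 1.0]]))
```

When the off-diagonal covariance is zero, the Mahalanobis distance reduces to the statistical distance, which is why it is described as the general case.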