PCA in R using princomp() and using svd()

Possible duplicate:
Comparing svd and princomp in R

How can I perform PCA in R using two methods: princomp() and svd() of the correlation matrix?

I have a data set like this:

438,498,3625,3645,5000,2918,5000,2351,2332,2643,1698,1687,1698,1717,1744,593,502,493,504,445,431,444,440,429,10
438,498,3625,3648,5000,2918,5000,2637,2332,2649,1695,1687,1695,1720,1744,592,502,493,504,449,431,444,443,429,10
438,498,3625,3629,5000,2918,5000,2637,2334,2643,1696,1687,1695,1717,1744,593,502,493,504,449,431,444,446,429,10
437,501,3625,3626,5000,2918,5000,2353,2334,2642,1730,1687,1695,1717,1744,593,502,493,504,449,431,444,444,429,10
438,498,3626,3629,5000,2918,5000,2640,2334,2639,1696,1687,1695,1717,1744,592,502,493,504,449,431,444,441,429,10
439,498,3626,3629,5000,2918,5000,2633,2334,2645,1705,1686,1694,1719,1744,589,502,493,504,446,431,444,444,430,10
440,5000,3627,3628,5000,2919,3028,2346,2330,2638,1727,1684,1692,1714,1745,588,501,492,504,451,433,446,444,432,10
444,5021,3631,3634,5000,2919,5000,2626,2327,2638,1698,1680,1688,1709,1740,595,500,491,503,453,436,448,444,436,10
451,5025,3635,3639,5000,2920,3027,2620,2323,2632,1706,1673,1681,1703,753,595,499,491,502,457,440,453,454,442,20
458,5022,3640,3644,5000,2922,5000,2346,2321,2628,1688,1666,1674,1696,744,590,496,490,498,462,444,458,461,449,20
465,525,3646,3670,5000,2923,5000,2611,2315,2631,1674,1658,1666,1688,735,593,495,488,497,467,449,462,469,457,20
473,533,3652,3676,5000,2925,5000,2607,2310,2623,1669,1651,1659,1684,729,578,496,487,498,469,454,467,476,465,20
481,544,3658,3678,5000,2926,5000,2606,2303,2619,1668,1643,1651,1275,723,581,495,486,497,477,459,472,484,472,20
484,544,3661,3665,5000,2928,5000,2321,2304,5022,1647,1639,1646,1270,757,623,493,484,495,480,461,474,485,476,20
484,532,3669,3662,2945,2926,5000,2326,2306,2620,1648,1639,1646,1270,760,533,493,483,494,507,461,473,486,476,20
482,520,3685,3664,2952,2927,5000,2981,2307,2329,1650,1640,1644,1268,757,533,492,482,492,513,459,474,485,474,20
481,522,3682,3661,2955,2927,2957,2984,1700,2622,1651,1641,1645,1272,761,530,492,482,492,513,462,486,483,473,20
480,525,3694,3664,2948,2926,2950,2995,1697,2619,1651,1642,1646,1269,762,530,493,482,492,516,462,486,483,473,20
481,515,5018,3664,2956,2927,2947,2993,1697,2622,1651,1641,1645,1269,765,592,489,482,495,531,462,499,483,473,20
479,5000,3696,3661,2953,2927,2944,2993,1702,2622,1649,1642,1645,1269,812,588,489,481,491,510,462,481,483,473,20
480,506,5019,3665,2941,2929,2945,2981,1700,2616,1652,1642,1645,1271,814,643,491,480,493,524,461,469,484,473,20
479,5000,5019,3661,2943,2930,2942,2996,1698,2312,1653,1642,1644,1274,811,617,491,479,491,575,461,465,484,473,20
479,5000,5020,3662,2945,2931,2942,2997,1700,2313,1654,1642,1644,1270,908,616,490,478,489,503,460,460,478,473,10
481,508,5021,3660,2954,2936,2946,2966,1705,2313,1654,1643,1643,1270,1689,678,493,477,483,497,467,459,476,473,10
486,510,522,3662,2958,2938,2939,2627,1707,2314,1659,1643,1639,1665,1702,696,516,476,477,547,465,457,470,474,10
479,521,520,3663,2954,2938,2941,2957,1712,2314,1660,1643,1638,1660,1758,688,534,475,475,489,461,456,465,474,10
480,554,521,3664,2954,2938,2941,2632,1715,2313,1660,1643,1637,1656,1761,687,553,475,474,558,462,453,465,476,10
481,511,5023,3665,2954,2937,2941,2627,1707,2312,1660,1641,1636,1655,1756,687,545,475,475,504,463,458,470,477,10
482,528,524,3665,2953,2937,2940,2629,1706,2312,1657,1640,1635,1654,1756,566,549,475,476,505,464,459,468,477,10

So I do this:

x <- read.csv("C:\\data_25_1000.txt",header=F,row.names=NULL)
p1 <- princomp(x, cor = TRUE)  ## using correlation matrix
p1
Call:
princomp(x = x, cor = TRUE)

    Standard deviations:
       Comp.1    Comp.2    Comp.3    Comp.4    Comp.5    Comp.6    Comp.7    Comp.8    Comp.9   Comp.10   Comp.11   Comp.12   Comp.13   Comp.14   Comp.15   Comp.16 
    1.9800328 1.8321498 1.4147367 1.3045541 1.2016116 1.1708212 1.1424120 1.0134829 1.0045317 0.9078734 0.8442308 0.8093044 0.7977656 0.7661921 0.7370972 0.7075442 
      Comp.17   Comp.18   Comp.19   Comp.20   Comp.21   Comp.22   Comp.23   Comp.24   Comp.25 
    0.7011462 0.6779179 0.6671614 0.6407627 0.6077336 0.5767217 0.5659030 0.5526520 0.5191375 

     25  variables and  1000 observations. 

For the second method, suppose I have the correlation matrix "C:\data_25_1000.txt", which looks like this:

1.0     0.3045  0.1448  -0.0714 -0.038  -0.0838 -0.1433 -0.1071 -0.1988 -0.1076 -0.0313 -0.157  -0.1032 -0.137  -0.0802 0.1244  0.0701  0.0457  -0.0634 0.0401 0.1643  0.3056  0.3956  0.4533  0.1557
0.3045  0.9999  0.3197  0.1328  0.093   -0.0846 -0.132  0.0046  -0.004  -0.0197 -0.1469 -0.1143 -0.2016 -0.1    -0.0316 0.0044  -0.0589 -0.0589 0.0277  0.0314  0.078   0.0104  0.0692  0.1858  0.0217
0.1448  0.3197  1       0.3487  0.2811  0.0786  -0.1421 -0.1326 -0.2056 -0.1109 0.0385  -0.1993 -0.1975 -0.1858 -0.1546 -0.0297 -0.0629 -0.0997 -0.0624 -0.0583 0.0316  0.0594  0.0941  0.0813  -0.1211
-0.0714 0.1328  0.3487  1       0.6033  0.2866  -0.246  -0.1201 -0.1975 -0.0929 -0.1071 -0.212  -0.3018 -0.3432 -0.2562 0.0277  -0.1363 -0.2218 -0.1443 -0.0322 -0.012  0.1741  -0.0725 -0.0528 -0.0937
-0.038  0.093   0.2811  0.6033  1       0.4613  0.016   0.0655  -0.1094 0.0026  -0.1152 -0.1692 -0.2047 -0.2508 -0.319  -0.0528 -0.1839 -0.2758 -0.2657 -0.1136 -0.0699 0.1433  -0.0136 -0.0409 -0.1538
-0.0838 -0.0846 0.0786  0.2866  0.4613  0.9999  0.2615  0.2449  0.1471  0.0042  -0.1496 -0.2025 -0.1669 -0.142  -0.1746 -0.1984 -0.2197 -0.2631 -0.2675 -0.1999 -0.1315 0.0469  0.0003  -0.1113 -0.1217
-0.1433 -0.132  -0.1421 -0.246  0.016   0.2615  1       0.3979  0.3108  0.1622 -0.0539 0.0231  0.1801  0.2129  0.1331  -0.1325 -0.0669 -0.0922 -0.1236 -0.1463 -0.1452 -0.2422 -0.0768 -0.1457 0.036
-0.1071 0.0046  -0.1326 -0.1201 0.0655  0.2449  0.3979  1       0.4244  0.3821 0.119   -0.0666 0.0163  0.0963  -0.0078 -0.1202 -0.204  -0.2257 -0.2569 -0.2334 -0.234  -0.2004 -0.138  -0.0735 -0.1442
-0.1988 -0.004  -0.2056 -0.1975 -0.1094 0.1471  0.3108  0.4244  0.9999  0.5459 0.0498  -0.052  0.0987  0.186   0.2576  -0.052  -0.1921 -0.2222 -0.1792 -0.0154 -0.058  -0.1868 -0.2232 -0.3118 0.0186
-0.1076 -0.0197 -0.1109 -0.0929 0.0026  0.0042  0.1622  0.3821  0.5459  0.9999 0.2416  0.0183  0.063   0.0252  0.186   0.0519  -0.1943 -0.2241 -0.2635 -0.0498 -0.0799 -0.0553 -0.1567 -0.2281 -0.0263
-0.0313 -0.1469 0.0385  -0.1071 -0.1152 -0.1496 -0.0539 0.119   0.0498  0.2416 1       0.2601  0.1625  -0.0091 -0.0633 0.0355  0.0397  -0.0288 -0.0768 -0.2144 -0.2581 0.1062  0.0469  -0.0608 -0.0578
-0.157  -0.1143 -0.1993 -0.212  -0.1692 -0.2025 0.0231  -0.0666 -0.052  0.0183 0.2601  0.9999  0.3685  0.3059  0.1269  -0.0302 0.1417  0.1678  0.2219  -0.0392 -0.2391 -0.2504 -0.2743 -0.1827 -0.0496
-0.1032 -0.2016 -0.1975 -0.3018 -0.2047 -0.1669 0.1801  0.0163  0.0987  0.063 0.1625  0.3685  1       0.6136  0.2301  -0.1158 0.0366  0.0965  0.1334  -0.0449 -0.1923 -0.2321 -0.1848 -0.1109 0.1007
-0.137  -0.1    -0.1858 -0.3432 -0.2508 -0.142  0.2129  0.0963  0.186   0.0252 -0.0091 0.3059  0.6136  1       0.4078  -0.0615 0.0607  0.1223  0.1379  0.0072 -0.1377 -0.3633 -0.2905 -0.1867 0.0277
-0.0802 -0.0316 -0.1546 -0.2562 -0.319  -0.1746 0.1331  -0.0078 0.2576  0.186 -0.0633 0.1269  0.2301  0.4078  1       0.0521  -0.0345 0.0444  0.0778  0.0925 0.0596  -0.2551 -0.1499 -0.2211 0.244
0.1244  0.0044  -0.0297 0.0277  -0.0528 -0.1984 -0.1325 -0.1202 -0.052  0.0519 0.0355  -0.0302 -0.1158 -0.0615 0.0521  1       0.295   0.2421  -0.06   0.0921 0.243   0.0953  0.0886  0.0518  -0.0032
0.0701  -0.0589 -0.0629 -0.1363 -0.1839 -0.2197 -0.0669 -0.204  -0.1921 -0.1943 0.0397  0.1417  0.0366  0.0607  -0.0345 0.295   0.9999  0.4832  0.2772  0.0012 0.1198  0.0411  0.1213  0.1409  0.0368
0.0457  -0.0589 -0.0997 -0.2218 -0.2758 -0.2631 -0.0922 -0.2257 -0.2222 -0.2241 -0.0288 0.1678  0.0965  0.1223  0.0444  0.2421  0.4832  1       0.2632  0.0576 0.0965  -0.0043 0.0818  0.102   0.0915
-0.0634 0.0277  -0.0624 -0.1443 -0.2657 -0.2675 -0.1236 -0.2569 -0.1792 -0.2635 -0.0768 0.2219  0.1334  0.1379  0.0778  -0.06   0.2772  0.2632  1       0.2036 -0.0452 -0.142  -0.0696 -0.0367 0.3039
0.0401  0.0314  -0.0583 -0.0322 -0.1136 -0.1999 -0.1463 -0.2334 -0.0154 -0.0498 -0.2144 -0.0392 -0.0449 0.0072  0.0925  0.0921  0.0012  0.0576  0.2036  0.9999 0.2198  0.1268  0.0294  0.0261  0.3231
0.1643  0.078   0.0316  -0.012  -0.0699 -0.1315 -0.1452 -0.234  -0.058  -0.0799 -0.2581 -0.2391 -0.1923 -0.1377 0.0596  0.243   0.1198  0.0965  -0.0452 0.2198 1       0.2667  0.2833  0.2467  0.0288
0.3056  0.0104  0.0594  0.1741  0.1433  0.0469  -0.2422 -0.2004 -0.1868 -0.0553 0.1062  -0.2504 -0.2321 -0.3633 -0.2551 0.0953  0.0411  -0.0043 -0.142  0.1268 0.2667  1       0.4872  0.3134  0.1663
0.3956  0.0692  0.0941  -0.0725 -0.0136 0.0003  -0.0768 -0.138  -0.2232 -0.1567 0.0469  -0.2743 -0.1848 -0.2905 -0.1499 0.0886  0.1213  0.0818  -0.0696 0.0294 0.2833  0.4872  0.9999  0.4208  0.1317
0.4533  0.1858  0.0813  -0.0528 -0.0409 -0.1113 -0.1457 -0.0735 -0.3118 -0.2281 -0.0608 -0.1827 -0.1109 -0.1867 -0.2211 0.0518  0.1409  0.102   -0.0367 0.0261 0.2467  0.3134  0.4208  1       0.0592
0.1557  0.0217  -0.1211 -0.0937 -0.1538 -0.1217 0.036   -0.1442 0.0186  -0.0263 -0.0578 -0.0496 0.1007  0.0277  0.244   -0.0032 0.0368  0.0915  0.3039  0.3231 0.0288  0.1663  0.1317  0.0592  0.9999

I also computed the svd of this correlation matrix and got:

> s = svd(Correlation_25_1000)
$d
 [1] 3.9205298 3.3567729 2.0014799 1.7018614 1.4438704 1.3708223 1.3051053 1.0271475 1.0090840 0.8242341 0.7127256 0.6549736 0.6364299 0.5870503 0.5433123 0.5006188 0.4916060
[18] 0.4595726 0.4451043 0.4105769 0.3693401 0.3326079 0.3202462 0.3054243 0.2695037

$u

matrix

$v

matrix

My question: how can I use $d, $u, and $v to obtain the principal components? Can I use prcomp()? If so, how?


person edgarmtze    schedule 10.10.2011
comment
I pointed you to the code in prcomp in my answer to this question. Also, please try to use the code formatting tools rather than just copying and pasting code snippets.   -  person joran    schedule 10.10.2011
comment
OK, I ran stats:::prcomp.default and I can see the function, but how do I use the output of svd to get the PCA in R?   -  person edgarmtze    schedule 10.10.2011
comment
But... the function you are looking at is exactly how to use the output of svd to perform PCA!   -  person joran    schedule 10.10.2011


Answers (1)


Try this:

princomp

princomp(USArrests, cor = TRUE)$loadings
Loadings:
         Comp.1 Comp.2 Comp.3 Comp.4
Murder   -0.536  0.418 -0.341  0.649
Assault  -0.583  0.188 -0.268 -0.743
UrbanPop -0.278 -0.873 -0.378  0.134
Rape     -0.543 -0.167  0.818       

svd

svd(cor(USArrests))$u
       [,1]       [,2]       [,3]        [,4]
[1,] -0.5358995  0.4181809 -0.3412327  0.64922780
[2,] -0.5831836  0.1879856 -0.2681484 -0.74340748
[3,] -0.2781909 -0.8728062 -0.3780158  0.13387773
[4,] -0.5434321 -0.1673186  0.8177779  0.08902432

eigen

eigen(cor(USArrests))$vectors
          [,1]       [,2]       [,3]        [,4]
[1,] -0.5358995  0.4181809 -0.3412327  0.64922780
[2,] -0.5831836  0.1879856 -0.2681484 -0.74340748
[3,] -0.2781909 -0.8728062 -0.3780158  0.13387773
[4,] -0.5434321 -0.1673186  0.8177779  0.08902432

For a correlation matrix, princomp, svd, and eigen all give the same loadings (up to the sign of each column).
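
To connect this to $d, $u, and $v from the question, here is a minimal sketch on USArrests (the variable names are illustrative, not from the original post). It assumes the usual relationship for a correlation matrix: $d holds the eigenvalues, so the standard deviations printed by princomp should be sqrt($d). The question's own output is consistent with this: 1.9800328^2 is about 3.92053, the first value of $d. The scores are then the standardized data projected onto $v.

```r
# Sketch: PCA from svd() of the correlation matrix, checked against princomp().
# USArrests ships with base R; all variable names here are illustrative.
X <- USArrests
R <- cor(X)

s <- svd(R)                      # R is symmetric, so s$u and s$v span the same axes
p <- princomp(X, cor = TRUE)

# 1. Loadings: the columns of s$u match princomp's loadings,
#    up to the sign of each column.
loadings_svd <- s$u
loadings_pc  <- unclass(p$loadings)
err_loadings <- max(abs(abs(loadings_svd) - abs(loadings_pc)))

# 2. Standard deviations: s$d holds the eigenvalues of R,
#    so princomp's sdev should equal sqrt(s$d).
sdev_svd <- sqrt(s$d)
err_sdev <- max(abs(sdev_svd - p$sdev))

# 3. Scores: project the standardized data onto the loadings.
#    princomp standardizes with divisor n, scale() with n - 1, hence the rescaling.
n <- nrow(X)
Z <- scale(X) * sqrt(n / (n - 1))
scores_svd <- Z %*% s$v
err_scores <- max(abs(abs(scores_svd) - abs(p$scores)))

# All three discrepancies should be at the level of floating-point noise.
print(c(loadings = err_loadings, sdev = err_sdev, scores = err_scores))
```

Note that a correlation matrix alone is enough for the loadings and standard deviations, but the scores need the original observations. prcomp(X, scale. = TRUE) is an alternative that works from the data matrix directly; its standard deviations use divisor n - 1, so they differ slightly from princomp's.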

person MYaseen208    schedule 10.10.2011