Data Science & Machine Learning: Technical Sharing and Exchange

刘晶

2019.6.15

If you cannot explain something in simple terms, you don't understand it yourself. — Richard Feynman

Notes

I.

Some internal material is involved and has been removed; please understand.

II. Data science and machine learning as many people see them

Personal view: broadly speaking, the algorithmic layer of data science does not insist on machine learning algorithms; for example, analyzing some data may draw on fields outside machine learning such as complex systems science, time series analysis (GARCH, ARIMA, etc.), and econometrics.

Here, mathematics and statistics mainly include calculus, probability theory, statistics, linear algebra, matrix theory, optimization theory, operations research, decision theory, graph theory, differential geometry, and algebraic topology.

The development of our era has made people appreciate once again, and very concretely, the importance of mathematics and statistics and of their cross-fertilization with computer science.

⚠️: I have never studied the subjects highlighted in pink, so I have no deep understanding of them; they are included here based on professional references. The same applies below.

All models are wrong, but some are useful. — George E. P. Box

III. The beauty of mathematics and statistics

Mathematics and statistics, deduction and induction

In the ultimate analysis, all knowledge is history; in the abstract sense, all science is mathematics; on the basis of reason, all judgment is statistics. — C. R. Rao


MLE

Assume \(y_n = f(x_n, \Theta) + \epsilon_n\)

and that the noise terms \(\epsilon_n\) are i.i.d. Gaussian with mean 0 and variance \(\sigma_n^2\); the likelihood is then

\(p(D|M, \Theta)=\prod_{n=1}^{N}\frac{1}{\sqrt{2\pi\sigma_n^2}}e^{-\frac{(y_n - f(x_n, \Theta))^2}{2\sigma_n^2}}\)

Maximizing the likelihood \(p(D|M, \Theta)\) above is equivalent to minimizing the negative log-likelihood:

\(\min_{\Theta}(-\log p(D|M, \Theta)) = \min_{\Theta} \sum_{n=1}^{N}\left[\frac{(y_n-f(x_n, \Theta))^2}{2\sigma_n^2} + \frac{1}{2}\log(2\pi \sigma_n^2)\right]\)

When the variance \(\sigma_n^2\) is constant, this is equivalent to minimizing the sum of squared errors (the least-squares criterion):

\(\min_{\Theta}\sum_{n=1}^{N}(y_n- f(x_n, \Theta))^2\)
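A minimal sketch of this equivalence in R (simulated data, hypothetical names): minimizing the Gaussian negative log-likelihood with optim() recovers essentially the same coefficients as the least-squares fit from lm().

# MLE with i.i.d. Gaussian noise vs. ordinary least squares
set.seed(1)
x <- runif(200); y <- 2 + 3 * x + rnorm(200, sd = 0.5)
negloglik <- function(theta) {                      # theta = (intercept, slope, log sd)
  -sum(dnorm(y, mean = theta[1] + theta[2] * x, sd = exp(theta[3]), log = TRUE))
}
optim(c(0, 0, 0), negloglik)$par[1:2]               # MLE of the two coefficients
coef(lm(y ~ x))                                     # least-squares estimates: essentially identical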


Another view of MLE

MLE can also be phrased as minimizing the KL divergence (Kullback-Leibler divergence) between the model distribution \(P_{model}(x, \Theta)\) and the true distribution \(P_{data}(x)\):

\(D_{KL}(P_{data} \| P_{model}) = E_{x\sim P_{data}}\left[ \log\frac{P_{data}(x)}{P_{model}(x)} \right]\)

\(= E_{x\sim P_{data}}[\log P_{data}(x)] - E_{x\sim P_{data}}[\log P_{model}(x)]\)

The first term does not depend on \(\Theta\), so minimizing the KL divergence over \(\Theta\) is equivalent to minimizing

\(-E_{x\sim P_{data}}[\log P_{model}(x)] \approx -\frac{1}{N}\sum_n \log P_{model}(x_n, \Theta)\)

i.e. to maximizing the log-likelihood.
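A minimal sketch (simulated data): for a Gaussian model with unknown mean, the parameter that minimizes the empirical negative log-likelihood, i.e. the model "closest" to the data in KL divergence, is exactly the MLE, here the sample mean.

set.seed(1)
x <- rnorm(1000, mean = 2)
mu_grid <- seq(0, 4, by = 0.01)
nll <- sapply(mu_grid, function(mu) -mean(dnorm(x, mean = mu, sd = 1, log = TRUE)))
mu_grid[which.min(nll)]   # minimizer of -E_data[log p_model] ...
mean(x)                   # ... coincides with the MLE (the sample mean)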

IV. What is "learning"?*

*Mainly excerpted from Dr. Li Hang, Statistical Learning Methods, 2nd edition

Statistical learning is the discipline in which computers build probabilistic and statistical models from data and use these models to predict and analyze data.

If a system can improve its performance by executing some process, that is learning. — Herbert A. Simon

Statistical learning is machine learning in which a computer system improves its performance by using data and statistical methods; most of what is called machine learning today is statistical machine learning.


Joint probability distribution

Supervised learning assumes that the input and output random variables X and Y follow a joint probability distribution P(X, Y). P(X, Y) denotes the distribution (or density) function. During learning, this joint distribution is assumed to exist, but its concrete form is unknown to the learning system.

The training data and the test data are regarded as generated i.i.d. from the joint distribution P(X, Y). Statistical learning assumes the data exhibit some statistical regularity; that X and Y have a joint probability distribution is supervised learning's basic assumption about the data.


Hypothesis space

The goal of supervised learning is to learn a mapping from inputs to outputs, and this mapping is represented by a model; in other words, the purpose of learning is to find the best such model. The model belongs to the set of mappings from the input space to the output space, and this set is the hypothesis space. Fixing the hypothesis space fixes the scope of learning.

A supervised model can be probabilistic or non-probabilistic, represented by a conditional probability distribution P(Y|X) or by a decision function Y = f(X), depending on the learning method. When predicting the output for a concrete input, we write P(y|x) or y = f(x).
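A minimal sketch in R of the two representations (using the mtcars data that also appears later, as an illustration only): a logistic regression returns the conditional probability P(y|x), and thresholding it gives a decision function y = f(x).

data(mtcars)
fit <- glm(am ~ wt, data = mtcars, family = binomial)
p <- predict(fit, newdata = data.frame(wt = 3), type = "response")
p                    # probabilistic model: P(am = 1 | wt = 3)
as.integer(p > 0.5)  # decision function: a hard prediction y = f(x)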


Generative models and discriminative models

Supervised models can further be divided into generative models and discriminative models.

Characteristics of the two kinds of models:

Dr. Li Hang's Statistical Learning Methods, 1st edition, implemented in Julia

https://github.com/Silk-Road/Statistical_Learning_Method_by-Dr.-Li-Hang

III. The main categories of machine learning

(1) Supervised learning

(2) Unsupervised learning

(3) Reinforcement learning

Dynamic programming

Q-learning, temporal-difference (TD) learning, SARSA, actor-critic learning, average-reward TD learning

Neural networks

As a machine learning method, neural networks can be applied to supervised and unsupervised learning; combining deep learning with reinforcement learning gives deep reinforcement learning.

They include deep learning, autoencoders, multilayer perceptrons, CNNs, RNNs, LSTMs, self-organizing feature maps (SOM), restricted Boltzmann machines, and more.

Google Inception-V3


IV. A general machine learning workflow*

*Excerpted from Professor Zhou Zhihua's "watermelon book", Machine Learning

1. Define the problem and collect data

⚠️: Machine learning can only memorize patterns that exist in the training data; it can only recognize things it has seen before. Training on past data to predict the future rests on an assumption: that the future follows the same regularities as the past!

2. Choose a metric of success

Accuracy? Precision and recall? Customer retention? …

The metric of success guides the choice of loss function, i.e. what the model will optimize. It should align directly with your goal.

3. Decide on an evaluation protocol

4. Prepare the data

Once you know what to train on, what to optimize, and how to evaluate, you are almost ready to train a model; first, though, format the data so that it meets the model's input requirements, e.g. for a deep neural network:

Then you can train the model.

5. Develop a model that beats a baseline

The goal at this stage is statistical power: develop a small model that can beat a dumb (purely random) baseline, e.g. a model with accuracy above 0.1 for MNIST digit classification, or above 0.5 for webshell detection.

⚠️: You will not always be able to achieve statistical power. If, after trying several reasonable architectures, you still cannot beat the random baseline, the cause may be that the two assumptions mentioned earlier do not hold.
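A minimal sketch in R (simulated, hypothetical labels) of such a dumb baseline for a binary task: always predict the majority class. A first working model should beat this accuracy.

set.seed(1)
y_true <- rbinom(1000, 1, 0.5)               # hypothetical test labels
majority <- as.integer(mean(y_true) > 0.5)   # the majority class
mean(y_true == majority)                     # baseline accuracy, ~0.5 for a balanced binary task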

If all goes well, you still need to choose three key parameters to build your first working model.

One of the cores of machine learning: optimization algorithms

GIF from: https://blog.csdn.net/u012759136/article/details/52302426
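As an illustration (not the algorithms in the GIF themselves), a minimal sketch in R of plain gradient descent on the least-squares loss of a one-variable linear model; the optimizers shown in the GIF are refinements of this basic update.

set.seed(1)
x <- runif(100); y <- 2 + 3 * x + rnorm(100, sd = 0.3)
theta <- c(0, 0); lr <- 0.5
for (i in 1:2000) {
  resid <- (theta[1] + theta[2] * x) - y
  grad  <- c(mean(resid), mean(resid * x))   # gradient of 0.5 * mean(resid^2)
  theta <- theta - lr * grad
}
theta             # gradient-descent estimates
coef(lm(y ~ x))   # closed-form least-squares solution, for comparison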

6. Scale up: develop a model that overfits

Is the model powerful enough? Does it have enough layers and parameters to model the problem?

To figure out how large a model you need, you must develop one that overfits.

7. Regularize the model and tune hyperparameters

Repeatedly adjust the model, train it, and evaluate it on the validation data.

8. A framework for the typical modeling process

A schematic for the typical modeling process


https://bookdown.org/max/FES/important-concepts.html

V. Starting from the linear models many people consider "very simple"

More than 90% of data analysis and prediction does not need deep learning, and for that 90% a generalized linear model is more than enough!

"all classifiers have the same error rate when averaged over all possible data-generating distributions." — "No Free Lunch Theorem" (Wolpert, 1996)

According to the no-free-lunch theorem, no machine learning classification algorithm is always better than every other model across all domains.

1. Basic principles

The English material below is excerpted from Johns Hopkins University's Linear Models notes, 102 pages in total!

From the basic linear model formula and assumptions above we get:

\(Y=\beta_0 + \beta_1X + \epsilon=f(x) + \epsilon\)

\(E[\epsilon]=0,E[\epsilon^2]=\sigma^2\)

=> \(E[y]=f(x), Var(y) = \sigma^2\)

=> \(E[(y - \hat{f}(x))^2] = E[y^2]+E[\hat f^2] - 2E[y\hat f]\)

\(= Var(y)+E[y]^2+Var(\hat f) + E[\hat f]^2-2E[y\hat f]\)

Substituting \(y=f(x)+\epsilon\) into the above gives

\(E[(y - \hat{f}(x))^2] = Var(y) + Var(\hat f) + (E[f-\hat f])^2 = noise + variance + (bias)^2\)
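A minimal sketch in R (simulated data) that checks this decomposition numerically at a single test point x0, by refitting a deliberately too-simple model on many fresh training sets.

set.seed(1)
f <- function(x) sin(2 * pi * x); sigma <- 0.3; n <- 50; x0 <- 0.25
preds <- replicate(2000, {
  x <- runif(n); y <- f(x) + rnorm(n, sd = sigma)
  predict(lm(y ~ x), newdata = data.frame(x = x0))   # a straight line cannot capture f
})
noise <- sigma^2; bias2 <- (mean(preds) - f(x0))^2; variance <- var(preds)
y0 <- f(x0) + rnorm(2000, sd = sigma)                # fresh responses at x0
c(lhs = mean((y0 - preds)^2), rhs = noise + bias2 + variance)  # the two sides roughly agree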

2. Model interpretation

3. Model interpretation

4. Model interpretation

library(UsingR)
library(magrittr)
library(ggplot2) # needed for the ggplot() call below
data("diamond")
y <- diamond$price; x <- diamond$carat; n <- length(y)
newx = data.frame(x = seq(min(x), max(x), length = 100))
fit <- lm(y ~ x)
p1 = data.frame(predict(fit, newdata = newx, interval = ("confidence")))
p2 = data.frame(predict(fit, newdata = newx, interval = ("prediction")))
p1$interval = "confidence"; p2$interval = "prediction"
p1$x = newx$x; p2$x = newx$x
dat = rbind(p1, p2)
names(dat)[1] = "y"
ggplot(dat, aes(x = x, y = y)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr, fill = interval), alpha = 0.2) +
  geom_line() +
  geom_point(data = data.frame(x = x, y = y), aes(x = x, y = y), size = 4)

5. A simple linear model case study

data("mtcars")
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
boxplot(mpg ~ am, data = mtcars, main = "Boxplot of mpg vs am", cex.axis = 2)

fit1 <- lm(mpg ~ I(1 * (am == 0)), data = mtcars) 
summary(fit1)
## 
## Call:
## lm(formula = mpg ~ I(1 * (am == 0)), data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        24.392      1.360  17.941  < 2e-16 ***
## I(1 * (am == 0))   -7.245      1.764  -4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

From the output, the model coefficient is significant at the 0.05 level, but the residual standard error is 4.902 on 30 degrees of freedom, which is too large, and the adjusted R-squared is only 0.3385.

fit <- lm(mpg ~ ., data = mtcars)
step(fit, k = log(dim(mtcars)[1]), trace = 0) # stepwise model selection; k = log(n) corresponds to the BIC penalty
## 
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
## 
## Coefficients:
## (Intercept)           wt         qsec           am  
##       9.618       -3.917        1.226        2.936

The step function searches for the best linear model; from the output, mpg ~ wt + qsec + am is the best model.

fit2 <- lm(mpg ~ wt + qsec +am, data = mtcars) 
summary(fit2)$coef
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  9.617781  6.9595930  1.381946 1.779152e-01
## wt          -3.916504  0.7112016 -5.506882 6.952711e-06
## qsec         1.225886  0.2886696  4.246676 2.161737e-04
## am           2.935837  1.4109045  2.080819 4.671551e-02
fit3 <- lm(mpg ~ wt + qsec + am + wt * am , data = mtcars) 
summary(fit3)
## 
## Call:
## lm(formula = mpg ~ wt + qsec + am + wt * am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5076 -1.3801 -0.5588  1.0630  4.3684 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.723      5.899   1.648 0.110893    
## wt            -2.937      0.666  -4.409 0.000149 ***
## qsec           1.017      0.252   4.035 0.000403 ***
## am            14.079      3.435   4.099 0.000341 ***
## wt:am         -4.141      1.197  -3.460 0.001809 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.084 on 27 degrees of freedom
## Multiple R-squared:  0.8959, Adjusted R-squared:  0.8804 
## F-statistic: 58.06 on 4 and 27 DF,  p-value: 7.168e-13

From the model summary, all coefficients are significant at the 0.05 level, the residual standard error is 2.084, less than half that of the initial fit1 model, and the adjusted R-squared rises to 0.8804, meaning the model explains about 88% of the variation in mpg. On this basis it can be called a fairly good model.

anova(fit, fit1,fit2,fit3)

The dfbetas function measures, for each data point, the difference in the model coefficients when that point is included versus removed.

round(dfbetas(fit3), 3)
##                     (Intercept)     wt   qsec     am  wt:am
## Mazda RX4                -0.028  0.010  0.031  0.009 -0.034
## Mazda RX4 Wag            -0.008  0.003  0.009 -0.014  0.025
## Datsun 710                0.229 -0.079 -0.251 -0.133  0.007
## Hornet 4 Drive           -0.001 -0.091  0.068 -0.108  0.079
## Hornet Sportabout         0.203 -0.139 -0.162 -0.085  0.010
## Valiant                   0.152  0.023 -0.235  0.119 -0.110
## Duster 360               -0.186  0.089  0.177  0.033  0.024
## Merc 240D                -0.121 -0.243  0.339 -0.342  0.276
## Merc 230                  0.257  0.012 -0.354  0.133 -0.154
## Merc 280                  0.037 -0.049 -0.009 -0.049  0.023
## Merc 280C                -0.005  0.041 -0.028  0.055 -0.035
## Merc 450SE                0.019  0.031 -0.030  0.017 -0.030
## Merc 450SL                0.029 -0.011 -0.024 -0.011 -0.004
## Merc 450SLC              -0.032  0.003  0.018  0.025  0.006
## Cadillac Fleetwood        0.402 -0.677 -0.182 -0.396  0.301
## Lincoln Continental       0.316 -0.549 -0.131 -0.329  0.251
## Chrysler Imperial        -0.484  0.977  0.138  0.608 -0.486
## Fiat 128                 -0.489  0.169  0.536  0.268 -0.082
## Honda Civic              -0.022  0.008  0.024 -0.148  0.154
## Toyota Corolla           -0.264  0.091  0.289  0.377 -0.305
## Toyota Corona            -0.078  0.281 -0.072  0.252 -0.186
## Dodge Challenger         -0.118  0.071  0.100  0.040  0.002
## AMC Javelin              -0.199  0.149  0.149  0.100 -0.021
## Camaro Z28               -0.142  0.038  0.151 -0.001  0.041
## Pontiac Firebird          0.234 -0.035 -0.238 -0.016 -0.079
## Fiat X1-9                 0.074 -0.026 -0.082 -0.219  0.184
## Porsche 914-2             0.025 -0.009 -0.028  0.027 -0.026
## Lotus Europa              0.021 -0.007 -0.023  0.029 -0.034
## Ford Pantera L           -0.029  0.010  0.031  0.036 -0.048
## Ferrari Dino             -0.018  0.006  0.019  0.008 -0.014
## Maserati Bora             0.065 -0.022 -0.071 -0.398  0.531
## Volvo 142E                0.203 -0.070 -0.223  0.080 -0.185

The hatvalues function measures each point's potential influence (leverage); from these results there is no point that needs special attention.

data.frame(round(hatvalues(fit3), 3))
plot(predict(fit3), resid(fit3), main = "Residual plot")

From the residual plot, the vast majority of points lie in an acceptable range; only a few residuals are out of range relative to the fitted values. We can also check whether the residual distribution matches the linear model assumptions.

plot(density(resid(fit3)),xlab = "Residuals", ylab = "Density", main = "Residual Distribution")

From the density plot, the distribution of the residuals is consistent with the regression model assumptions.

par(mfrow = c(2, 2)) 
plot(fit3)

5. A simple linear model case study


Linear models in a Bayesian framework

The analysis above is frequentist and mostly concerned with p-values.

The frequentist approach focuses on estimating "the real effect", e.g. the "true value" of the correlation between x and y. A frequentist model therefore returns a point estimate (a single value) of the "true" correlation, under somewhat vague assumptions (the data are randomly sampled, usually normally distributed).

The Bayesian approach makes no such assumptions: the data are just the data (there is no need to assume they were randomly sampled from a population). Based on the observed data and the relevant prior beliefs, a Bayesian sampling algorithm (MCMC) returns a probability distribution over the parameters that is "compatible" with the observed data (the posterior). For the correlation between x and y it returns a distribution, e.g. "the most probable correlation is 0.42, but given these data, 0.12 or 0.74 are also plausible".

For a Bayesian linear model we need no p-values or other frequentist quantities; we simply describe the posterior distribution of each parameter, e.g. its median, a 90% credible interval, and so on.

Advantages of Bayesian modeling over the frequentist approach: http://hbiostat.org/talks/stratos19.pdf

library(bayestestR)
library(rstanarm)
library(ggplot2)
options(mc.cores = parallel::detectCores()) # run the MCMC chains in parallel
model <- stan_glm(mpg ~ wt + qsec + am , data = mtcars)
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.00099 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 9.9 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.118342 seconds (Warm-up)
## Chain 1:                0.121246 seconds (Sampling)
## Chain 1:                0.239588 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 3e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.3 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.122486 seconds (Warm-up)
## Chain 2:                0.118545 seconds (Sampling)
## Chain 2:                0.241031 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 2.3e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.23 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 0.167794 seconds (Warm-up)
## Chain 3:                0.129941 seconds (Sampling)
## Chain 3:                0.297735 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 2.9e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.29 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 0.114392 seconds (Warm-up)
## Chain 4:                0.121347 seconds (Sampling)
## Chain 4:                0.235739 seconds (Total)
## Chain 4:
describe_posterior(model)
posteriors <- insight::get_parameters(model)
ggplot(posteriors, aes(x = wt)) +
  geom_density(fill = "skyblue")
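The posterior can also be summarized by hand; a minimal sketch: the median and an equal-tailed 90% credible interval of the wt coefficient, computed directly from the posterior draws.

quantile(posteriors$wt, probs = c(0.05, 0.5, 0.95))  # 90% credible interval (5%, 95%) and median for wt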

5. A simple linear model case study


Building a linear model with Google's deep learning framework TensorFlow
import os
import numpy as np
import math as m
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import pandas as pd
import tensorflow as tf

def reset_graph(seed=42):
    # make the results reproducible
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)
    
n_points = 5000
n_features = 3

bias = np.ones(n_points).reshape((-1,1))
low = -np.ones((n_points, n_features), 'float')
high = np.ones((n_points, n_features), 'float')


X = np.random.uniform(low = low, high = high)
noise = np.random.normal(size=(n_points, 1))
weights = np.array([1.0, 0.5, 0.2, 0.1])
noise_std = 0.1
Y = weights[0]*bias + np.dot(X, weights[1:]).reshape((-1,1)) + noise_std * noise

# hold out 25% of the data for testing
train_test_split = 4 

n_test = int(n_points/train_test_split)
n_train = n_points - n_test

X_train = X[:n_train, :]
Y_train = Y[:n_train].reshape((-1, 1))

X_test = X[n_train:, :]
Y_test = Y[n_train:].reshape((-1, 1))

X_np = np.hstack((np.ones(n_train).reshape((-1,1)), X_train))

X = tf.constant(X_np, dtype=tf.float32, name = "X")
y = tf.constant(Y_train, dtype=tf.float32, name = "y")
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    theta_value = theta.eval()

theta_value
## array([[1.002074  ],
##        [0.5009824 ],
##        [0.20381998],
##        [0.1024889 ]], dtype=float32)
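For comparison, a minimal sketch in R of the same closed-form normal-equation solution \(\theta = (X^TX)^{-1}X^Ty\), on simulated data with the same true weights.

set.seed(42)
n <- 5000
X <- cbind(1, matrix(runif(n * 3, -1, 1), n, 3))   # intercept column plus 3 features
w <- c(1.0, 0.5, 0.2, 0.1)
y <- X %*% w + 0.1 * rnorm(n)
drop(solve(t(X) %*% X, t(X) %*% y))                # close to c(1.0, 0.5, 0.2, 0.1)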

VI. What if there is no package to call?

Then you have to write it yourself! https://github.com/Silk-Road/Statistical_Learning_Method_by-Dr.-Li-Hang/blob/master/HMM(EM).jl

(10.15):\[\alpha_1(i) = \pi_ib_i(o_1), i=1,2,...,N\]

(10.16): For \(t=1,2,...,T-1\), \[\alpha_{t+1}(i)=\left[ \sum_{j=1}^N\alpha_t(j)a_{ji} \right] b_i(o_{t+1}), i=1,2,...,N\]

(10.17):\[P(O|\lambda)=\sum_{i=1}^N\alpha_T(i)\]

(10.19):\[\beta_T(i)=1, i=1,2,...,N\]

(10.20): For \(t=T-1,T-2,...,1\), \[\beta_t(i)=\sum_{j=1}^N a_{ij}b_j(o_{t+1})\beta_{t+1}(j), i=1,2,...,N\]

(10.24):\[\gamma_t(i)=\frac{\alpha_t(i)\beta_t(i)}{\sum_{j=1}^N\alpha_t(j)\beta_t(j)}\]

(10.26):\[\xi_t(i,j)=\frac{\alpha_t(i)a_{ij}b_j(o_{t+1})\beta_{t+1}(j)}{\sum_{i=1}^N\sum_{j=1}^N \alpha_t(i)a_{ij}b_j(o_{t+1})\beta_{t+1}(j)}\]

(10.39):\[a_{ij} = \frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)}\]

(10.40):\[b_j(k)=\frac{\sum_{t=1,o_t=v_k}^T \gamma_t(j)}{\sum_{t=1}^T \gamma_t(j)}\]

(10.41):\[\pi_i = \gamma_1(i)\]


The Baum-Welch algorithm

Input: the observation sequence \(O=(o_1,o_2,...,o_T)\)

Output: the hidden Markov model parameters

(1) Initialization

For n=0, choose \(a_{ij}^{(0)}, b_j(k)^{(0)}, \pi_i^{(0)}\) to obtain the model \(\lambda^{(0)}=(A^{(0)}, B^{(0)}, \pi^{(0)})\)

(2) Recursion. For n=1,2,…

\(a_{ij}^{(n+1)} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)}\)

\(b_j(k)^{(n+1)}=\frac{\sum_{t=1,o_t=v_k}^T \gamma_t(j)}{\sum_{t=1}^T \gamma_t(j)}\)

\(\pi_i^{(n+1)} = \gamma_1(i)\)

(3) Termination. This yields the model parameters \(\lambda^{(n+1)}=(A^{(n+1)}, B^{(n+1)}, \pi^{(n+1)})\)


The Viterbi algorithm

Input: the model \(\lambda=(A,B,\pi)\) and the observation sequence \(O=(o_1,o_2,...,o_T)\)

Output: the optimal path \(I^*=(i_1^*, i_2^*,...,i_T^*)\)

(1) Initialization

\(\delta_1(i) = \pi_ib_i(o_1), i=1,2,...,N\)

\(\psi_1(i)=0, i=1,2,...,N\)

(2) Recursion. For \(t=2,3,...,T\)

\(\delta_t(i) = max_{1\le j \le N}[\delta_{t-1}(j)a_{ji}]b_i(o_t), i=1,2,...,N\)

\(\psi_t(i) = argmax_{1 \le j \le N}[\delta_{t-1}(j)a_{ji}], i=1,2,...,N\)

(3) Termination

\(P^*=max_{1 \le i \le N}\delta_T(i)\)

\(i_T^* = argmax_{1 \le i \le N}[\delta_T(i)]\)

(4) Optimal path backtracking. For \(t=T-1,T-2,...,1\): \(i_t^* = \psi_{t+1}(i_{t+1}^*)\), giving the optimal path \(I^*=(i_1^*, i_2^*,...,i_T^*)\)

# encoding = utf-8
# Author: Silk-Road
# Date: 2019-01-27
# Email: zhulincaowu@gmail.com
# Last modified by: Silk-Road
# Last modified time: 2019-01-29

# Julia Version 1.0.2
using Random, Distributions, LinearAlgebra

#-----------------------------------

mutable struct HMM
    N::Int # number of states
    M::Int # number of distinct observation symbols, length(unique(O))
    A::Matrix{Float64} # state transition probability matrix, size [N, N]
    B::NamedTuple # observation probabilities as a named tuple; each element is one column of B in the book, size [N, M]
    Pi::Vector{Float64} # initial state probability vector, length N
    O::Union{Vector{String}, Nothing} # observation sequence, length T
    T::Int
    Alpha::Matrix{Float64} # size:[T, N]
    Beta::Matrix{Float64} # size: [T, N]
    Pos::Float64 # probability of the observation sequence, i.e. P(O|λ)
    function HMM(n,o,a,b,p_i)
        hmm = new()
        hmm.N = n
        hmm.O = o
        hmm.M = length(unique(o))
        hmm.A = a 
        hmm.B = b 
        hmm.Pi = p_i
        hmm.T = length(o)
        hmm.Alpha = zeros(Float64, length(o),n)
        hmm.Beta = zeros(Float64, length(o),n)
        hmm.Pos = 0.0 
        hmm
    end
end

#-------------------------------------------
#               §10.2 Probability computation algorithms
#-------------------------------------------

function forward_probability(hmm::HMM)
    # initial values, formula (10.15)
    ind_1 = hmm.O[1]
    hmm.Alpha[1,:] = hmm.Pi .* hmm.B[Meta.parse(ind_1)]
    # recursion, formula (10.16): sum over j of alpha_t(j) * a_{ji}
    @simd for t in 2:hmm.T
        ind = hmm.O[t] # this observation symbol indexes into the named tuple B
        @simd for i in 1:hmm.N
            @inbounds hmm.Alpha[t, i] = dot(hmm.Alpha[t-1, :], hmm.A[:,i]) * hmm.B[Meta.parse(ind)][i]
        end
    end
    # termination, formula (10.17)
    hmm.Pos = sum(hmm.Alpha[end,:])
    return hmm
end


function backward_probability(hmm::HMM)
    # initial values, formula (10.19)
    hmm.Beta[end,:] = ones(Float64, hmm.N)
    # recursion, formula (10.20): b_j is evaluated at the observation o_{t+1}
    @simd for t in (hmm.T-1) : -1 : 1
        ind = hmm.O[t+1]
        @simd for i in 1:hmm.N
            @inbounds hmm.Beta[t, i] = [hmm.A[i, j] * hmm.B[Meta.parse(ind)][j] * hmm.Beta[t+1,j] for j in 1:hmm.N] |> sum
        end
    end
    hmm.Pos = [hmm.Pi[i] * hmm.B[Meta.parse(hmm.O[1])][i] * hmm.Beta[1,i] for i in 1:hmm.N] |> sum
    return hmm
end


# worked example from §10.2 of the book
o = ["红", "白", "红"]
a = [0.5, 0.3, 0.2,
     0.2, 0.5, 0.3,
     0.3,0.2, 0.5]

a = reshape(a, (3,3))
b = (红 = [0.5,0.4,0.7], 白 = [0.5,0.6,0.3])
p_i = [0.2,0.4,0.4]
hmm = HMM(3, o, a, b, p_i)

forward_probability(hmm).Pos
backward_probability(hmm).Pos


#-------------------------------------------
#                §10.3 Learning algorithms
#-------------------------------------------
# formula (10.24)
# compute gamma via multiple dispatch: return either the whole gamma matrix or a single value
function gamma(hmm::HMM)
    gamma = zeros(Float64, hmm.T, hmm.N)
    alpha = forward_probability(hmm).Alpha
    beta = backward_probability(hmm).Beta
    @simd for t in 1:hmm.T
        @simd for i in 1:hmm.N
            @inbounds gamma[t,i] = alpha[t,i]*beta[t,i] / sum([alpha[t,j] * beta[t,j] for j in 1:hmm.N])
        end
    end
    return gamma
end

function gamma(hmm::HMM, t, i)
    alpha = forward_probability(hmm).Alpha
    beta = backward_probability(hmm).Beta
    numer = alpha[t, i] * beta[t, i]
    denom = [alpha[t, j] * beta[t, j] for j in 1:hmm.N] |> sum
    return numer/denom
end

# formula (10.26)
function xi(hmm, t, i, j)
    alpha = forward_probability(hmm).Alpha
    beta = backward_probability(hmm).Beta
    numer = alpha[t, i] * hmm.A[i,j] * hmm.B[Meta.parse(hmm.O[t+1])][j] * beta[t+1, j]
    denom = 0.0
    @simd for i in 1:hmm.N
        @simd for j in 1:hmm.N
            @inbounds denom += alpha[t, i] * hmm.A[i,j] * hmm.B[Meta.parse(hmm.O[t+1])][j] * beta[t+1, j]
        end
    end
    return numer/denom
end

# initialization for the EM algorithm
function hmm_init(n,observations; rng = MersenneTwister(1234)) # e.g. observations = ["红", "白", "红"]
    a = rand(rng, 1:10, n, n) |> x-> x./sum(x,dims=2)
    uni_obser = unique(observations)
    m = length(uni_obser)
    b_tmp = rand(rng, 1:10, n, m) |> x-> x./sum(x,dims=2)
    t = tuple(Meta.parse.(unique(observations))...) # e.g. (:红, :白)
    b = NamedTuple{t, Tuple{Array{Float64,1},Array{Float64,1}}} # e.g. NamedTuple{(:红, :白),Tuple{Array{Float64,1},Array{Float64,1}}}
    b = b((b_tmp[:,1], b_tmp[:,2])) # constructor of type b, producing the named tuple; by its definition it takes a tuple,
                                    # and at initialization this line is still written by hand; no better way found yet
    p_i = rand(rng, 1:10, n) |> x-> x./sum(x)
    hmm = HMM(n, observations, a, b, p_i)
    return hmm
end
# test:
# hmm = hmm_init(3,  ["红", "白", "红"])

# estimate the parameters with the EM algorithm
function baum_welch(hmm::HMM)
    a, b, p_i = hmm.A, hmm.B, hmm.Pi
    for i in 1:hmm.N
        for j in 1:hmm.N
            a[i,j] = ([xi(hmm, t, i, j) for t in 1:(hmm.T-1)] |> sum) /
                     ([gamma(hmm, t, i) for t in 1:(hmm.T-1)] |> sum) # formula (10.39), numerator over denominator
        end
        p_i[i] = gamma(hmm,1,i) # formula (10.41)
    end

    for j in 1:hmm.N
        for k in 1:hmm.M
            ind = [t for t in 1:hmm.T if hmm.O[t] == unique(hmm.O)[k]] # hmm.B[Meta.parse(hmm.O[t])]
            numer = [gamma(hmm, t, j) for t in ind] |> sum
            denom = [gamma(hmm, t, j) for t in 1:hmm.T] |> sum
            b[k][j] =  numer/denom # formula (10.40)
        end
    end
    return a, b, p_i
end

# iteration of the EM updates
function iter(n, observations; maxstep = 100, ϵ = 0.001)
    hmm = hmm_init(n, observations)# e.g. hmm_init(3,  ["红", "白", "红"])
    old_a, old_b, old_p_i = deepcopy(hmm.A), deepcopy(hmm.B), deepcopy(hmm.Pi) # copies, so the convergence check compares old and new values
    step = 0
    done = false
    while !done
            step +=1
            a, b, p_i = baum_welch(hmm)
            hmm = HMM(n,observations, a, b, p_i)
            done = (step > maxstep) || (abs.(a .- old_a) |> sum) < ϵ ||
            (abs.(b .- old_b) |> sum) < ϵ || (abs.(p_i .- old_p_i) |> sum) < ϵ
    end
    return hmm
end

#-------------------------------------------
#              §10.4 Prediction algorithm
#-------------------------------------------

function viterbi(hmm::HMM)
    # initialization
    δ = zeros(Float64, hmm.T, hmm.N)
    ψ = zeros(Float64, hmm.T, hmm.N)
    ind_1 = hmm.O[1]
    I = zeros(Int, hmm.T) # the optimal path has length T
    for i in 1:hmm.N
        δ[1,i] = hmm.Pi[i] * hmm.B[Meta.parse(ind_1)][i]
    end

    # recursion
    for t in 2:hmm.T
        ind = hmm.O[t]
        for i in 1:hmm.N
            delta_a = [δ[t-1, j] * hmm.A[j, i] for j in 1:hmm.N]
            tmp = findmax(delta_a)
            δ[t,i] = tmp[1] * hmm.B[Meta.parse(ind)][i]
            ψ[t,i] = tmp[2]
        end
    end

    # termination and backtracking of the optimal path
    tmp = findmax(δ[end,:])
    P_star = tmp[1]
    I[end] = tmp[2]
    for t in (hmm.T-1):-1:1
        I[t] = Int(ψ[t+1, I[t+1]])
    end

    println("Optimal state sequence I: $I")
    println("Probability of the optimal path P*: $P_star")
    println("δ: $δ")
    println("ψ: $ψ")
    return I
end

# Example 10.3 from the book
o = ["红", "白", "红"]
a = [0.5, 0.3, 0.2,
     0.2, 0.5, 0.3,
     0.3,0.2, 0.5]

a = reshape(a, (3,3))
b = (红 = [0.5,0.4,0.7], 白 = [0.5,0.6,0.3])
p_i = [0.2,0.4,0.4]

hmm = HMM(3, o, a, b, p_i) # n = 3 is the number of states
viterbi(hmm)

A new era has come!

We are a lucky generation: following the march of civilization and the great advances of technology, we get to feel the heartbeat of the world 💗

A simple deep learning example

# download the dataset
!wget --no-check-certificate \
    https://storage.googleapis.com/laurencemoroney-blog.appspot.com/horse-or-human.zip \
    -O /tmp/horse-or-human.zip

import os
import zipfile

local_zip = '/tmp/horse-or-human.zip'
zip_ref = zipfile.ZipFile(local_zip, 'r')
zip_ref.extractall('/tmp/horse-or-human')
zip_ref.close()

train_horse_dir = os.path.join('/tmp/horse-or-human/horses') # contains 500 pictures of horses
train_human_dir = os.path.join('/tmp/horse-or-human/humans') # contains 527 pictures of humans

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import tensorflow as tf
import numpy as np
import random
from tensorflow.keras.preprocessing.image import img_to_array, load_img
from tensorflow.keras.optimizers import RMSprop
# automatically labels the images according to the folder they are stored in
from tensorflow.keras.preprocessing.image import ImageDataGenerator 

model = tf.keras.models.Sequential([
    # the input has shape 300x300x3; 3 is the number of RGB channels
    # first convolution layer, relu activation: max(0, x)
    tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(300, 300, 3)),
    # max pooling (some Chinese researchers now prefer to translate "pooling" as 汇合)
    tf.keras.layers.MaxPooling2D(2, 2),
    # second convolution layer
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # third convolution layer
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # fourth convolution layer
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # fifth convolution layer
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # flatten, then a fully connected layer
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    # a single output unit: 0 means horse, 1 means human; binary classification, so the last activation is sigmoid
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.summary()

# Use binary cross-entropy as the loss, RMSprop (learning rate 0.001) for iterative parameter optimization, and accuracy as the metric monitored during training. RMSprop improves on SGD with exponential smoothing of the squared gradients, so the effective learning rate adapts over the iterations; Adam and Adagrad achieve a similar effect.
model.compile(loss='binary_crossentropy',
              optimizer=RMSprop(lr=0.001),
              metrics=['acc'])
            

# Preprocess all images: rescale pixel values from [0, 255] to [0, 1] and convert the images to float32 tensors (training would be much slower otherwise); images are also labelled according to the folder they live in.
train_datagen = ImageDataGenerator(rescale=1/255)

# set the batch size to 128
train_generator = train_datagen.flow_from_directory(
        '/tmp/horse-or-human/',  # directory of the training set
        target_size=(300, 300),  
        batch_size=128,
        # since the loss is binary cross-entropy, class_mode is also set to 'binary'
        class_mode='binary')

history = model.fit_generator(
      train_generator,
      steps_per_epoch=8,  
      epochs=15,
      verbose=1)

# pick a random image as input, then output the "representations" produced by each layer of the network
successive_outputs = [layer.output for layer in model.layers[1:]]
#visualization_model = Model(img_input, successive_outputs)
visualization_model = tf.keras.models.Model(inputs = model.input, outputs = successive_outputs)

train_horse_names = os.listdir(train_horse_dir)  # list the image file names (not defined earlier)
train_human_names = os.listdir(train_human_dir)
horse_img_files = [os.path.join(train_horse_dir, f) for f in train_horse_names]
human_img_files = [os.path.join(train_human_dir, f) for f in train_human_names]
img_path = random.choice(horse_img_files + human_img_files)

img = load_img(img_path, target_size=(300, 300))  
x = img_to_array(img)  
x = x.reshape((1,) + x.shape)  

x /= 255

successive_feature_maps = visualization_model.predict(x)

# the name of each layer, used in the visualization below
layer_names = [layer.name for layer in model.layers]

# visualize the "representations" the CNN learns from the original image
for layer_name, feature_map in zip(layer_names, successive_feature_maps):
  if len(feature_map.shape) == 4:
    # only visualize the conv / pooling layers, skip the fully connected layers
    n_features = feature_map.shape[-1]  # number of feature maps
    size = feature_map.shape[1]
    display_grid = np.zeros((size, size * n_features))
    for i in range(n_features):
      x = feature_map[0, :, :, i]
      x -= x.mean()
      x /= x.std()
      x *= 64
      x += 128
      x = np.clip(x, 0, 255).astype('uint8')
      display_grid[:, i * size : (i + 1) * size] = x
    scale = 20. / n_features
    plt.figure(figsize=(scale * n_features, scale))
    plt.title(layer_name)
    plt.grid(False)
    plt.imshow(display_grid, aspect='auto', cmap='viridis')

output-shape: the dimension of the feature maps at each layer. The convolutions use 3×3 kernels with no padding ('valid'), so each convolution reduces the spatial size by 2; the pooling windows are 2×2, so each pooling step halves the size.
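A minimal sketch of that arithmetic in R (the 64 in the last comment is the number of filters in the final convolutional layer):

size <- 300
for (block in 1:5) {
  size <- size - 2     # 3x3 convolution with no padding removes 2
  size <- size %/% 2   # 2x2 max-pooling halves the size (rounding down)
}
size                   # 7, so the flattened layer has 7 * 7 * 64 = 3136 units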

A new era has come!

An inverse reinforcement learning example

Given the conditions set in the paper:

we can derive the following formulas and compute the model parameters in TensorFlow by maximum likelihood estimation:

The remaining derivations and code are not detailed here; the final result:

VII. The basic workflow of data-science analysis and prediction

Johns Hopkins University | Data Science Specialization


1. Obtain and clean the data

2. Exploratory Data Analysis

3. Statistical inference

Sample mean vs. theoretical mean

set.seed(123)
lambda <- 0.2
m = NULL
for (i in 1 : 1000) m = rbind(m, rexp(40, lambda)) # simulate 1000 samples of 40 draws from an exponential distribution with mean 1/lambda = 5
library(ggplot2)
mx <- apply(m, 1, mean)
cat("样本均值为: ",mean(mx))
## 样本均值为:  5.011911
cat("理论均值为: ",1/lambda)
## 理论均值为:  5
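The same simulation can also check the variance the central limit theorem predicts for the sample mean, \((1/\lambda)^2/40\); a minimal sketch reusing mx and lambda from above:

cat("Sample variance of the means: ", var(mx), "\n")
cat("Theoretical variance: ", (1/lambda)^2 / 40, "\n")   # 25 / 40 = 0.625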

Normality check

library(ggplot2)
ggplot(data.frame(x = mx), aes(x = x)) + 
     geom_histogram(binwidth=.2, colour = "black", fill = "lightblue") +
     geom_vline(xintercept = mean(mx), col = "red", size = 2) + 
     ggtitle("Histogram of 1000 simulated exponential sample means") + 
     geom_text(aes(x = 6, y =100, label = paste0("Sample mean = ", round(mean(mx), 6))),check_overlap = TRUE, col = "purple", size = 5)

qqnorm(mx)
qqline(mx)

From the Q-Q plot, the distribution is close to normal but, strictly speaking, not normal. To check this we can use the Shapiro-Wilk normality test; the returned p-value is 0.005322, below 0.05, so in most settings we would reject the hypothesis of normality.

shapiro.test(mx)
## 
##  Shapiro-Wilk normality test
## 
## data:  mx
## W = 0.99555, p-value = 0.005322

A simple inference example

This dataset records the effect of vitamin C on tooth growth in guinea pigs.

data("ToothGrowth")
summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

Let's extract a bit more information.

str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
with(ToothGrowth, table(supp,dose))
##     dose
## supp 0.5  1  2
##   OJ  10 10 10
##   VC  10 10 10

Next, let's check whether vitamin C (VC) has a different effect from orange juice (OJ).

VC <- subset(ToothGrowth, supp == "VC") 
OJ <- subset(ToothGrowth, supp == "OJ") 

Hypotheses

Null hypothesis: vitamin C and orange juice have the same effect on tooth growth

Alternative hypothesis: the effects differ

Test

Since there are three dose levels (0.5, 1, 2), we analyze each dose separately.

pvalue_0.5 <- t.test(VC[VC$dose == 0.5,]$len, OJ[OJ$dose == 0.5,]$len, alternative = "two.sided", paired = TRUE)$p.value 
pvalue_1 <- t.test(VC[VC$dose == 1.0,]$len, OJ[OJ$dose == 1.0,]$len, alternative = "two.sided", paired = TRUE)$p.value
pvalue_2 <- t.test(VC[VC$dose == 2.0,]$len, OJ[OJ$dose == 2.0,]$len, alternative = "two.sided", paired = TRUE)$p.value
cat("0.5剂量的p值为 : ", pvalue_0.5, "\n")
## 0.5剂量的p值为 :  0.01547205
cat("1剂量的p值为 : ", pvalue_1, "\n")
## 1剂量的p值为 :  0.008229248
cat("2剂量的p值为 : ", pvalue_2, "\n")
## 2剂量的p值为 :  0.9669567

Conclusions

At dose 0.5 the p-value is 0.01547205, so we can reject the null hypothesis: vitamin C and orange juice have different effects. Note that this test alone does not tell us which is better.

At dose 1 the p-value is 0.008229248, so again we reject the null hypothesis: vitamin C and orange juice have different effects.

At dose 2 the p-value is 0.9669567, far above 0.05, so we cannot reject the null hypothesis: at this dose there is no detectable difference between vitamin C and orange juice for tooth growth.


4. Machine learning modeling

Introduced earlier; not repeated here.


5. Data products

https://liujing.shinyapps.io/Capstone/

https://liujing.neocities.org/

AND THESE SLIDES 😄: except for the TensorFlow parts, whose results are linked because they take long to compute, all code results and figures in these slides were computed and generated by RStudio from an RMarkdown file! Yes, by calling the relevant packages this IDE can run three scientific computing languages (R, Python and Julia). Awesome!!!

Back to the home page