A Simple Benchmark of Python vs. Julia

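This came out of helping a teammate put a model into production a few days ago. The original Python code was too slow to meet the business requirements, so we ported all of the core scripts to Julia; on Ubuntu, using only 8 cores, that gave roughly a 10x speedup. Here I use just one simple example, computing cosine similarity, for the comparison. There are other ways to parallelize this, and a GPU in particular gives a very large speedup for this kind of matrix computation.

Note that these tests were run on an old 2015 MBP, and the environment seems to have some problem: Julia multithreading had essentially no effect, although the same approach worked on Ubuntu a few days earlier. I don't know the exact cause. (One thing worth checking is `Threads.nthreads()`, since Julia typically starts with a single thread unless `JULIA_NUM_THREADS` or the `-t` flag is set.) The timings below therefore don't mean much; the main point is to provide test code so that anyone interested can run it on their own machine.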
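For reference, the cosine similarity that every function below computes is the dot product of two vectors divided by the product of their Euclidean norms:

```latex
\[
\cos(x, y) = \frac{x \cdot y}{\lVert x \rVert_2 \, \lVert y \rVert_2}
\]
```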

Python

In [1]:
from numba import jit
import numpy as np

# Plain NumPy version: dot product divided by the product of the norms.
def cos_sim(cluster, document):
    denom = np.linalg.norm(cluster)*np.linalg.norm(document)
    return np.dot(cluster, document)/denom

# Same computation, compiled to machine code by Numba in nopython mode.
@jit(nopython=True)
def cos_sim_numba(cluster, document):
    denom = np.linalg.norm(cluster)*np.linalg.norm(document)
    return np.dot(cluster, document) / denom
In [2]:
data = np.random.randn(1000000,100)
In [3]:
%timeit [cos_sim_numba(data[0],data[i]) for i in range(1, 1000000)]
2.29 s ± 20.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [4]:
%timeit [cos_sim(data[0],data[i]) for i in range(1, 1000000)]
13.8 s ± 549 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
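Both loops above call the similarity function once per row. As a point of comparison, the same values can be computed for all rows at once with a single matrix-vector product. This is a minimal sketch and was not part of the original tests: `cos_sim_batch` is a hypothetical helper name, and a much smaller array is used so that it runs quickly.

```python
import numpy as np

def cos_sim_batch(query, docs):
    """Cosine similarity between one vector and every row of a 2-D array."""
    # One matrix-vector product yields all the dot products at once.
    dots = docs @ query
    # Row-wise norms of docs, multiplied by the norm of the query vector.
    denom = np.linalg.norm(docs, axis=1) * np.linalg.norm(query)
    return dots / denom

rng = np.random.default_rng(0)
# Assumption: 10,000 rows instead of the 1,000,000 used in the test above.
data = rng.standard_normal((10_000, 100))
sims = cos_sim_batch(data[0], data[1:])
```

To time it in the same way as the cells above, one could run `%timeit cos_sim_batch(data[0], data[1:])` on the full-size array.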

Julia

In [1]:
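using Base.Threads   # multithreading (Threads.@threads)
using Distributed    # multi-process parallelism (addprocs, @everywhere)
using Random
addprocs(4)          # start 4 worker processes; the threaded loops below do not use them
@everywhere using LinearAlgebra   # dot and norm, loaded on every process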
In [2]:
data = randn(1000000, 500);  # 500 columns; the Python test above used 100
In [3]:
@everywhere function cos_sim_julia(x::Vector{Float64}, y::Vector{Float64})
    return dot(x, y) / (norm(x) * norm(y))
end
In [4]:
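# Note: data[1,:] and data[i,:] each copy a row on every iteration,
# which accounts for most of the memory allocation reported below.
@time Threads.@threads for i in 2:size(data,1)
    cos_sim_julia(data[1,:], data[i,:])
end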
 14.274663 seconds (7.42 M allocations: 7.844 GiB, 19.08% gc time)
In [5]:
@time dot(data[1,:], data[2,:])
  0.005813 seconds (110 allocations: 14.844 KiB)
Out[5]:
-20.718629583347003
In [6]:
function _dot(x,y)
    sum(((xi,yi),)-> xi*yi ,zip(x,y))
end
Out[6]:
_dot (generic function with 1 method)
In [7]:
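# The first call to _dot most likely includes JIT compilation, which would explain the time and allocation counts.
@time _dot(data[1,:], data[2,:])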
  0.057280 seconds (150.44 k allocations: 8.163 MiB)
Out[7]:
-20.718629583347013
In [8]:
function _norm(x::Vector{Float64})
    return _dot(x,x) |> sqrt
end
Out[8]:
_norm (generic function with 1 method)
In [9]:
@time norm(data[1,:])
  0.006821 seconds (22 allocations: 5.531 KiB)
Out[9]:
22.07034262537925
In [10]:
@time _norm(data[1,:])
  0.000042 seconds (2 allocations: 4.078 KiB)
Out[10]:
22.070342625379254
In [13]:
@everywhere function cos_sim_julia2(x::Vector{Float64}, y::Vector{Float64})
    return _dot(x, y) / (_norm(x) * _norm(y))
end
In [14]:
@time Threads.@threads for i in 2:size(data,1)
    cos_sim_julia2(data[1,:], data[i,:])
end
  9.121629 seconds (7.02 M allocations: 7.824 GiB, 17.63% gc time)