PySpark Machine Learning: Vectors and Their Common Operations
2022-08-01 04:11:00 【Sun_Sherry】
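Spark version: V3.2.1
This article covers the vector operations available in pyspark.ml.linalg.
1. DenseVector (dense vector)
1.1 Creation
A dense vector is much like an ordinary array. It can be created as follows: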
from pyspark.ml import linalg
import numpy as np
dvect1=linalg.Vectors.dense([1,2,3,4,5])
dvect2=linalg.Vectors.dense(1.2,3,3,4,5)
print(dvect1)
print(dvect2)
The output is [1.0,2.0,3.0,4.0,5.0] and [1.2,3.0,3.0,4.0,5.0]; note that the elements are stored as floats.
1.2 Common operations
- Two vectors of the same length can be added, subtracted, multiplied, and divided element by element:
res1=dvect1+dvect2
res2=dvect1-dvect2
res3=dvect1*dvect2
res4=dvect1/dvect2
print(res1)
print(res2)
print(res3)
print(res4)
Each operation is applied element by element and returns a new DenseVector.
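To make the element-wise behaviour concrete, here is a plain-Python sketch of what the four operators compute for the two vectors above. The helper name `elementwise` is hypothetical and is not part of PySpark; it only mirrors the arithmetic and does not reproduce the library's implementation.

```python
import operator

def elementwise(op, a, b):
    """Apply a binary operator position by position (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [op(x, y) for x, y in zip(a, b)]

v1 = [1.0, 2.0, 3.0, 4.0, 5.0]  # same values as dvect1
v2 = [1.2, 3.0, 3.0, 4.0, 5.0]  # same values as dvect2

total = elementwise(operator.add, v1, v2)
diff = elementwise(operator.sub, v1, v2)
prod = elementwise(operator.mul, v1, v2)
quot = elementwise(operator.truediv, v1, v2)

print(total)
print(diff)
print(prod)
print(quot)
```

Because the values are floats, entries such as 1.0 - 1.2 may print with a small rounding error rather than exactly -0.2.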
- Some attributes of numpy.ndarray are available through the vector's underlying array:
dvec1_shape=dvect1.array.shape
dvec1_size=dvect1.array.size
print(dvec1_shape)  # Result: (5,)
print(dvec1_size)  # Result: 5
- dot: computes the dot product
res_1=dvect1.dot([1,2,3,4,5])
res_2=dvect1.dot([0,1,0,0,0])
res_3=dvect1.dot(dvect2)
print(res_1)  # Result: 55.0
print(res_2)  # Result: 2.0
print(res_3)  # Result: 57.2
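- Computing vector norms
dvect1=linalg.Vectors.dense([1,2,3,4,5])
norm_0=dvect1.norm(0)
norm_1=dvect1.norm(1)
norm_2=dvect1.norm(2)
print('L0 norm of dvect1: {}'.format(norm_0))
print('L1 norm of dvect1: {}'.format(norm_1))
print('L2 norm of dvect1: {:.3f}'.format(norm_2))
For this vector the L0 norm is 5.0 (the number of non-zero elements), the L1 norm is 15.0, and the L2 norm is 7.416.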
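As a rough illustration of what these norms mean, the following plain-Python sketch computes them by hand for the vector [1, 2, 3, 4, 5]. The function `p_norm` is a hypothetical helper, not a PySpark API. It assumes the NumPy convention for vectors, where order 0 counts the non-zero entries and order p is the p-th root of the sum of |x_i|^p.

```python
def p_norm(values, p):
    """Return the p-norm of a sequence; p=0 counts the non-zero entries."""
    if p == 0:
        return float(sum(1 for v in values if v != 0))
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

v = [1.0, 2.0, 3.0, 4.0, 5.0]

l0 = p_norm(v, 0)  # number of non-zero entries
l1 = p_norm(v, 1)  # sum of absolute values
l2 = p_norm(v, 2)  # Euclidean length, the square root of 55

print(l0, l1, round(l2, 3))
```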
- numNonzeros(): counts the non-zero elements
dvect1=linalg.Vectors.dense([1,0,3,0,5])
num_nonzero=dvect1.numNonzeros()
print(num_nonzero)  # Result: 3
- squared_distance(): computes the squared distance between two vectors of the same dimension
dvect1=linalg.Vectors.dense([1,0,3])
dvect2=linalg.Vectors.dense([1,1,1])
dist=dvect1.squared_distance(dvect2)  # Value: 5.0
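The squared distance is the sum of squared coordinate differences, and it can also be written in terms of dot products as a·a - 2(a·b) + b·b. The sketch below checks both forms in plain Python for the two vectors used here; `dot` and `squared_distance` are hypothetical stand-ins written for illustration, not the PySpark methods themselves.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    return sum(x * y for x, y in zip(a, b))

def squared_distance(a, b):
    """Sum of squared coordinate differences."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    return sum((x - y) ** 2 for x, y in zip(a, b))

a = [1.0, 0.0, 3.0]
b = [1.0, 1.0, 1.0]

direct = squared_distance(a, b)                      # 0 + 1 + 4
via_dot = dot(a, a) - 2 * dot(a, b) + dot(b, b)      # 10 - 8 + 3

print(direct, via_dot)
```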
- Extracting the values of a vector
dvect1=linalg.Vectors.dense([1,0,3])
print(dvect1.toArray())
print(dvect1.values)
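2. SparseVector (sparse vector)
2.1 Creation
A sparse vector can be created in the following ways:
- Vectors.sparse(size, index array, array of values matching the index array), where indices start at 0 (here and below);
- Vectors.sparse(size, {index: value, index: value, ...})
- Vectors.sparse(size, [(index, value), (index, value), ...])
For example: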
svect1=linalg.Vectors.sparse(3,[0,1],[3.4,4.5])
svect2=linalg.Vectors.sparse(3,{0:3.4,2:4.5})
svect3=linalg.Vectors.sparse(4,[(2,3),(3,2.3)])
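The three argument styles describe the same information: a set of (index, value) pairs plus the vector size. The plain-Python sketch below normalizes each style into sorted pairs to show that they are interchangeable. The function `to_index_value_pairs` is a hypothetical helper written for this illustration and is not how PySpark parses its arguments internally.

```python
def to_index_value_pairs(*args):
    """Normalize the three argument styles into sorted (index, value) pairs."""
    if len(args) == 1:
        # Either a dict {index: value} or a list of (index, value) tuples.
        source = args[0]
        pairs = source.items() if isinstance(source, dict) else source
    elif len(args) == 2:
        # Two parallel sequences: indices and values.
        indices, values = args
        if len(indices) != len(values):
            raise ValueError("indices and values must have the same length")
        pairs = zip(indices, values)
    else:
        raise TypeError("expected one or two arguments")
    return sorted((int(i), float(v)) for i, v in pairs)

from_arrays = to_index_value_pairs([0, 1], [3.4, 4.5])
from_dict = to_index_value_pairs({1: 4.5, 0: 3.4})
from_tuples = to_index_value_pairs([(1, 4.5), (0, 3.4)])

print(from_arrays == from_dict == from_tuples)
```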
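2.2 Common operations
Some operations on sparse vectors work the same way as on dense vectors and are not repeated here. Only the following two are covered:
- toArray(): returns all values of the sparse vector, including the zeros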
svect1=linalg.Vectors.sparse(3,[0,1],[3.4,4.5])
svect2=linalg.Vectors.sparse(3,{0:3.4,2:4.5})
svect3=linalg.Vectors.sparse(4,[(2,3),(3,2.3)])
print(svect1.toArray())
print(svect2.toArray())
print(svect3.toArray())
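The results are dense arrays with zeros at the positions that were not set: (3.4, 4.5, 0.0), (3.4, 0.0, 4.5), and (0.0, 0.0, 3.0, 2.3).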
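The expansion from sparse to dense form can be sketched in a few lines of plain Python: start from a list of zeros and write each stored value at its index. The function `to_dense` is a hypothetical helper for illustration only and works on lists rather than on NumPy arrays.

```python
def to_dense(size, indices, values):
    """Expand (size, indices, values) into a full list of floats."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        if not 0 <= i < size:
            raise IndexError("index {} out of range for size {}".format(i, size))
        dense[i] = float(v)
    return dense

d1 = to_dense(3, [0, 1], [3.4, 4.5])
d2 = to_dense(3, [0, 2], [3.4, 4.5])
d3 = to_dense(4, [2, 3], [3, 2.3])

print(d1)
print(d2)
print(d3)
```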
- indices: returns the indices of the non-zero elements of a sparse vector (this is an attribute, not a method)
svect1=linalg.Vectors.sparse(3,[0,1],[3.4,4.5])
svect2=linalg.Vectors.sparse(3,{0:3.4,2:4.5})
svect3=linalg.Vectors.sparse(4,[(2,3),(3,2.3)])
print(svect1.indices)  # Prints [0 1] (a NumPy array, likewise below)
print(svect2.indices)  # Prints [0 2]
print(svect3.indices)  # Prints [2 3]
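One reason the index array is useful is that operations such as the dot product only need to visit the stored entries. The plain-Python sketch below pairs each index with its value and multiplies it by the matching entry of a dense list. The function `sparse_dot` and the `weights` list are hypothetical, chosen for illustration; this is not PySpark's own implementation.

```python
def sparse_dot(size, indices, values, dense):
    """Dot product of a sparse vector with a dense sequence of the same size."""
    if len(dense) != size:
        raise ValueError("dimension mismatch")
    # Positions that are not stored contribute zero, so they are skipped.
    return sum(v * dense[i] for i, v in zip(indices, values))

# Same contents as svect3 above: size 4, values 3.0 and 2.3 at indices 2 and 3.
size, indices, values = 4, [2, 3], [3.0, 2.3]
weights = [10.0, 20.0, 30.0, 40.0]  # illustrative dense vector

result = sparse_dot(size, indices, values, weights)
stored = len(indices)  # number of stored entries

print(result, stored)
```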