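Python vector space model similarity calculation: help needed, it always fails to run

Problem description

My Python vector space model similarity calculation always fails when I run it. Could someone help?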
# Use the vector space model to compute the similarity between two strings s and s1

from math import sqrt
from collections import Counter
import re

def vsm_distance(s, s1):
    # Convert s and s1 into dictionaries of the form {word: frequency}
    mylist = re.findall(r"\w+", s)
    ss = Counter(mylist)
    mylist1 = re.findall(r"\w+", s1)
    ss1 = Counter(mylist1)
    # Vector space computation
    c = set(ss.keys()) & set(ss1.keys())
    if not c:
        return 0
    x = sum([ss.get(i)*ss1.get(i) for i in c])
    sq1 = sqrt(sum([pow(ss.get(i),2) for i in ss.values()]))
    sq2 = sqrt(sum([pow(ss1.get(i),2) for i in ss1.values()]))
    p = float(x)/(sq1*sq2)
    return p

s="KBA is to give a chance to non-popular entities information to be updated as soon as a useful information is published on the internet. The KBA organizershave built up a stream-corpus which is a huge corpus of timestamped web documents that can be processed chronologically. Hence it is possible to simulate a real time system. The documents come from newswires, blogs, forums, review, memetracker….. In addition, a set of target entities, coming from wikipedia or from twitter, has been selected for their ambiguity or unpopularity. And last but not least, more than 60000 documents have been annotated so that systems can train on it. The train period starts on documents published from october 2011 until februray, and the test period starts from februray 2012 to februray 2013."

s1="The KBA track is divided in two tasks:CCR(Cumulative Citation Recommendation) and SSF(Streaming Slot Filling). CCR task is to filter out documents worth citing in a profile of an entity(e.g., wikipedia or freebase article). SSF task is to detect changes on given slots for each of the target entities. This article is focused only on CCR task."

vsm_distance(s,s1)

Solution

When you say it won't run, do you mean there is a syntax error, or that the result is incorrect?

Solution 2:

Traceback (most recent call last):
  File "", line 1, in <module>
  File "D:\Python\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
    execfile(filename, namespace)
  File "D:\Python\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 71, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)
  File "E:/我的文档/Python Scripts/filefour.py", line 23, in <module>
  File "E:/我的文档/Python Scripts/filefour.py", line 17, in vsm_distance
TypeError: unsupported operand type(s) for ** or pow(): 'NoneType' and 'int'
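
A likely reading of this TypeError: the two lines that compute sq1 and sq2 loop over ss.values(), which yields the word counts, and then pass each count to ss.get() as if it were a word. A count is normally not a key in the Counter, so get() returns None and pow(None, 2) fails. The snippet below is only a minimal reproduction with a made-up word list, not the original script:

```python
from collections import Counter

# Hypothetical toy input: "to" appears three times, "kba" once.
ss = Counter(["to", "to", "to", "kba"])

# Iterating over .values() yields the counts (3 and 1), not the words.
# Neither 3 nor 1 is a key, so dict-style .get() returns None for both.
looked_up = [ss.get(count) for count in ss.values()]
print(looked_up)

try:
    pow(looked_up[0], 2)
except TypeError as exc:
    # Same kind of error as in the traceback above.
    print(exc)
```

Note that Counter only returns 0 for missing keys when indexed with square brackets; get() behaves like the plain dict method and returns None.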

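Solution 5:

It looks like there is a problem with the data you pass in as s and s1, which makes the later processing fail. Try printing them right at the start of the function and see what you get.

Solution 6:

Vector space model document similarity calculation, implemented in C#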
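
For what it is worth, the traceback points at the pow() call inside vsm_distance rather than at the inputs, so the strings themselves are probably fine. Below is a minimal sketch of a corrected version, assuming the intent is plain cosine similarity over raw word counts (dot product divided by the product of the two vector norms). The name vsm_similarity and the short test strings are my own choices for illustration, not from the original post:

```python
import re
from collections import Counter
from math import sqrt


def vsm_similarity(s, s1):
    """Cosine similarity between the word-count vectors of two strings."""
    # Map each word to its frequency.
    counts = Counter(re.findall(r"\w+", s))
    counts1 = Counter(re.findall(r"\w+", s1))

    # Only words that occur in both strings contribute to the dot product.
    common = set(counts) & set(counts1)
    if not common:
        return 0.0

    dot = sum(counts[w] * counts1[w] for w in common)
    # The norms use the counts directly instead of looking them up as keys.
    norm = sqrt(sum(c * c for c in counts.values()))
    norm1 = sqrt(sum(c * c for c in counts1.values()))
    return dot / (norm * norm1)


# Hypothetical short inputs; the long s and s1 from the question work the same way.
print(vsm_similarity("a b b", "b c"))
```

When the function is called from a script, the result has to be printed or assigned; a bare call like vsm_distance(s,s1) on the last line computes the value and then discards it.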

Time: 2024-10-02 21:12:44
