SAS vs. R (vs. Python) – which tool should I learn?

Original article:
http://www.analyticsvidhya.com/blog/2014/03/sas-vs-vs-python-tool-learn/

We love comparisons!

From Samsung vs. Apple vs. HTC in smartphones and iOS vs. Android vs. Windows in mobile operating systems, to comparing candidates for upcoming elections or selecting a captain for the World Cup team, comparisons and discussions enrich our lives. If you love discussions, all you need to do is pose a relevant question in the middle of a passionate community and then watch it explode! The beauty of the process is that everyone in the room walks away more knowledgeable.

I am sparking something similar here. SAS vs. R has probably been the biggest debate the analytics industry has witnessed. Python is now another worthy candidate to add to the mix. My reason for starting this discussion is not to watch it explode (though that would be fun as well). I know that we will all benefit from the discussion.

This has also been one of the questions I am asked most often on this blog. So, I thought I’d discuss it with all my readers and visitors!

Hasn’t a lot already been said on this topic?

Probably yes! But I still feel the need for a discussion, for the following reasons:

  • The industry is very dynamic. Any comparison done 2 years ago might no longer be relevant.
  • Traditionally, Python has been left out of the comparison. I think it is worth considering now.
  • While I’ll discuss global trends for these languages, I’ll add specific information about the Indian analytics industry (which is at a different stage of evolution).

So, without any further delay, let the combat begin!

Background:

Here is a brief description of the 3 ecosystems:

  • SAS: SAS has been the undisputed market leader in the commercial analytics space. The software offers a huge array of statistical functions, has a good GUI (Enterprise Guide & Miner) that lets people learn quickly, and provides excellent technical support. However, it is the most expensive option and does not always include the latest statistical functions.
  • R: R is the open source counterpart of SAS and has traditionally been used in academia and research. Because of its open source nature, the latest techniques get released quickly. A lot of documentation is available on the internet, and it is a very cost-effective option.
  • Python: Having originated as an open source scripting language, Python has seen its usage grow over time. Today, it sports libraries (numpy, scipy and matplotlib) and functions for almost any statistical operation or model building you may want to do. Since the introduction of pandas, it has become very strong at operations on structured data.

Attributes for comparison:

I’ll compare these languages on the following attributes:

  1. Availability / Cost
  2. Ease of learning
  3. Data handling capabilities
  4. Graphical capabilities
  5. Advancements in tool
  6. Job scenario
  7. Customer service support and Community

I am comparing these from the point of view of an analyst. So, if you are looking to purchase a tool for your company, you may not get a complete answer here, though the information below will still be useful. For each attribute, I give each of the 3 languages a score (1 – Low; 5 – High).

The weight of each of these parameters will vary depending on where you are in your career and on your ambitions.

1. Availability / Cost:

SAS is commercial software. It is expensive and still beyond the reach of most professionals (in an individual capacity). However, it holds the highest market share in private organizations. So, unless you are in an organization that has invested in SAS, it might be difficult to get access to it.

R & Python, on the other hand, are free and can be downloaded by anyone. Here are my scores on this parameter:

SAS – 2

R – 5

Python – 5

2. Ease of learning:

SAS is easy to learn and provides an easy option (PROC SQL) for people who already know SQL. Even otherwise, it comes with a good, stable GUI. In terms of resources, tutorials are available on the websites of various universities, and SAS has comprehensive documentation. There are certifications from SAS training institutes, but they again come at a cost.

R has the steepest learning curve among the 3 languages listed here. It requires you to learn and understand coding. Compared with SAS procedures, R works at a lower level, and hence simple procedures can take longer code.

Python is known for its simplicity in the programming world. This remains true for data analysis as well. While there are no widespread GUIs as of now, I am hoping Python notebooks will become more and more mainstream. They provide excellent features for documentation and sharing.

SAS – 4.5

R – 2.5

Python – 3.5

3. Data handling capabilities:

This used to be an advantage for SAS until some time ago. R computes everything in memory (RAM), and hence computations were limited by the amount of RAM on 32-bit machines. This is no longer the case. All three languages have good data handling capabilities and options for parallel computation. I feel this is no longer a big differentiator. Also, I might not be aware of the latest innovations in each ecosystem, and hence I see all 3 as equally capable.

SAS – 4

R – 4

Python – 4
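
To make the 32-bit memory limit mentioned in this section concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes every value is stored as an 8-byte double and ignores all overhead, so real memory use would be higher. The function names and the example table sizes are hypothetical, chosen only for illustration.

```python
# Rough estimate of whether an all-numeric table fits in memory.
# Assumption: every cell is an 8-byte double and there is no overhead,
# so this is a lower bound on the real memory footprint.

BYTES_PER_DOUBLE = 8
ADDRESS_SPACE_32_BIT = 2 ** 32  # 4 GiB: upper bound on what a 32-bit process can address


def table_bytes(rows: int, cols: int) -> int:
    """Approximate size in bytes of a rows x cols table of doubles."""
    return rows * cols * BYTES_PER_DOUBLE


def fits_in_address_space(rows: int, cols: int,
                          limit: int = ADDRESS_SPACE_32_BIT) -> bool:
    """True if the estimated table size is below the given byte limit."""
    return table_bytes(rows, cols) < limit


if __name__ == "__main__":
    # Hypothetical table sizes, for illustration only.
    for rows, cols in [(1_000_000, 10), (100_000_000, 10)]:
        size_gib = table_bytes(rows, cols) / 2 ** 30
        print(f"{rows:>11,} rows x {cols} cols: {size_gib:.2f} GiB, "
              f"fits in 32-bit address space: {fits_in_address_space(rows, cols)}")
```

Under these assumptions, a table of one million rows and ten columns fits easily, while one of a hundred million rows and ten columns does not. On a 64-bit machine the address-space ceiling effectively disappears, and the practical limit becomes the amount of installed RAM.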

4. Graphical capabilities:

SAS has decent functional graphical capabilities. However, they are just functional. Any customization of plots is difficult and requires you to understand the intricacies of the SAS Graph package.

R has the most advanced graphical capabilities among the three. Numerous packages provide advanced graphical capabilities.

Python’s capabilities lie somewhere in between, with options to use native libraries (matplotlib) or derived libraries (which allow calling R functions).

SAS – 3

R – 4.5

Python – 4

5. Advancements in tool:

All 3 ecosystems have all the basic and most needed functions available. This attribute only matters if you are working on the latest technologies and algorithms.

Due to their open nature, R & Python get the latest features quickly (R more so than Python). SAS, on the other hand, updates its capabilities in new version roll-outs. Since R has been widely used in academia in the past, development of new techniques is fast.

Having said this, SAS releases updates in a controlled environment, hence they are well tested. R & Python, on the other hand, have open contribution, and there is a chance of errors in the latest developments.

SAS – 4

R – 4.5

Python – 4

6. Job scenario:

Globally, SAS is still the market leader in available corporate jobs. Most big organizations still work on SAS. R / Python, on the other hand, are better options for start-ups and companies looking for cost efficiency. Also, the number of jobs on R / Python has been reported to increase over the last few years. Here is a chart widely published on the internet, which shows the trend for R and SAS jobs. Python jobs for data analysis should follow a trend similar to R jobs:



Source: r4stats.com

In India, specifically, the gap between SAS and R is bigger. My estimate is that SAS has about 70% market share, R around 15% and Python less than 5%. However, the trends are similar to global trends.

SAS – 4.5

R – 3.5

Python – 2.5

7. Customer service support & community:

R has the biggest online community but no customer service support. So if you have trouble, you are on your own. You will get a lot of help though. The same goes for Python, though on a smaller scale.

SAS, on the other hand, has dedicated customer service along with the community. So, if you have problems with installation or any other technical challenges, you can reach out to them.

SAS – 4

R – 3.5

Python – 3

Other factors:

Following are some more points worth noting:

  • Python is widely used in web development. So if you are in an online business, using Python for both web development and analytics can provide synergies.
  • SAS used to have a big advantage in deploying end-to-end infrastructure (Visual Analytics, data warehouse, data quality, reporting and analytics), which has been mitigated by the integration / support of R on platforms like SAP HANA and Tableau. It is still far from seamless integration like SAS, but the journey has started.

Conclusion:

Clearly, there is no winner in this race yet. It would be premature to place bets on what will prevail, given the dynamic nature of the industry. Depending on your circumstances (career stage, finances etc.), you can add your own weights and come up with what might be suitable for you. Here are a few specific scenarios:

  • If you are a fresher entering the analytics industry (especially in India), I would recommend learning SAS as your first language. It is easy to learn and holds the highest job market share.
  • If you are someone who has already spent time in the industry, you should try to diversify your expertise by learning a new tool.
  • Experts and pros in the industry should know at least 2 of these. That would add a lot of flexibility for the future and open up new opportunities.
  • If you are in a start-up / freelancing, R / Python is more useful.

Here is the final scorecard:
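
A minimal sketch of how such a scorecard could be assembled: the Python below collects the scores given in sections 1 to 7 and combines them with weights of your own choosing. The scores are copied from the text above. The function name `weighted_scores` and the example weights are hypothetical and only illustrate the idea of adding your own weights.

```python
# Scores copied from sections 1 to 7 above (1 = low, 5 = high).
SCORES = {
    "Availability / Cost":        {"SAS": 2.0, "R": 5.0, "Python": 5.0},
    "Ease of learning":           {"SAS": 4.5, "R": 2.5, "Python": 3.5},
    "Data handling capabilities": {"SAS": 4.0, "R": 4.0, "Python": 4.0},
    "Graphical capabilities":     {"SAS": 3.0, "R": 4.5, "Python": 4.0},
    "Advancements in tool":       {"SAS": 4.0, "R": 4.5, "Python": 4.0},
    "Job scenario":               {"SAS": 4.5, "R": 3.5, "Python": 2.5},
    "Customer service support and Community": {"SAS": 4.0, "R": 3.5, "Python": 3.0},
}

TOOLS = ("SAS", "R", "Python")


def weighted_scores(weights=None):
    """Return the weighted average score per tool.

    `weights` maps attribute name -> non-negative weight; attributes
    missing from it default to a weight of 1 (equal weighting).
    """
    weights = weights or {}
    total_weight = sum(weights.get(attr, 1.0) for attr in SCORES)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    return {
        tool: sum(weights.get(attr, 1.0) * SCORES[attr][tool]
                  for attr in SCORES) / total_weight
        for tool in TOOLS
    }


if __name__ == "__main__":
    # Equal weights: a plain average of the seven scores.
    print({t: round(s, 2) for t, s in weighted_scores().items()})

    # Hypothetical weights for someone who cares mostly about jobs
    # and about getting started quickly.
    job_seeker = {"Job scenario": 3.0, "Ease of learning": 2.0}
    print({t: round(s, 2) for t, s in weighted_scores(job_seeker).items()})
```

With equal weights, R comes out slightly ahead (about 3.93, against 3.71 for both SAS and Python). With the hypothetical job-focused weights, SAS moves to the top, which is consistent with the recommendation for freshers above.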

These are my views on this comparison. Now, it’s your turn to share your views through the comments below.

Time: 2024-11-08 19:01:55
