The field of data science is constantly evolving, with new technologies placing more valuable insights in the hands of modern enterprises. More data-driven organizations are hiring data scientists to drive their efforts to gather, analyze, and make use of Big Data in valuable ways.
Because the field of data science is so broad and sometimes challenging to navigate, we’ve compiled a list of 50 of the most helpful data science resources on the web. Whether you’re a student or new professional working in the field of data science, these resources are valuable for discovering the latest employment opportunities, finding tutorials for the processes and systems you’re using on a daily basis, learning hacks and tricks to boost your performance, and connecting with other professionals in your field.
Note: The following 50 resources are not ranked or rated in order of importance or value; rather, they are categorized to make it easy for you to locate the resources you need most. Click through to a specific category using the links in the Table of Contents below.
Table of Contents:
- Best Data Science Blogs
- Data Science Communities
- Data Science Educational Resources
- Data Science Conferences
- Data Science Webinars
- Data Science Videos
- Miscellaneous Data Science Resources
Edwin Chen is a San Francisco Bay-area data scientist who has worked for companies like Dropbox, Microsoft, and Clarium Capital Management and studied at MIT. Chen blogs on topics of interest to data scientists, including tutorials on crowdsourcing, modeling, moving beyond CTR with human evaluation, and more.
Three posts we like from Edwin Chen:
- Moving Beyond CTR: Better Recommendations Through Human Evaluation
- Propensity Modeling, Causal Inference, and Discovering Drivers of Growth
- Making the Most of Mechanical Turk: Tips and Best Practices
John Langford, Director of Learning at Microsoft Research, manages this collaborative machine learning blog. Langford shares his knowledge and personal insights on learning theory, covers conferences and related events, and discusses everything from neuroscience to prediction theory, problems, reduction, and of course, machine learning.
Three posts we like from Machine Learning (Theory):
FastML is “meant to tackle interesting topics in machine learning while being entertaining and easy to read and understand.” Run by Zygmunt Zając, FastML was born from a frustration with papers and documentation that aren’t easily understood by the average user who lacks both the time and the interest to become a PhD-level expert in every machine learning topic. In other words, FastML breaks down technical material in an easy-to-understand manner.
Three posts we like from FastML:
- Interactive in-browser 3D visualization of datasets
- Geoff Hinton’s Dark Knowledge
- Kaggle vs industry, as seen through lens of the Avito competition
Statistical Modeling, Causal Inference, and Social Science is a blog run by Andrew Gelman, a professor of statistics and political science and director of the Applied Statistics Center at Columbia University. Topics include causal inference, decision theory, multilevel modeling, statistical computing, and statistical graphs, as well as other topics of interest to Gelman such as public health, sociology, and political science.
Three posts we like from Statistical Modeling, Causal Inference, and Social Science:
- Six quick tips to improve your regression modeling
- Statistical Communication and Graphics Manifesto
- What do you do to visualize uncertainty?
Mike Croucher, Head of the EPS IS Applications team at the University of Manchester, shares his knowledge of a variety of programming languages and tools, spanning everything from linear algebra to MATLAB and Python.
Three posts we like from Walking Randomly:
- OpenDreamKit – A grant proposal written openly and collaboratively
- When software licensing cripples scientific throughput
- Scilab/xcos versions of Simulink models used in control theory teaching
Kaggle is turning data science into a sport with its platform for predictive modeling competitions. Kaggle’s competition and data science blog, No Free Hunch, covers all things related to the sport of data science.
Three posts we like from No Free Hunch:
- Kaggle InClass: Stanford’s “Getting a Handel on Data Science” Winners’ Report
- Reviewing the American Epilepsy Society Seizure Prediction Challenge
- Learning from the best
Data Mining Research, started in 2006, covers research and applications in data mining. Sandro Saitta launched the blog as a PhD student at EPFL (Ecole Polytechnique Fédérale de Lausanne), Switzerland, using it at the time to discuss data mining research issues.
Three posts we like from Data Mining Research:
- The 5 Most Common Data Relationships Shown Through Visualization
- Prescriptive Analytics: New Concept or Buzzword?
- Predicting Kickstarter campaign success through data mining
SmartData Collective is an online community moderated by Social Media Today that provides information on the latest trends in business intelligence and data management. SmartData Collective serves as a platform for recognized, global experts to share their expertise and insights.
Three posts we like from SmartData Collective:
- Why Are Mid-Market Companies Waiting to Embrace Big Data?
- How to Position Big Data
- Data Driven Marketing: A Real Life Use Case
Gil Press has been involved in researching the growth of Big Data for years, and What’s the Big Data? is his blog focused on that very subject. Press spent more than two decades managing research, marketing, and communications projects and programs at NORC, DEC, and EMC. He now runs his own consulting practice and continues to blog at What’s the Big Data?, sharing his knowledge of the world of Big Data with readers.
Three posts we like from What’s the Big Data?:
- A Visual Guide to the Startup Universe
- Being a Data Scientist in 2015 (Infographic)
- Here Comes Another Bubble: 2014 Tech M&A Volume and Value Surpassed Only in 2000
Mining the Social Web is “transforming curiosity into insight.” The blog is a companion to a book by the same name, with the goal of taking social web mining mainstream. Find tutorials, analyses, hacks, excerpts from the book, and more.
Three posts we like from Mining the Social Web:
- What Do Tim O’Reilly, Lady Gaga, and Marissa Mayer All Have In Common?
- How to Deliver a Successful Tech Workshop with Vagrant and AWS
- How To Mine Your GMail with Google Takeout and MongoDB
A former data scientist at Bit.ly, Hilary Mason is the Founder of Fast Forward Labs and Data Scientist in Residence at Accel. A self-proclaimed “data scientist and hacker,” Mason blogs about all things data, her experiences speaking and presenting on the subject, and more.
Three posts we like from Hilary Mason:
- What Mugshots Mean For Public Data
- Need actual random numbers? Meet the NIST randomness beacon.
- Data Engineering
Steve Miller blogs at Information Management, covering data science, predictive analytics, statistical learning, and the impacts of data science on economics and public policy.
Three posts we like from Steve Miller’s Blog:
- Predictive Analytics or Data Science?
- Cheat Sheets for Data Science
- Defining Data Scientists & Their Tools
Renee documents her path from “SQL Data Analyst pursuing an Engineering Master’s Degree” to “Data Scientist” at Becoming a Data Scientist, providing a valuable resource to those interested in pursuing data science as a career.
Three posts we like from Becoming a Data Scientist:
- Data Science Practice: Classifying Heart Disease
- My “What is Data Science?” Talk
- Doing Data Science (Review)
The U.C. Berkeley School of Information runs the datascience@berkeley blog, featuring interviews, data science startups, event coverage, and other insights on data science and information technology.
Three posts we like from datascience@berkeley:
- Eric Berlow and Sean Gourley: Mapping Ideas Worth Spreading
- Data Dialogs Interview: Motiga’s Kimberly Stedman
- What Is Big Data?
John Foreman is a data scientist at MailChimp, blogging his thoughts on data science as a profession and the state of the analytics industry as a whole.
Three posts we like from John Foreman, Data Scientist:
- Surviving Data Science “at the Speed of Hype”
- The Perilous World of Machine Learning for Fun and Profit: Pipeline Jungles and Hidden Feedback Loops
- Facebook’s solution to big data’s “content problem:” dumber users
FlowingData explores the ways in which data scientists, designers, and statisticians use analysis, visualization, and exploration to understand data and ourselves. Nathan Yau, Ph.D., authors the FlowingData blog, presenting data on concepts that help readers understand the world around them, such as trends on transportation, relationships, and more.
Three posts we like from FlowingData:
- Shrinking middle class
- Interactive: When Do Americans Leave For Work?
- Conflicting views: Public versus scientists
Three biostatistics professors (Jeff Leek, Roger Peng, and Rafa Irizarry) who are fired up about the new era where data are abundant and statisticians are scientists blog at Simply Statistics, where they post ideas on interesting subject matter, contribute to discussion of science and popular writing, share informative articles, and offer advice to up-and-coming statisticians.
Three posts we like from Simply Statistics:
- Is Reproducibility as Effective as Disclosure? Let’s Hope Not.
- The trouble with evaluating anything
- Johns Hopkins Data Science Specialization Top Performers
The Data Science Institute at Columbia University maintains an ongoing blog to reflect on and add to discussions on key topics from the Introduction to Data Science course. Featuring weekly lectures, course topics and themes, thought experiments, and more, the blog is an informative read for students and professionals alike.
Three posts we like from Introduction to Data Science, Columbia University:
- Mapping Data to Senses
- Philosophy of Data Science: Embrace the Practical and the Profound
- The Stars of Data Science
Insight Data Science is an intensive, six-week post-doctoral fellowship program bridging the gap between academia and data science. The Insight Data Science blog updates readers on the latest happenings with the program, in addition to offering informative data analyses, industry news, and tips for professionals in the data science field.
Three posts we like from Insight Data Science:
- From PhD to Data Scientist: 5 tips for Making the Transition
- ThisPlusThat: A Search Engine That Lets You ‘Add’ Words as Vectors
- Pick Your Fiction (And Your Career)
The Data Science Report is the official blog and website of Starbridge Partners, an executive search and career advisory firm focused on the data science and Big Data analytics space. The Data Science Report has it all, from case studies and papers to online courses, lectures, and webinars, as well as ongoing news and discussion of happenings in the world of data science.
Three posts we like from Data Science Report:
- The Complete List of Data Science Fellowships & Bootcamps
- Audio Interview: Hilary Mason on Taking Big Data from Theory to Reality
- Big Data and Predictive Analytics: When is Enough Data Enough?
Galvanize is a network of life-long learners and educators, a home for developers, data scientists, entrepreneurs, and anyone with an interest in the space. Through Galvanize, you can connect with your peers, get advice from expert mentors, attend workshops, or enroll in a gSchool program.
Three resources we like from Galvanize:
- gSchool classes
- We’ve partnered with Women Who Code to give a full scholarship to a woman in tech
- Galvanize Ventures
Data Science Central is one of the premier online communities for those immersed in the data science culture. Read blog posts from other members, participate in forum discussions, and stay abreast of the latest research on Data Science Central.
Three resources we like from Data Science Central:
- How to detect spurious correlations, and how to find the real ones
- The Free ‘Big Data’ Sources Everyone Should Know
- Data Science Cheat Sheet
From courses, education, and meetings to news, features, interviews, publications, and webcasts, KDnuggets is a comprehensive resource for anyone with a vested interest in the data science community, whether a student in pursuit of professional goals or a working professional whose role is impacted by data science.
Three resources we like from KDnuggets:
- 10 things statistics taught us about big data analysis
- Cartoon: Data Scientist gets 3 wishes for Valentine’s Day
- My Brief Guide to Big Data and Predictive Analytics for non-experts
Quora is a popular question-and-answer site where anyone can ask questions, engage in discussion, and provide expertise on virtually any topic. The Quora Data Science community focuses on “the scientific approach to knowledge extraction from data.”
Three resources we like from Quora – Data Science:
- What are the key skills of a data scientist?
- How to become a data scientist
- What can a data scientist create in 1 hour, 1 day, 1 week, or 1 month?
25. Cross Validated
A question-and-answer site for statistical topics, machine learning, data analysis, data mining, and data visualization, Cross Validated is a free resource for data scientists and those interested in the field.
Three resources we like from Cross Validated:
- Calculating significance and uplift on Revenue A/B Tests
- Statistical test for two variables (before after) for each subject
- Is there any learning method resistant to the curse of dimensionality?
A non-profit professional group offering education, professional certification, conferences and meetups, and even a “Data Science Code of Professional Conduct,” the Data Science Association is a valuable community for data science professionals.
Three resources we like from Data Science Association:
- 2015-01-21 Top 20 Data Quality Solutions & Random Walks for Scale Space Theory
- International Journal of Data Science – Call for Papers – Deadline March 1, 2015 for Next Issue
- Data Science News
The Open Source Data Science Masters is an open-source curriculum for learning data science. “Foundational in both theory and technologies, the OSDSM breaks down the core competencies necessary to make data useful.”
Three resources we like from The Open Source Data Science Masters:
An open content portal for self-directed learning in data science, Learn Data Science was developed primarily by Nitin Borwankar, a seasoned database professional with more than two decades of experience.
Three resources we like from Learn Data Science:
- A1. Linear Regression – Overview
- B2. Logistic Regression – Data Exploration
- C3. Random Forests – Analysis
DataCamp is a resource for learning data analysis and R interactively. With DataCamp, you can learn by doing, at your own pace, choosing from a variety of courses.
Three courses we like from DataCamp:
- A hands-on introduction to statistics with R
- Data analysis the data.table way
- Data Analysis and Statistical Inference
Data Science Academy is a useful portal for finding free educational resources on data science and related topics. From Berkeley to MIT, Columbia and Stanford, you’ll find free data science resources from major educational institutions.
Three resources we like from Data Science Academy:
- Introduction to Data Mining at Massachusetts Institute of Technology
- Introduction to Data Wrangling at the School of Data
- Introduction to Data Science by Jeff Hammerbacher at UC, Berkeley
School of Data “works to empower civil society organizations, journalists, and citizens with the skills they need to use data effectively.” Joining School of Data gives you access to a variety of courses designed for everyone, from the data science newbie to the professional seeking inspiration.
Three resources we like from School of Data:
32. Neural Information Processing Systems Foundation (NIPS)
December 7-12, 2015
Montreal, Quebec, Canada
The Neural Information Processing Systems (NIPS) Foundation, a non-profit corporation, hosts its annual conference to give attendees the opportunity to exchange research on neural information processing systems in the areas of biology, technology, mathematics, and theory. Featuring tutorials on December 7, conference sessions December 7-10, and workshops December 11-12, NIPS 2015 will be part of the foundation’s continuing series of professional meetings and is a highly regarded data science resource.
Cost to Attend: Contact for attendance cost
33. International Conference on Machine Learning (ICML)
July 6-11, 2015
The 32nd International Conference on Machine Learning (ICML), supported by the International Machine Learning Society (IMLS), includes two days of tutorials, two main conference days, and two days of workshops. Invited speakers include Léon Bottou of Microsoft Research New York, Jon Kleinberg of Cornell University, and Susan Murphy of the University of Michigan. A popular and informative event, ICML is a great data science resource.
Cost to Attend: Contact for attendance cost
The Data Science Forum holds meetings that focus on hot data science topics, machine learning, and business analytics. Free and open to all, the interactive summits join professionals and academics together and provide a place for networking and the sharing of ideas and information.
Billed as “the industry’s only forum dedicated to discussing successful integration of your organization’s largest pools of data to maximize research and development efforts and minimize cost,” Data & Analytics in Life Sciences Forum showcases clinical trials data, real world data, communicable diseases data, and genomic/personalized medicine data. This data science resource will help attendees utilize their organizations’ data to enhance their brands.
Cost to Attend: Discounts are available for teams
- Primary All Access – Main conference plus all 3 workshops: $2,499
- Primary Main Conference: $1,799
- Vendor All Access – Main conference plus all 3 workshops: $2,999
- Vendor Main Conference: $2,299
- Workshop A: $549
- Workshop B: $549
- Workshop C: $549
Joseph Rickert presents this webinar on R and Data Science, an appropriate resource for both beginner and experienced R users. The webinar features code examples and reasons for the popularity and effectiveness of R, making it a useful data science resource.
Three key points we like from R-bloggers: R and Data Science Webinar:
- R makes numerous machine learning and statistical algorithms available
- R features visualization capabilities
- R is a rich programming language with several tools for data manipulation
This webinar features Bora Beran, program manager of Tableau Software, and explores how to use Tableau with R to speed up data science projects and reach better, data-driven business decisions. An on-demand webinar, Statistical Computing – R and Tableau: Data Science at the Speed of Thought is 62 minutes in length.
Three key topics we like from Statistical Computing – R and Tableau: Data Science at the Speed of Thought:
- Connecting R scripts to a wide variety of data files and databases
- Building interactive slideshows and presentations of your data in just minutes
- Using dashboards as a front end for R code and allowing viewers to intuitively interact with R models
Cost: Free, with registration form completion
This webcast features Annika Jimenez, Kaushik Das, and Hulya Farinas, leaders of the Pivotal team, and their insights on the data science industry trends that will be key for 2015. These industry leaders and their insights create a data science resource that is not to be missed.
Three key topics we like from Top Data Science Trends for 2015:
- New use cases at the vertical level
- Analytical tool usage trends
- Implications of the shift in focus to model operationalization
Presented by Colin White, President of BI Research, this webinar explores the idea that data scientists are invaluable to organizations because they are analysts who play the role of data engineer, statistician, and business analyst. In this regard, data scientists are the link between simply doing advanced data analysis and actually using the findings to produce business results aligned with the organization’s goals. This webinar serves as a terrific data science resource.
Three key topics we like from How Data Science is Changing the Way Companies Do Business:
- The role data science plays in business analytics
- Data science techniques, technologies, and tools
- Gaining a competitive advantage through data science
Cost: Contact for registration and viewing fees
From MaintenanceNet, a leader in data-driven science revenue generation, this webinar is a data science resource geared toward service revenue. The webinar explores strategies for establishing sound data intelligence practices while growing your services sales business.
Three key topics we like from The Data Science Behind Service Revenue:
- Transforming installed base data into actionable business intelligence
- Managing and exchanging data with your channel efficiently
- Using advanced analytics to close more renewal, cross-sell, and up-sell opportunities
Cost: Contact for registration information
Part of the Data Science Central Webinar Series, Using Data Science Techniques for Automatic Clustering of IT Infrastructure Alerts is hosted by Tim Matteson of DSC. The webinar showcases a framework developed to automatically cluster alerts that report on the health of various critical components, making it an informative and useful data science resource.
Three key points we like from Using Data Science Techniques for Automatic Clustering of IT Infrastructure Alerts:
- The thousands of alerts that are generated per day put a strain on IT departments
- The cluster framework should be used in conjunction with Pivotal Greenplum Database and utilize approaches from Graph Theory and Hierarchical Clustering
- The framework makes it possible for IT departments to understand and process data more efficiently
This video, a little more than an hour long, features an expert panel exploring the ongoing data science revolution. The future of data science and the ethics of data analytics and enhanced predictive powers are just two of the captivating issues raised in the video, making it an intriguing data science resource.
Three key points we like from The Data Science Revolution:
- Today’s analytic capabilities are advanced and powerful enough to transform approaches to global challenges and human activities
- Data scientists’ predictive powers have grown immensely
- There are important pros and cons associated with data science and its data analytics and predictive powers
A video from RStudio, which provides “open source and enterprise-ready professional software for the R community,” Introduction to Data Science with R Video Workshop features RStudio Master Instructor Garrett Grolemund. A short video introduction for the larger Introduction to Data Science with R Video Course, this data science resource gives a thorough overview of the course.
Three key topics we like from Introduction to Data Science with R Video Workshop:
- Data science incorporates three skill sets: computer programming (with R), manipulating data sets, and modeling data with statistical methods
- R makes it possible to load, save, and transform data, plus generate graphs and fit statistical models to the data
- R is an alternative to Excel, SAS, and other software
Cost: Contact for a quote for the entire Introduction to Data Science with R Video Course
This data science resource is a detailed talk delivered by Donald Miner to the NYC Data Science Meetup about using Hadoop for data science. The best feature of this video is that Mortar Data has included a time-stamped summary so viewers can skip to specific sections.
Three key topics we like from Hadoop for Data Science:
- 4 reasons to use Hadoop for data science
- Evaluating data cleanliness
- R and Hadoop
Most individuals in the data science field are familiar with Booz Allen Hamilton, the group that provides management and tech consulting to the government, major corporations and institutions, and non-profit organizations. Their nearly four-and-a-half-minute video focuses on the opportunity their clients have when dealing correctly with their data and serves as a case study for data science professionals.
Three key points we like from Turning Big Data Into Big Analytics: Data Science:
- Big data yields big insights
- Data is the most valuable natural resource for companies and organizations
- Corporations face challenges and opportunities with analyzing their big data and transforming it into analysis and actionable insights
This video showcases an excerpt of a recorded interview with Jason Huling, data scientist at MaintenanceNet. With a central focus on using data science in sales, the video makes the case for having the necessary data, tools, and resources to use data science productively and efficiently.
Three key points we like from The Importance of Data Science in Sales:
- The overflow of data makes it necessary for organizations to use data science to get value, insights, and opportunity from their existing data
- With data science, companies get clearer insight into customer behaviors, product usage, and buying trends
- The great benefit of machine learning is the ability to account for a nearly limitless number of scenarios
The Berkeley Institute for Data Science is a comprehensive data science resource, providing research, various resources, and more than 10 videos relating to data science. Anyone looking for more information about data science is sure to find the Berkeley Institute for Data Science a great help, and we think its videos are among the best data science resources it offers.
Three data science videos we like from the Berkeley Institute for Data Science:
- Extracting Actionable Insight from Dirty Time-Series Data
- A Case Study in Reproducible Data Science: Measuring and Modeling Human Brain Connectivity
- Data Science Lecture Series: Maximizing Human Potential Using Machine Learning-Driven Applications
Made available by Wikibooks, Data Science: An Introduction is a wikibook that offers a basic introduction to data science. Geared toward advanced high school students or college freshmen with a high-school-level understanding of math, science, word processing, and spreadsheets, Data Science: An Introduction does not require a computer science background, making it an extremely accessible data science resource.
Three key points we like from Data Science: An Introduction:
- The most successful data scientists adopt holistic attitudes toward data science
- Data science requires proficiency in parallel processing, map-reduce computing, machine learning, advanced statistics, and complexity science, among other advanced skills and knowledge sets
- Data science is a collaborative effort and is best performed in team situations
Kaggle is a platform for predictive modeling competitions and consulting. Consult Kaggle’s Wiki for answers to all your frequently asked questions about data science and Kaggle’s competitions, look for professional opportunities on the job board, and participate in discussions with other users in the forum.
Three resources we like from Kaggle – Competitions:
A free weekly newsletter delivered to your inbox every Thursday, Data Science Weekly features curated news, articles, and data science job openings, making it a must-receive news source for data scientists and related professionals.
Three resources we like from Data Science Weekly:
- Building a Data Science “Experiment Platform”: Nick Elprin Interview
- Data Science Resources
- Anomalies, Concerts & Data Science at the Command Line: Jeroen Janssens Interview