The field of data science is constantly evolving, with new technologies placing more valuable insights in the hands of modern enterprises. A growing number of data-driven organizations are hiring data scientists to lead their efforts to gather, analyze, and make use of Big Data in valuable ways.
Because the field of data science is so broad and sometimes challenging to navigate, we’ve compiled a list of 50 of the most helpful data science resources on the web. Whether you’re a student or new professional working in the field of data science, these resources are valuable for discovering the latest employment opportunities, finding tutorials for the processes and systems you’re using on a daily basis, learning hacks and tricks to boost your performance, and connecting with other professionals in your field.
Note: The following 50 resources are not ranked or rated in order of importance or value; rather, they are grouped by category to make it easy for you to locate the resources you need most.
Edwin Chen is a San Francisco Bay Area data scientist who has worked for companies like Dropbox, Microsoft, and Clarium Capital Management and studied at MIT. Chen blogs on topics of interest to data scientists, including tutorials on crowdsourcing, modeling, moving beyond CTR with human evaluation, and more.
Three posts we like from Edwin Chen:
John Langford, Director of Learning at Microsoft Research, manages this collaborative machine learning blog. Langford shares his knowledge and personal insights on learning theory, covers conferences and related events, and discusses everything from neuroscience to prediction theory, problems, reduction, and of course, machine learning.
Three posts we like from Machine Learning (Theory):
FastML is “meant to tackle interesting topics in machine learning while being entertaining and easy to read and understand.” Run by Zygmunt Zając, FastML grew out of frustration with papers and documentation that are hard to follow for the average reader, who has neither the time nor the interest to become a PhD-level expert in every machine learning topic. In other words, FastML breaks down technical material in an easy-to-understand manner.
Three posts we like from FastML:
Statistical Modeling, Causal Inference, and Social Science is a blog run by Andrew Gelman, a professor of statistics and political science and director of the Applied Statistics Center at Columbia University. Topics include causal inference, decision theory, multilevel modeling, statistical computing, and statistical graphs, as well as other topics of interest to Gelman such as public health, sociology, and political science.
Three posts we like from Statistical Modeling, Causal Inference, and Social Science:
Mike Croucher, Head of the EPS IS Applications team at the University of Manchester, shares his knowledge of a variety of programming languages and tools on his blog, Walking Randomly, covering topics that range from linear algebra to MATLAB and Python.
Three posts we like from Walking Randomly:
Kaggle is turning data science into a sport with its platform for predictive modeling competitions. Kaggle’s competition and data science blog, No Free Hunch, covers all things related to the sport of data science.
Three posts we like from No Free Hunch:
Data Mining Research, started in 2006, covers research and applications in data mining. Sandro Saitta launched the blog while he was a PhD student at EPFL (École Polytechnique Fédérale de Lausanne) in Switzerland, when its focus was data mining research issues.
Three posts we like from Data Mining Research:
SmartData Collective is an online community moderated by Social Media Today that provides information on the latest trends in business intelligence and data management. SmartData Collective serves as a platform for recognized, global experts to share their expertise and insights.
Three posts we like from SmartData Collective:
Gil Press has been involved in researching the growth of Big Data for years, and What’s the Big Data? is his blog focused on that very subject. Press spent more than two decades managing research, marketing, and communications projects and programs at NORC, DEC, and EMC. He now runs his own consulting practice and continues to blog at What’s the Big Data?, sharing his knowledge of the world of Big Data with readers.
Three posts we like from What’s the Big Data?:
Mining the Social Web is “transforming curiosity into insight.” The blog is a companion to a book by the same name, with the goal of taking social web mining mainstream. Find tutorials, analyses, hacks, excerpts from the book, and more.
Three posts we like from Mining the Social Web:
A former data scientist at Bit.ly, Hilary Mason is the Founder of Fast Forward Labs and Data Scientist in Residence at Accel. A self-proclaimed “data scientist and hacker,” Mason blogs about all things data, her experiences speaking and presenting on the subject, and more.
Three posts we like from Hilary Mason:
Steve Miller blogs at Information Management, covering data science, predictive analytics, statistical learning, and the impacts of data science on economics and public policy.
Three posts we like from Steve Miller’s Blog:
Renee documents her path from “SQL Data Analyst pursuing an Engineering Master’s Degree” to “Data Scientist” at Becoming a Data Scientist, providing a valuable resource to those interested in pursuing data science as a career.
Three posts we like from Becoming a Data Scientist:
The U.C. Berkeley School of Information runs the datascience@berkeley blog, featuring interviews, data science startups, event coverage, and other insights on data science and information technology.
Three posts we like from datascience@berkeley:
John Foreman is a data scientist at MailChimp, blogging his thoughts on data science as a profession and the state of the analytics industry as a whole.
Three posts we like from John Foreman, Data Scientist:
FlowingData explores the ways in which data scientists, designers, and statisticians use analysis, visualization, and exploration to understand data and ourselves. Nathan Yau, Ph.D., authors the FlowingData blog, presenting data on topics that help readers understand the world around them, such as trends in transportation, relationships, and more.
Three posts we like from FlowingData:
Three biostatistics professors, Jeff Leek, Roger Peng, and Rafa Irizarry, blog at Simply Statistics. Enthusiastic about a new era in which data are abundant and statisticians are scientists, they post ideas on interesting subject matter, contribute to discussions of science and popular writing, share informative articles, and offer advice to up-and-coming statisticians.
Three posts we like from Simply Statistics:
The Data Science Institute at Columbia University maintains an ongoing blog to reflect on and add to discussions of key topics from the Introduction to Data Science course. Featuring weekly lectures, course topics and themes, thought experiments, and more, the blog is an informative read for students and professionals alike.
Three posts we like from Introduction to Data Science, Columbia University:
Insight Data Science is an intensive, six-week postdoctoral fellowship program bridging the gap between academia and data science. The Insight Data Science blog updates readers on the latest happenings with the program, in addition to offering informative data analyses, industry news, and tips for professionals in the data science field.
Three posts we like from Insight Data Science:
The Data Science Report is the official blog and website of Starbridge Partners, an executive search and career advisory firm focused on the data science and Big Data analytics space. The Data Science Report has it all, from case studies and papers to online courses, lectures, and webinars, as well as ongoing news and discussion of happenings in the world of data science.
Three posts we like from Data Science Report:
Galvanize is a network of lifelong learners and educators, a home for developers, data scientists, entrepreneurs, and anyone with an interest in the space. Through Galvanize, you can connect with your peers, get advice from expert mentors, attend workshops, or enroll in a gSchool program.
Three resources we like from Galvanize:
Data Science Central is one of the premier online communities for those immersed in the data science culture. Read blog posts from other members, participate in forum discussions, and stay abreast of the latest research on Data Science Central.
Three resources we like from Data Science Central:
From courses, education, and meetings to news, features, interviews, publications, and webcasts, KDnuggets is a comprehensive resource for anyone with a vested interest in the data science community, whether a student pursuing professional goals or a working professional whose role is affected by data science.
Three resources we like from KDnuggets:
Quora is a popular question-and-answer site where anyone can ask questions, engage in discussion, and provide expertise on virtually any topic. The Quora Data Science community focuses on “the scientific approach to knowledge extraction from data.”
Three resources we like from Quora – Data Science:
25. Cross Validated
A question-and-answer site for statistical topics, machine learning, data analysis, data mining, and data visualization, Cross Validated is a free resource for data scientists and those interested in the field.
Three resources we like from Cross Validated:
A non-profit professional group offering education, professional certification, conferences and meetups, and even a “Data Science Code of Professional Conduct,” the Data Science Association is a valuable community for data science professionals.
Three resources we like from Data Science Association:
The Open Source Data Science Masters is an open-source curriculum for learning data science. “Foundational in both theory and technologies, the OSDSM breaks down the core competencies necessary to make data useful.”
Three resources we like from The Open Source Data Science Masters:
An open content portal for self-directed learning in data science, Learn Data Science was developed primarily by Nitin Borwankar, a seasoned database professional with more than two decades of experience.
Three resources we like from Learn Data Science:
DataCamp is a resource for learning data analysis and R interactively. With DataCamp, you can learn by doing, at your own pace, choosing from a variety of courses.
Three courses we like from DataCamp:
Data Science Academy is a useful portal for finding free educational resources on data science and related topics. From Berkeley to MIT, Columbia and Stanford, you’ll find free data science resources from major educational institutions.
Three resources we like from Data Science Academy:
School of Data “works to empower civil society organizations, journalists, and citizens with the skills they need to use data effectively.” Joining School of Data gives you access to a variety of courses designed for everyone, from the data science newbie to the professional seeking inspiration.
Three resources we like from School of Data:
32. Neural Information Processing Systems Foundation (NIPS)
December 7-12, 2015
Montreal, Quebec, Canada
The Neural Information Processing Systems (NIPS) Foundation, a non-profit corporation, hosts its annual conference to give attendees the opportunity to exchange research on neural information processing systems in the areas of biology, technology, mathematics, and theory. Featuring tutorials on December 7, conference sessions December 7-10, and workshops December 11-12, NIPS 2015 will be part of the foundation’s continuing series of professional meetings and is a highly regarded data science resource.
Cost to Attend: Contact for attendance cost
33. International Conference on Machine Learning (ICML)
July 6-11, 2015
The 32nd International Conference on Machine Learning (ICML), supported by the International Machine Learning Society (IMLS), includes two days of tutorials, two main conference days, and two days of workshops. Invited speakers include Léon Bottou of Microsoft Research New York, Jon Kleinberg of Cornell University, and Susan Murphy of the University of Michigan. A popular and informative event, ICML is a great data science resource.
Cost to Attend: Contact for attendance cost
The Data Science Forum holds meetings that focus on hot data science topics, machine learning, and business analytics. Free and open to all, the interactive summits bring professionals and academics together and provide a place for networking and sharing ideas and information.
Billed as “the industry’s only forum dedicated to discussing successful integration of your organization’s largest pools of data to maximize research and development efforts and minimize cost,” Data & Analytics in Life Sciences Forum showcases clinical trials data, real world data, communicable diseases data, and genomic/personalized medicine data. This data science resource will help attendees utilize their organizations’ data to enhance their brands.
Cost to Attend: Discounts are available for teams
Joseph Rickert presents this webinar on R and Data Science, which is appropriate for both beginning and experienced R users. The webinar features code examples and explains the reasons for R's popularity and effectiveness, making it a useful data science resource.
Three key points we like from R-bloggers: R and Data Science Webinar:
This webinar features Bora Beran, a program manager at Tableau Software, and explores how to use Tableau with R to speed up data science projects and support better data-driven business decisions. An on-demand webinar, Statistical Computing – R and Tableau: Data Science at the Speed of Thought is 62 minutes long.
Three key topics we like from Statistical Computing – R and Tableau: Data Science at the Speed of Thought:
Cost: Free, with completion of a registration form
This webcast features Annika Jimenez, Kaushik Das, and Hulya Farinas, leaders of the Pivotal team, and their insights on the data science industry trends that will be key in 2015. Their insights make this a data science resource that is not to be missed.
Three key topics we like from Top Data Science Trends for 2015:
Presented by Colin White, President of BI Research, this webinar explores the idea that data scientists are invaluable to organizations because they are analysts who play the roles of data engineer, statistician, and business analyst. In this regard, data scientists are the link between simply doing advanced data analysis and actually using the findings to produce business results aligned with the organization’s goals. This webinar serves as a terrific data science resource.
Three key topics we like from How Data Science is Changing the Way Companies Do Business:
Cost: Contact for registration and viewing fees
From MaintenanceNet, a leader in data-driven service revenue generation, this webinar is a data science resource geared toward service revenue. The webinar explores strategies for establishing sound data intelligence practices while growing your services sales business.
Three key topics we like from The Data Science Behind Service Revenue:
Cost: Contact for registration information
Part of the Data Science Central Webinar Series, Using Data Science Techniques for Automatic Clustering of IT Infrastructure Alerts is hosted by Tim Matteson of DSC. The webinar showcases a framework developed to automatically cluster alerts that report on the health of various critical components, making it an informative and useful data science resource.
Three key points we like from Using Data Science Techniques for Automatic Clustering of IT Infrastructure Alerts:
This video, just over an hour long, features an expert panel discussing the ongoing data science revolution. The future of data science and the ethics of data analytics and enhanced predictive power are just two of the captivating issues raised, making it an intriguing data science resource.
Three key points we like from The Data Science Revolution:
A video from RStudio, which provides “open source and enterprise-ready professional software for the R community,” Introduction to Data Science with R Video Workshop features RStudio Master Instructor Garrett Grolemund. A short video introduction for the larger Introduction to Data Science with R Video Course, this data science resource gives a thorough overview of the course.
Three key topics we like from Introduction to Data Science with R Video Workshop:
This data science resource is a detailed talk delivered by Donald Miner to the NYC Data Science Meetup about using Hadoop for data science. The best feature of this video is that Mortar Data has included a time-stamped summary so viewers can skip to specific sections.
Three key topics we like from Hadoop for Data Science:
Most individuals in the data science field are familiar with Booz Allen Hamilton, the firm that provides management and technology consulting to the government, major corporations and institutions, and non-profit organizations. Its video, nearly four and a half minutes long, focuses on the opportunities clients gain by handling their data correctly and serves as a case study for data science professionals.
Three key points we like from Turning Big Data Into Big Analytics: Data Science:
This video showcases an excerpt of a recorded interview with Jason Huling, data scientist at MaintenanceNet. With a central focus on using data science in sales, the video makes the case for having the necessary data, tools, and resources to use data science productively and efficiently.
Three key points we like from The Importance of Data Science in Sales:
The Berkeley Institute for Data Science is a comprehensive data science resource because it provides research, various resources, and more than 10 videos relating to data science. Anyone looking for more information about data science is sure to find the institute a great help, but we think its videos are among the best data science resources it offers.
Three data science videos we like from the Berkeley Institute for Data Science:
Made available by Wikibooks, Data Science: An Introduction is a wikibook that offers a basic introduction to data science. Geared toward advanced high school students or college freshmen with a high-school-level understanding of math, science, word processing, and spreadsheets, Data Science: An Introduction does not require a computer science background, making it an extremely accessible data science resource.
Three key points we like from Data Science: An Introduction:
Kaggle is a platform for predictive modeling competitions and consulting. Consult Kaggle’s Wiki for answers to all your frequently asked questions about data science and Kaggle’s competitions, look for professional opportunities on the job board, and participate in discussions with other users in the forum.
Three resources we like from Kaggle – Competitions:
A free weekly newsletter that features curated news, articles, and data science job openings, Data Science Weekly is a must-receive news source for data scientists and related professionals delivered to your inbox every Thursday.
Three resources we like from Data Science Weekly:
NGDATA helps data-rich companies in financial services, media/publishing and telecom to drive connected experiences. The company’s next generation customer data platform, Lily Enterprise™, puts people at the center of every business via Lily’s Customer DNA, which continuously learns from behavior to deliver compelling experiences.