Hello, we will discuss where computation sits in relation to the traditional scientific modes of experiment and theory, and some reasons why the computational and data sciences require a new and different set of skills. In the centuries following the scientific revolution, scientific inquiry operated in two complementary modes: experiment and theory. Working in experiment mode refers to the systematic recording of empirical observations in a controlled environment.
Working in theory mode refers to building general frameworks to explain and predict natural phenomena by constructing mathematical models and deriving general laws. The connection between theory and experiment is a two-way street: theory connects and orders experimental observations, while experiments serve as the crucial test for existing theories and point to where theoretical work should be directed next.
The theory–experiment paradigm continued in this way until the invention of digital electronic computers in the twentieth century. Just as they do today, early digital computers allowed scientists to perform numerical computations much faster than could be done by hand. Calculations that would take a full day to grind through by hand could be completed in less than a minute, opening up a new frontier of computation-based research.
This did not happen overnight; for much of the twentieth century, theoreticians leveraged computation as a powerful new tool in their research toolbox. As computing hardware became cheaper and more powerful and software became more user-friendly, computation's role in science evolved as well. Significant challenges emerged as scientists started developing algorithms that computers could execute to encode known physical laws. Numerical simulations based on physical laws took on a role similar to table-top experiments.
The skills needed to build, optimize, and maintain scientific software, or to program and administer a supercomputer, diverged from the theory mode. Computational research adopted traits from both the experiment and theory modes of science. In this context, computational science emerged as a third mode of science and as a complement to both theory and experiment. So what defines the skillset of a computational scientist and distinguishes it from the theory and experiment modes? Venn diagrams like this one are a popular way to illustrate how computational science operates in the overlap of multiple disciplines.
Computational science requires knowledge of applied mathematics and numerical methods, which provides the necessary tools to numerically solve different classes of models and simulations. It also requires knowledge of computer science when translating numerical methods into computational algorithms. Finally, it requires knowledge of a specific science discipline when building a simulation and evaluating its outputs. If at this point you're wondering, "how does data science fit into this paradigm?", you would be correct that it's not clear. A large part of this is due to computational science predating data science.
Data science emerged as data sets became larger, data storage and retrieval technology improved, and interest grew in developing algorithms that can “learn” patterns in data and make accurate predictions. The skills needed to do this kind of work resemble those in computational science but differ in important ways as you can see in the Venn diagram. The primary mathematical influence in data science is statistics and visualization, which provides the necessary tools to build data-driven learning models, explore data, and communicate results.
Data science also draws influence from a subset of computer science research, including machine learning, classification, and databases. Finally, data science also requires knowledge of a specific science discipline along with access to experimental data, which allows the data scientist to understand the context and implications of their analysis. So now we return to our diagram of the scientific modes, where we add in data science and redraw the connections. We now imagine the modes participating in the scientific cycle. The cycle can begin with an experiment that produces new data that is handed off to data scientists.
The data scientists find patterns in the data, which enhances the information available to theoreticians as they build new and more accurate mathematical models. Computational scientists can then take these models and convert them into numerical simulations, generating quantitative predictions for experimental researchers to test. The cycle then begins anew, resulting in a feedback loop. Looming in the background of this discussion is the challenge of big data. The term has become a buzzword in recent years and is sometimes treated as a synonym for data science.
The following quote does a good job of defining what big data is and the challenges related to it: "Big data [refers to] data sets that are so big and complex that traditional data-processing application software [is] inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, and updating. There are a number of concepts associated with big data: originally there were three concepts: volume, variety, velocity. Other concepts later attributed with big data are veracity (i.e., how much noise is in the data) and value."
By now, we've seen how the computational and data sciences sit in relation to other modes of science and identified the skills that they demand. To close, let's review how you can benefit from learning basic skills in the computational and data sciences, even if your major and career are not part of this field. If you plan to pursue a career operating in the theory or experiment modes of science, consider the following points. A 2009 survey of scientific researchers found that, on average, they devote as much as 30% of their time developing, and 40% of their time using, scientific software, even though many undergraduate natural science programs do not integrate computational skills into the curriculum. This lack of computational skills increases the risk of computational errors, which hurt computational reproducibility and can invalidate a study's conclusions. In addition, as scientists are asked to collect, process, and analyze larger data sets, it becomes ever more important to adopt tools from the computational and data sciences to set up an automated workflow with real-time error-checking. Then, when you're ready to share your hard work with the world, including non-experts, it helps to know the basic principles for creating data visualizations and to have lots of practice using data to tell a clear and compelling story.
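The idea of building error-checking into a workflow can be sketched in a few lines: validate every record automatically instead of eyeballing a spreadsheet. The field names, bounds, and sentinel value below are hypothetical, chosen only for illustration.

```python
# A minimal sketch of automated error-checking in a data workflow:
# every record is validated before it reaches the analysis stage.
records = [
    {"sample_id": "A1", "temperature_c": 21.4},
    {"sample_id": "A2", "temperature_c": 22.0},
    {"sample_id": "A3", "temperature_c": -999.0},  # sentinel/error value
]

def validate(record):
    """Return a list of problems found in one record (empty if clean)."""
    problems = []
    if not record.get("sample_id"):
        problems.append("missing sample_id")
    t = record.get("temperature_c")
    if t is None or not (-50.0 <= t <= 60.0):
        problems.append(f"temperature out of range: {t}")
    return problems

# Partition the data set into clean rows and rows needing attention.
clean = [r for r in records if not validate(r)]
flagged = [r for r in records if validate(r)]
```

Because the checks run on every record every time, an out-of-range value is caught the moment it enters the pipeline rather than after it has silently skewed a result.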
More broadly, if you are planning to pursue a career in any field that makes use of digital information, consider these points. Data cleaning and organizing is a prerequisite to data analysis, so there are tangible benefits to keeping your information tidy. In addition, many computational tasks, regardless of discipline, map onto workflows where you apply a series of data transformations in a certain order. Trying to do this in an Excel spreadsheet is tedious and error-prone, but you can streamline the process by adopting the tools and methods we'll learn about during the semester. Data science methods are being applied to many fields, including medicine, the humanities, political science, and law, and the list goes on.
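The "series of data transformations in a certain order" idea can be sketched as a short script: each step consumes the output of the previous one, so the whole workflow can be rerun on new data with no manual spreadsheet work. The raw values and the inches-to-centimeters conversion are hypothetical, invented for illustration.

```python
# A minimal sketch of an ordered data-transformation workflow.
raw = ["  12.5 ", "7.0", "", "19.25", "n/a", " 3.5"]

# Step 1: clean -- strip stray whitespace from each entry.
cleaned = [s.strip() for s in raw]

# Step 2: filter -- keep only entries that parse as numbers.
numbers = [float(s) for s in cleaned if s.replace(".", "", 1).isdigit()]

# Step 3: transform -- convert each value (say, inches to centimeters).
converted = [round(x * 2.54, 2) for x in numbers]

# Step 4: summarize.
total = sum(converted)
```

Because the steps are explicit and ordered, the script documents the analysis as it performs it, which is exactly what a spreadsheet full of hand-edited cells cannot do.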
Learning and applying these ideas gives you a head start and, with some practice, can "supercharge" your own data-related work, helping you stand out among your peers. Now that you've been briefed on what the computational and data sciences are all about, what will we focus on in CDS 101, given that the field is so broad? We will focus primarily on the tools, methods, and practices within the data science category.
If your interest was piqued by the computational science side of things, then I encourage you to look at the other courses offered by the CDS department, in particular CDS 130. The main topics we will cover this semester are: learning a toolset that facilitates reproducible research; data visualization; data transformations, cleaning, and reshaping; using statistical tools to interpret data distributions; inference and simulation; and modeling. If time permits, we will also learn about a special topic: the basics of web scraping.