Fundamentals of data science : theory and practice / Jugal K. Kalita, Dhruba K. Bhattacharyya, Swarup Roy.
Main Authors: | Jugal K. Kalita, Dhruba K. Bhattacharyya, Swarup Roy |
---|---|
Format: | Book |
Language: | English |
Published: | London, United Kingdom ; San Diego, CA : Academic Press, an imprint of Elsevier, [2024] |
Table of Contents:
- Front Cover
- Fundamentals of Data Science
- Copyright
- Contents
- Preface
- Acknowledgment
- Foreword
- Foreword
- 1 Introduction
- 1.1 Data, information, and knowledge
- 1.2 Data Science: the art of data exploration
- 1.2.1 Brief history
- 1.2.2 General pipeline
- 1.2.2.1 Data collection and integration
- 1.2.2.2 Data preparation
- 1.2.2.3 Learning-model construction
- 1.2.2.4 Knowledge interpretation and presentation
- 1.2.3 Multidisciplinary science
- 1.3 What is not Data Science?
- 1.4 Data Science tasks
- 1.4.1 Predictive Data Science
- 1.4.2 Descriptive Data Science
- 1.4.3 Diagnostic Data Science
- 1.4.4 Prescriptive Data Science
- 1.5 Data Science objectives
- 1.5.1 Hidden knowledge discovery
- 1.5.2 Prediction of likely outcomes
- 1.5.3 Grouping
- 1.5.4 Actionable information
- 1.6 Applications of Data Science
- 1.7 How to read the book?
- References
- 2 Data, sources, and generation
- 2.1 Introduction
- 2.2 Data attributes
- 2.2.1 Qualitative
- 2.2.1.1 Nominal
- 2.2.1.2 Binary
- 2.2.1.3 Ordinal
- 2.2.2 Quantitative
- 2.2.2.1 Discrete
- 2.2.2.2 Continuous
- 2.2.2.3 Interval
- 2.2.2.4 Ratio
- 2.3 Data-storage formats
- 2.3.1 Structured data
- 2.3.2 Unstructured data
- 2.3.3 Semistructured data
- 2.4 Data sources
- 2.4.1 Primary sources
- 2.4.2 Secondary sources
- 2.4.3 Popular data sources
- 2.4.4 Homogeneous vs. heterogeneous data sources
- 2.5 Data generation
- 2.5.1 Types of synthetic data
- 2.5.2 Data-generation steps
- 2.5.3 Generation methods
- 2.5.4 Tools for data generation
- 2.5.4.1 Software tools
- 2.5.4.2 Python libraries
- 2.6 Summary
- References
- 3 Data preparation
- 3.1 Introduction
- 3.2 Data cleaning
- 3.2.1 Handling missing values
- 3.2.1.1 Ignoring and discarding data
- 3.2.1.2 Parameter estimation
- 3.2.1.3 Imputation
- 3.2.2 Duplicate-data detection
- 3.2.2.1 Knowledge-based methods
- 3.2.2.2 ETL method
- 3.3 Data reduction
- 3.3.1 Parametric data reduction
- 3.3.2 Sampling
- 3.3.3 Dimensionality reduction
- 3.4 Data transformation
- 3.4.1 Discretization
- 3.4.1.1 Supervised discretization
- 3.4.1.2 Unsupervised discretization
- 3.5 Data normalization
- 3.5.1 Min-max normalization
- 3.5.2 Z-score normalization
- 3.5.3 Decimal-scaling normalization
- 3.5.4 Quantile normalization
- 3.5.5 Logarithmic normalization
- 3.6 Data integration
- 3.6.1 Consolidation
- 3.6.2 Federation
- 3.7 Summary
- References
- 4 Machine learning
- 4.1 Introduction
- 4.2 Machine Learning paradigms
- 4.2.1 Supervised learning
- 4.2.2 Unsupervised learning
- 4.2.3 Semisupervised learning
- 4.3 Inductive bias
- 4.4 Evaluating a classifier
- 4.4.1 Evaluation steps
- 4.4.1.1 Validation
- 4.4.1.2 Testing
- 4.4.1.3 K-fold crossvalidation
- 4.4.2 Handling unbalanced classes
- 4.4.3 Model generalization
- 4.4.3.1 Underfitting
- 4.4.3.2 Overfitting
- 4.4.3.3 Accurate fittings