Fundamentals of data science : theory and practice / Jugal K. Kalita, Dhruba K. Bhattacharyya, Swarup Roy.

Bibliographic Details
Main Authors: Kalita, Jugal Kumar (Author), Bhattacharyya, Dhruba K. (Author), Roy, Swarup (Author)
Format: Book
Language: English
Published: London, United Kingdom ; San Diego, CA : Academic Press, an imprint of Elsevier, [2024]
Table of Contents:
  • Front Cover
  • Fundamentals of Data Science
  • Copyright
  • Contents
  • Preface
  • Acknowledgment
  • Foreword
  • Foreword
  • 1 Introduction
  • 1.1 Data, information, and knowledge
  • 1.2 Data Science: the art of data exploration
  • 1.2.1 Brief history
  • 1.2.2 General pipeline
  • 1.2.2.1 Data collection and integration
  • 1.2.2.2 Data preparation
  • 1.2.2.3 Learning-model construction
  • 1.2.2.4 Knowledge interpretation and presentation
  • 1.2.3 Multidisciplinary science
  • 1.3 What is not Data Science?
  • 1.4 Data Science tasks
  • 1.4.1 Predictive Data Science
  • 1.4.2 Descriptive Data Science
  • 1.4.3 Diagnostic Data Science
  • 1.4.4 Prescriptive Data Science
  • 1.5 Data Science objectives
  • 1.5.1 Hidden knowledge discovery
  • 1.5.2 Prediction of likely outcomes
  • 1.5.3 Grouping
  • 1.5.4 Actionable information
  • 1.6 Applications of Data Science
  • 1.7 How to read the book?
  • References
  • 2 Data, sources, and generation
  • 2.1 Introduction
  • 2.2 Data attributes
  • 2.2.1 Qualitative
  • 2.2.1.1 Nominal
  • 2.2.1.2 Binary
  • 2.2.1.3 Ordinal
  • 2.2.2 Quantitative
  • 2.2.2.1 Discrete
  • 2.2.2.2 Continuous
  • 2.2.2.3 Interval
  • 2.2.2.4 Ratio
  • 2.3 Data-storage formats
  • 2.3.1 Structured data
  • 2.3.2 Unstructured data
  • 2.3.3 Semistructured data
  • 2.4 Data sources
  • 2.4.1 Primary sources
  • 2.4.2 Secondary sources
  • 2.4.3 Popular data sources
  • 2.4.4 Homogeneous vs. heterogeneous data sources
  • 2.5 Data generation
  • 2.5.1 Types of synthetic data
  • 2.5.2 Data-generation steps
  • 2.5.3 Generation methods
  • 2.5.4 Tools for data generation
  • 2.5.4.1 Software tools
  • 2.5.4.2 Python libraries
  • 2.6 Summary
  • References
  • 3 Data preparation
  • 3.1 Introduction
  • 3.2 Data cleaning
  • 3.2.1 Handling missing values
  • 3.2.1.1 Ignoring and discarding data
  • 3.2.1.2 Parameter estimation
  • 3.2.1.3 Imputation
  • 3.2.2 Duplicate-data detection
  • 3.2.2.1 Knowledge-based methods
  • 3.2.2.2 ETL method
  • 3.3 Data reduction
  • 3.3.1 Parametric data reduction
  • 3.3.2 Sampling
  • 3.3.3 Dimensionality reduction
  • 3.4 Data transformation
  • 3.4.1 Discretization
  • 3.4.1.1 Supervised discretization
  • 3.4.1.2 Unsupervised discretization
  • 3.5 Data normalization
  • 3.5.1 Min-max normalization
  • 3.5.2 Z-score normalization
  • 3.5.3 Decimal-scaling normalization
  • 3.5.4 Quantile normalization
  • 3.5.5 Logarithmic normalization
  • 3.6 Data integration
  • 3.6.1 Consolidation
  • 3.6.2 Federation
  • 3.7 Summary
  • References
  • 4 Machine learning
  • 4.1 Introduction
  • 4.2 Machine Learning paradigms
  • 4.2.1 Supervised learning
  • 4.2.2 Unsupervised learning
  • 4.2.3 Semisupervised learning
  • 4.3 Inductive bias
  • 4.4 Evaluating a classifier
  • 4.4.1 Evaluation steps
  • 4.4.1.1 Validation
  • 4.4.1.2 Testing
  • 4.4.1.3 K-fold crossvalidation
  • 4.4.2 Handling unbalanced classes
  • 4.4.3 Model generalization
  • 4.4.3.1 Underfitting
  • 4.4.3.2 Overfitting
  • 4.4.3.3 Accurate fittings