Big Data Analytics with Applications in Insider Threat Detection

Book Description

Today's malware mutates randomly to avoid detection, but reactively adaptive malware is more intelligent, learning and adapting to new computer defenses on the fly. It turns the same algorithms that antivirus software uses to detect viruses against those defenses in order to go undetected. This book details the tools, the types of malware they detect, their implementation in a cloud computing framework, and their applications to insider threat detection.

Table of Contents

  1. Cover
  2. Half Title
  3. Title Page
  4. Copyright Page
  5. Dedication
  6. Contents
  7. Preface
  8. Acknowledgments
  9. Permissions
  10. Authors
  11. Chapter 1: Introduction
    1. 1.1 Overview
    2. 1.2 Supporting Technologies
    3. 1.3 Stream Data Analytics
    4. 1.4 Applications of Stream Data Analytics for Insider Threat Detection
    5. 1.5 Experimental BDMA and BDSP Systems
    6. 1.6 Next Steps in BDMA and BDSP
    7. 1.7 Organization of This Book
    8. 1.8 Next Steps
  12. Part I: Supporting Technologies for BDMA and BDSP
    1. Introduction to Part I
    2. Chapter 2: Data Security and Privacy
      1. 2.1 Overview
      2. 2.2 Security Policies
        1. 2.2.1 Access Control Policies
          1. 2.2.1.1 Authorization-Based Access Control Policies
          2. 2.2.1.2 Role-Based Access Control
          3. 2.2.1.3 Usage Control
          4. 2.2.1.4 Attribute-Based Access Control
        2. 2.2.2 Administration Policies
        3. 2.2.3 Identification and Authentication
        4. 2.2.4 Auditing: A Database System
        5. 2.2.5 Views for Security
      3. 2.3 Policy Enforcement and Related Issues
        1. 2.3.1 SQL Extensions for Security
        2. 2.3.2 Query Modification
        3. 2.3.3 Discretionary Security and Database Functions
      4. 2.4 Data Privacy
      5. 2.5 Summary and Directions
      6. References
    3. Chapter 3: Data Mining Techniques
      1. 3.1 Introduction
      2. 3.2 Overview of Data Mining Tasks and Techniques
      3. 3.3 Artificial Neural Networks
      4. 3.4 Support Vector Machines
      5. 3.5 Markov Model
      6. 3.6 Association Rule Mining (ARM)
      7. 3.7 Multiclass Problem
      8. 3.8 Image Mining
        1. 3.8.1 Overview
        2. 3.8.2 Feature Selection
        3. 3.8.3 Automatic Image Annotation
        4. 3.8.4 Image Classification
      9. 3.9 Summary
      10. References
    4. Chapter 4: Data Mining for Security Applications
      1. 4.1 Overview
      2. 4.2 Data Mining for Cyber Security
        1. 4.2.1 Cyber Security Threats
          1. 4.2.1.1 Cyber Terrorism, Insider Threats, and External Attacks
          2. 4.2.1.2 Malicious Intrusions
          3. 4.2.1.3 Credit Card Fraud and Identity Theft
          4. 4.2.1.4 Attacks on Critical Infrastructures
        2. 4.2.2 Data Mining for Cyber Security
      3. 4.3 Data Mining Tools
      4. 4.4 Summary and Directions
      5. References
    5. Chapter 5: Cloud Computing and Semantic Web Technologies
      1. 5.1 Introduction
      2. 5.2 Cloud Computing
        1. 5.2.1 Overview
        2. 5.2.2 Preliminaries
          1. 5.2.2.1 Cloud Deployment Models
          2. 5.2.2.2 Service Models
        3. 5.2.3 Virtualization
        4. 5.2.4 Cloud Storage and Data Management
        5. 5.2.5 Cloud Computing Tools
          1. 5.2.5.1 Apache Hadoop
          2. 5.2.5.2 MapReduce
          3. 5.2.5.3 CouchDB
          4. 5.2.5.4 HBase
          5. 5.2.5.5 MongoDB
          6. 5.2.5.6 Hive
          7. 5.2.5.7 Apache Cassandra
      3. 5.3 Semantic Web
        1. 5.3.1 XML
        2. 5.3.2 RDF
        3. 5.3.3 SPARQL
        4. 5.3.4 OWL
        5. 5.3.5 Description Logics
        6. 5.3.6 Inferencing
        7. 5.3.7 SWRL
      4. 5.4 Semantic Web and Security
        1. 5.4.1 XML Security
        2. 5.4.2 RDF Security
        3. 5.4.3 Security and Ontologies
        4. 5.4.4 Secure Query and Rules Processing
      5. 5.5 Cloud Computing Frameworks Based on Semantic Web Technologies
        1. 5.5.1 RDF Integration
        2. 5.5.2 Provenance Integration
      6. 5.6 Summary and Directions
      7. References
    6. Chapter 6: Data Mining and Insider Threat Detection
      1. 6.1 Introduction
      2. 6.2 Insider Threat Detection
      3. 6.3 The Challenges, Related Work, and Our Approach
      4. 6.4 Data Mining for Insider Threat Detection
        1. 6.4.1 Our Solution Architecture
        2. 6.4.2 Feature Extraction and Compact Representation
          1. 6.4.2.1 Vector Representation of the Content
          2. 6.4.2.2 Subspace Clustering
        3. 6.4.3 RDF Repository Architecture
        4. 6.4.4 Data Storage
          1. 6.4.4.1 File Organization
        5. 6.4.5 Answering Queries Using Hadoop MapReduce
        6. 6.4.6 Data Mining Applications
      5. 6.5 Comprehensive Framework
      6. 6.6 Summary and Directions
      7. References
    7. Chapter 7: Big Data Management and Analytics Technologies
      1. 7.1 Introduction
      2. 7.2 Infrastructure Tools to Host BDMA Systems
      3. 7.3 BDMA Systems and Tools
        1. 7.3.1 Apache Hive
        2. 7.3.2 Google BigQuery
        3. 7.3.3 NoSQL Database
        4. 7.3.4 Google BigTable
        5. 7.3.5 Apache HBase
        6. 7.3.6 MongoDB
        7. 7.3.7 Apache Cassandra
        8. 7.3.8 Apache CouchDB
        9. 7.3.9 Oracle NoSQL Database
        10. 7.3.10 Weka
        11. 7.3.11 Apache Mahout
      4. 7.4 Cloud Platforms
        1. 7.4.1 Amazon Web Services’ DynamoDB
        2. 7.4.2 Microsoft Azure’s Cosmos DB
        3. 7.4.3 IBM’s Cloud-Based Big Data Solutions
        4. 7.4.4 Google’s Cloud-Based Big Data Solutions
      5. 7.5 Summary and Directions
      6. References
    8. Conclusion to Part I
  13. Part II: Stream Data Analytics
    1. Introduction to Part II
    2. Chapter 8: Challenges for Stream Data Classification
      1. 8.1 Introduction
      2. 8.2 Challenges
      3. 8.3 Infinite Length and Concept Drift
      4. 8.4 Concept Evolution
      5. 8.5 Limited Labeled Data
      6. 8.6 Experiments
      7. 8.7 Our Contributions
      8. 8.8 Summary and Directions
      9. References
    3. Chapter 9: Survey of Stream Data Classification
      1. 9.1 Introduction
      2. 9.2 Approach to Data Stream Classification
      3. 9.3 Single-Model Classification
      4. 9.4 Ensemble Classification and Baseline Approach
      5. 9.5 Novel Class Detection
        1. 9.5.1 Novelty Detection
        2. 9.5.2 Outlier Detection
        3. 9.5.3 Baseline Approach
      6. 9.6 Data Stream Classification with Limited Labeled Data
        1. 9.6.1 Semisupervised Clustering
        2. 9.6.2 Baseline Approach
      7. 9.7 Summary and Directions
      8. References
    4. Chapter 10: A Multi-Partition, Multi-Chunk Ensemble for Classifying Concept-Drifting Data Streams
      1. 10.1 Introduction
      2. 10.2 Ensemble Development
        1. 10.2.1 Multiple Partitions of Multiple Chunks
          1. 10.2.1.1 An Ensemble Built on MPC
          2. 10.2.1.2 MPC Ensemble Updating Algorithm
        2. 10.2.2 Error Reduction Using MPC Training
          1. 10.2.2.1 Time Complexity of MPC
      3. 10.3 Experiments
        1. 10.3.1 Datasets and Experimental Setup
          1. 10.3.1.1 Real (Botnet) Dataset
          2. 10.3.1.2 Baseline Methods
        2. 10.3.2 Performance Study
      4. 10.4 Summary and Directions
      5. References
    5. Chapter 11: Classification and Novel Class Detection in Concept-Drifting Data Streams
      1. 11.1 Introduction
      2. 11.2 ECSMiner
        1. 11.2.1 Overview
        2. 11.2.2 High Level Algorithm
        3. 11.2.3 Nearest Neighborhood Rule
        4. 11.2.4 Novel Class and Its Properties
        5. 11.2.5 Base Learners
        6. 11.2.6 Creating Decision Boundary during Training
      3. 11.3 Classification with Novel Class Detection
        1. 11.3.1 High-Level Algorithm
        2. 11.3.2 Classification
        3. 11.3.3 Novel Class Detection
        4. 11.3.4 Analysis and Discussion
          1. 11.3.4.1 Justification of the Novel Class Detection Algorithm
          2. 11.3.4.2 Deviation between Approximate and Exact q-NSC Computation
          3. 11.3.4.3 Time and Space Complexity
      4. 11.4 Experiments
        1. 11.4.1 Datasets
          1. 11.4.1.1 Synthetic Data with only Concept Drift (SynC)
          2. 11.4.1.2 Synthetic Data with Concept Drift and Novel Class (SynCN)
          3. 11.4.1.3 Real Data—KDDCup 99 Network Intrusion Detection (KDD)
          4. 11.4.1.4 Real Data—Forest Covers Dataset from UCI Repository (Forest)
        2. 11.4.2 Experimental Set-Up
        3. 11.4.3 Baseline Approach
        4. 11.4.4 Performance Study
          1. 11.4.4.1 Evaluation Approach
          2. 11.4.4.2 Results
      5. 11.5 Summary and Directions
      6. References
    6. Chapter 12: Data Stream Classification with Limited Labeled Training Data
      1. 12.1 Introduction
      2. 12.2 Description of ReaSC
      3. 12.3 Training with Limited Labeled Data
        1. 12.3.1 Problem Description
        2. 12.3.2 Unsupervised K-Means Clustering
        3. 12.3.3 K-Means Clustering with Cluster-Impurity Minimization
        4. 12.3.4 Optimizing the Objective Function with Expectation Maximization (E-M)
        5. 12.3.5 Storing the Classification Model
      4. 12.4 Ensemble Classification
        1. 12.4.1 Classification Overview
        2. 12.4.2 Ensemble Refinement
        3. 12.4.3 Ensemble Update
        4. 12.4.4 Time Complexity
      5. 12.5 Experiments
        1. 12.5.1 Dataset
        2. 12.5.2 Experimental Setup
        3. 12.5.3 Comparison with Baseline Methods
        4. 12.5.4 Running Times, Scalability, and Memory Requirement
        5. 12.5.5 Sensitivity to Parameters
      6. 12.6 Summary and Directions
      7. References
    7. Chapter 13: Directions in Data Stream Classification
      1. 13.1 Introduction
      2. 13.2 Discussion of the Approaches
        1. 13.2.1 MPC Ensemble Approach
        2. 13.2.2 Classification and Novel Class Detection in Data Streams (ECSMiner)
        3. 13.2.3 Classification with Scarcely Labeled Data (ReaSC)
      3. 13.3 Extensions
      4. 13.4 Summary and Directions
      5. References
    8. Conclusion to Part II
  14. Part III: Stream Data Analytics for Insider Threat Detection
    1. Introduction to Part III
    2. Chapter 14: Insider Threat Detection as a Stream Mining Problem
      1. 14.1 Introduction
      2. 14.2 Sequence Stream Data
      3. 14.3 Big Data Issues
      4. 14.4 Contributions
      5. 14.5 Summary and Directions
      6. References
    3. Chapter 15: Survey of Insider Threat and Stream Mining
      1. 15.1 Introduction
      2. 15.2 Insider Threat Detection
      3. 15.3 Stream Mining
      4. 15.4 Big Data Techniques for Scalability
      5. 15.5 Summary and Directions
      6. References
    4. Chapter 16: Ensemble-Based Insider Threat Detection
      1. 16.1 Introduction
      2. 16.2 Ensemble Learning
      3. 16.3 Ensemble for Unsupervised Learning
      4. 16.4 Ensemble for Supervised Learning
      5. 16.5 Summary and Directions
      6. References
    5. Chapter 17: Details of Learning Classes
      1. 17.1 Introduction
      2. 17.2 Supervised Learning
      3. 17.3 Unsupervised Learning
        1. 17.3.1 GBAD-MDL
        2. 17.3.2 GBAD-P
        3. 17.3.3 GBAD-MPS
      4. 17.4 Summary and Directions
      5. References
    6. Chapter 18: Experiments and Results for Nonsequence Data
      1. 18.1 Introduction
      2. 18.2 Dataset
      3. 18.3 Experimental Setup
        1. 18.3.1 Supervised Learning
        2. 18.3.2 Unsupervised Learning
      4. 18.4 Results
        1. 18.4.1 Supervised Learning
        2. 18.4.2 Unsupervised Learning
      5. 18.5 Summary and Directions
      6. References
    7. Chapter 19: Insider Threat Detection for Sequence Data
      1. 19.1 Introduction
      2. 19.2 Classifying Sequence Data
      3. 19.3 Unsupervised Stream-Based Sequence Learning (USSL)
        1. 19.3.1 Construct the LZW Dictionary by Selecting the Patterns in the Data Stream
        2. 19.3.2 Constructing the Quantized Dictionary
      4. 19.4 Anomaly Detection
      5. 19.5 Complexity Analysis
      6. 19.6 Summary and Directions
      7. References
    8. Chapter 20: Experiments and Results for Sequence Data
      1. 20.1 Introduction
      2. 20.2 Dataset
      3. 20.3 Concept Drift in the Training Set
      4. 20.4 Results
        1. 20.4.1 Choice of Ensemble Size
      5. 20.5 Summary and Directions
      6. References
    9. Chapter 21: Scalability Using Big Data Technologies
      1. 21.1 Introduction
      2. 21.2 Hadoop MapReduce Platform
      3. 21.3 Scalable LZW and QD Construction Using MapReduce Job
        1. 21.3.1 2MRJ Approach
        2. 21.3.2 1MRJ Approach
      4. 21.4 Experimental Setup and Results
        1. 21.4.1 Hadoop Cluster
        2. 21.4.2 Big Dataset for Insider Threat Detection
        3. 21.4.3 Results for Big Data Set Related to Insider Threat Detection
          1. 21.4.3.1 On OD Dataset
          2. 21.4.3.2 On DBD Dataset
      5. 21.5 Summary and Directions
      6. References
    10. Chapter 22: Stream Mining and Big Data for Insider Threat Detection
      1. 22.1 Introduction
      2. 22.2 Discussion
      3. 22.3 Future Work
        1. 22.3.1 Incorporate User Feedback
        2. 22.3.2 Collusion Attack
        3. 22.3.3 Additional Experiments
        4. 22.3.4 Anomaly Detection in Social Network and Author Attribution
        5. 22.3.5 Stream Mining as a Big Data Mining Problem
      4. 22.4 Summary and Directions
      5. References
    11. Conclusion to Part III
  15. Part IV: Experimental BDMA and BDSP Systems
    1. Introduction to Part IV
    2. Chapter 23: Cloud Query Processing System for Big Data Management
      1. 23.1 Introduction
      2. 23.2 Our Approach
      3. 23.3 Related Work
      4. 23.4 Architecture
      5. 23.5 MapReduce Framework
        1. 23.5.1 Overview
        2. 23.5.2 Input Files Selection
        3. 23.5.3 Cost Estimation for Query Processing
        4. 23.5.4 Query Plan Generation
        5. 23.5.5 Breaking Ties by Summary Statistics
        6. 23.5.6 MapReduce Join Execution
      6. 23.6 Results
        1. 23.6.1 Experimental Setup
        2. 23.6.2 Evaluation
      7. 23.7 Security Extensions
        1. 23.7.1 Access Control Model
        2. 23.7.2 Access Token Assignment
        3. 23.7.3 Conflicts
      8. 23.8 Summary and Directions
      9. References
    3. Chapter 24: Big Data Analytics for Multipurpose Social Media Applications
      1. 24.1 Introduction
      2. 24.2 Our Premise
      3. 24.3 Modules of InXite
        1. 24.3.1 Overview
        2. 24.3.2 Information Engine
          1. 24.3.2.1 Entity Extraction
          2. 24.3.2.2 Information Integration
        3. 24.3.3 Person of Interest Analysis
          1. 24.3.3.1 InXite Person of Interest Profile Generation and Analysis
          2. 24.3.3.2 InXite POI Threat Analysis
          3. 24.3.3.3 InXite Psychosocial Analysis
          4. 24.3.3.4 Other Features
        4. 24.3.4 InXite Threat Detection and Prediction
        5. 24.3.5 Application of SNOD
          1. 24.3.5.1 SNOD++
          2. 24.3.5.2 Benefits of SNOD++
        6. 24.3.6 Expert Systems Support
        7. 24.3.7 Cloud-Design of InXite to Handle Big Data
        8. 24.3.8 Implementation
      4. 24.4 Other Applications
      5. 24.5 Related Work
      6. 24.6 Summary and Directions
      7. References
    4. Chapter 25: Big Data Management and Cloud for Assured Information Sharing
      1. 25.1 Introduction
      2. 25.2 Design Philosophy
      3. 25.3 System Design
        1. 25.3.1 Design of CAISS
        2. 25.3.2 Design of CAISS++
          1. 25.3.2.1 Limitations of CAISS
        3. 25.3.3 Formal Policy Analysis
        4. 25.3.4 Implementation Approach
      4. 25.4 Related Work
        1. 25.4.1 Our Related Research
        2. 25.4.2 Overall Related Research
        3. 25.4.3 Commercial Developments
      5. 25.5 Extensions for Big Data-Based Social Media Applications
      6. 25.6 Summary and Directions
      7. References
    5. Chapter 26: Big Data Management for Secure Information Integration
      1. 26.1 Introduction
      2. 26.2 Integrating Blackbook with Amazon S3
      3. 26.3 Experiments
      4. 26.4 Summary and Directions
      5. References
    6. Chapter 27: Big Data Analytics for Malware Detection
      1. 27.1 Introduction
      2. 27.2 Malware Detection
        1. 27.2.1 Malware Detection as a Data Stream Classification Problem
        2. 27.2.2 Cloud Computing for Malware Detection
        3. 27.2.3 Our Contributions
      3. 27.3 Related Work
      4. 27.4 Design and Implementation of the System
        1. 27.4.1 Ensemble Construction and Updating
        2. 27.4.2 Error Reduction Analysis
        3. 27.4.3 Empirical Error Reduction and Time Complexity
        4. 27.4.4 Hadoop/MapReduce Framework
      5. 27.5 Malicious Code Detection
        1. 27.5.1 Overview
        2. 27.5.2 Nondistributed Feature Extraction and Selection
        3. 27.5.3 Distributed Feature Extraction and Selection
      6. 27.6 Experiments
        1. 27.6.1 Datasets
        2. 27.6.2 Baseline Methods
      7. 27.7 Discussion
      8. 27.8 Summary and Directions
      9. References
    7. Chapter 28: A Semantic Web-Based Inference Controller for Provenance Big Data
      1. 28.1 Introduction
      2. 28.2 Architecture for the Inference Controller
      3. 28.3 Semantic Web Technologies and Provenance
        1. 28.3.1 Semantic Web-Based Models
        2. 28.3.2 Graphical Models and Rewriting
      4. 28.4 Inference Control through Query Modification
        1. 28.4.1 Our Approach
        2. 28.4.2 Domains and Provenance
        3. 28.4.3 Inference Controller with Two Users
        4. 28.4.4 SPARQL Query Modification
      5. 28.5 Implementing the Inference Controller
        1. 28.5.1 Our Approach
        2. 28.5.2 Implementation of a Medical Domain
        3. 28.5.3 Generating and Populating the Knowledge Base
        4. 28.5.4 Background Generator Module
      6. 28.6 Big Data Management and Inference Control
      7. 28.7 Summary and Directions
      8. References
    8. Conclusion to Part IV
  16. Part V: Next Steps for BDMA and BDSP
    1. Introduction to Part V
    2. Chapter 29: Confidentiality, Privacy, and Trust for Big Data Systems
      1. 29.1 Introduction
      2. 29.2 Trust, Privacy, and Confidentiality
        1. 29.2.1 Current Successes and Potential Failures
        2. 29.2.2 Motivation for a Framework
      3. 29.3 CPT Framework
        1. 29.3.1 The Role of the Server
        2. 29.3.2 CPT Process
        3. 29.3.3 Advanced CPT
        4. 29.3.4 Trust, Privacy, and Confidentiality Inference Engines
      4. 29.4 Our Approach to Confidentiality Management
      5. 29.5 Privacy for Social Media Systems
      6. 29.6 Trust for Social Networks
      7. 29.7 Integrated System
      8. 29.8 CPT within the Context of Big Data and Social Networks
      9. 29.9 Summary and Directions
      10. References
    3. Chapter 30: Unified Framework for Secure Big Data Management and Analytics
      1. 30.1 Overview
      2. 30.2 Integrity Management and Data Provenance for Big Data Systems
        1. 30.2.1 Need for Integrity
        2. 30.2.2 Aspects of Integrity
        3. 30.2.3 Inferencing, Data Quality, and Data Provenance
        4. 30.2.4 Integrity Management, Cloud Services and Big Data
        5. 30.2.5 Integrity for Big Data
      3. 30.3 Design of Our Framework
      4. 30.4 The Global Big Data Security and Privacy Controller
      5. 30.5 Summary and Directions
      6. References
    4. Chapter 31: Big Data, Security, and the Internet of Things
      1. 31.1 Introduction
      2. 31.2 Use Cases
      3. 31.3 Layered Framework for Secure IoT
      4. 31.4 Protecting the Data
      5. 31.5 Scalable Analytics for IoT Security Applications
      6. 31.6 Summary and Directions
      7. References
    5. Chapter 32: Big Data Analytics for Malware Detection in Smartphones
      1. 32.1 Introduction
      2. 32.2 Our Approach
        1. 32.2.1 Challenges
        2. 32.2.2 Behavioral Feature Extraction and Analysis
          1. 32.2.2.1 Graph-Based Behavior Analysis
          2. 32.2.2.2 Sequence-Based Behavior Analysis
          3. 32.2.2.3 Evolving Data Stream Classification
        3. 32.2.3 Reverse Engineering Methods
        4. 32.2.4 Risk-Based Framework
        5. 32.2.5 Application to Smartphones
          1. 32.2.5.1 Data Gathering
          2. 32.2.5.2 Malware Detection
          3. 32.2.5.3 Data Reverse Engineering of Smartphone Applications
      3. 32.3 Our Experimental Activities
        1. 32.3.1 Covert Channel Attack in Mobile Apps
        2. 32.3.2 Detecting Location Spoofing in Mobile Apps
        3. 32.3.3 Large Scale, Automated Detection of SSL/TLS Man-in-the-Middle Vulnerabilities in Android Apps
      4. 32.4 Infrastructure Development
        1. 32.4.1 Virtual Laboratory Development
          1. 32.4.1.1 Laboratory Setup
          2. 32.4.1.2 Programming Projects to Support the Virtual Lab
          3. 32.4.1.3 An Intelligent Fuzzer for the Automatic Android GUI Application Testing
          4. 32.4.1.4 Problem Statement
          5. 32.4.1.5 Understanding the Interface
          6. 32.4.1.6 Generating Input Events
          7. 32.4.1.7 Mitigating Data Leakage in Mobile Apps Using a Transactional Approach
          8. 32.4.1.8 Technical Challenges
          9. 32.4.1.9 Experimental System
          10. 32.4.1.10 Policy Engine
        2. 32.4.2 Curriculum Development
          1. 32.4.2.1 Extensions to Existing Courses
          2. 32.4.2.2 New Capstone Course on Secure Mobile Computing
      5. 32.5 Summary and Directions
      6. References
    6. Chapter 33: Toward a Case Study in Healthcare for Big Data Analytics and Security
      1. 33.1 Introduction
      2. 33.2 Motivation
        1. 33.2.1 The Problem
        2. 33.2.2 Air Quality Data
        3. 33.2.3 Need for Such a Case Study
      3. 33.3 Methodologies
      4. 33.4 The Framework Design
        1. 33.4.1 Storing and Retrieving Multiple Types of Scientific Data
          1. 33.4.1.1 The Problem and Challenges
          2. 33.4.1.2 Current Systems and Their Limitations
          3. 33.4.1.3 The Future System
        2. 33.4.2 Privacy and Security Aware Data Management for Scientific Data
          1. 33.4.2.1 The Problem and Challenges
          2. 33.4.2.2 Current Systems and Their Limitations
          3. 33.4.2.3 The Future System
        3. 33.4.3 Offline Scalable Statistical Analytics
          1. 33.4.3.1 The Problem and Challenges
          2. 33.4.3.2 Current Systems and Their Limitations
          3. 33.4.3.3 The Future System
          4. 33.4.3.4 Mixed Continuous and Discrete Domains
        4. 33.4.4 Real-Time Stream Analytics
          1. 33.4.4.1 The Problem and Challenges
        5. 33.4.5 Current Systems and Their Limitations
          1. 33.4.5.1 The Future System
      5. 33.5 Summary and Directions
      6. References
    7. Chapter 34: Toward an Experimental Infrastructure and Education Program for BDMA and BDSP
      1. 34.1 Introduction
      2. 34.2 Current Research and Infrastructure Activities in BDMA and BDSP
        1. 34.2.1 Big Data Analytics for Insider Threat Detection
        2. 34.2.2 Secure Data Provenance
        3. 34.2.3 Secure Cloud Computing
        4. 34.2.4 Binary Code Analysis
        5. 34.2.5 Cyber-Physical Systems Security
        6. 34.2.6 Trusted Execution Environment
        7. 34.2.7 Infrastructure Development
      3. 34.3 Education and Infrastructure Program in BDMA
        1. 34.3.1 Curriculum Development
        2. 34.3.2 Experimental Program
          1. 34.3.2.1 Geospatial Data Processing on GDELT
          2. 34.3.2.2 Coding for Political Event Data
          3. 34.3.2.3 Timely Health Indicator
      4. 34.4 Security and Privacy for Big Data
        1. 34.4.1 Our Approach
        2. 34.4.2 Curriculum Development
          1. 34.4.2.1 Extensions to Existing Courses
          2. 34.4.2.2 New Capstone Course on BDSP
        3. 34.4.3 Experimental Program
          1. 34.4.3.1 Laboratory Setup
          2. 34.4.3.2 Programming Projects to Support the Lab
      5. 34.5 Summary and Directions
      6. References
    8. Chapter 35: Directions for BDSP and BDMA
      1. 35.1 Introduction
      2. 35.2 Issues in BDSP
        1. 35.2.1 Introduction
        2. 35.2.2 Big Data Management and Analytics
        3. 35.2.3 Security and Privacy
        4. 35.2.4 Big Data Analytics for Security Applications
        5. 35.2.5 Community Building
      3. 35.3 Summary of Workshop Presentations
        1. 35.3.1 Keynote Presentations
          1. 35.3.1.1 Toward Privacy Aware Big Data Analytics
          2. 35.3.1.2 Formal Methods for Preserving Privacy While Loading Big Data
          3. 35.3.1.3 Authenticity of Digital Images in Social Media
          4. 35.3.1.4 Business Intelligence Meets Big Data: An Overview of Security and Privacy
          5. 35.3.1.5 Toward Risk-Aware Policy-Based Framework for BDSP
          6. 35.3.1.6 Big Data Analytics: Privacy Protection Using Semantic Web Technologies
          7. 35.3.1.7 Securing Big Data in the Cloud: Toward a More Focused and Data-Driven Approach
          8. 35.3.1.8 Privacy in a World of Mobile Devices
          9. 35.3.1.9 Access Control and Privacy Policy Challenges in Big Data
          10. 35.3.1.10 Timely Health Indicators Using Remote Sensing and Innovation for the Validity of the Environment
          11. 35.3.1.11 Additional Presentations
          12. 35.3.1.12 Final Thoughts on the Presentations
      4. 35.4 Summary of the Workshop Discussions
        1. 35.4.1 Introduction
        2. 35.4.2 Philosophy for BDSP
        3. 35.4.3 Examples of Privacy-Enhancing Techniques
        4. 35.4.4 Multiobjective Optimization Framework for Data Privacy
        5. 35.4.5 Research Challenges and Multidisciplinary Approaches
        6. 35.4.6 BDMA for Cyber Security
      5. 35.5 Summary and Directions
      6. References
    9. Conclusion to Part V
  17. Chapter 36: Summary and Directions
    1. 36.1 About This Chapter
    2. 36.2 Summary of This Book
    3. 36.3 Directions for BDMA and BDSP
    4. 36.4 Where Do We Go from Here?
  18. Appendix A: Data Management Systems: Developments and Trends
  19. Appendix B: Database Management Systems
  20. Index