Text Mining of Web-Based Medical Content

Book Description

•Includes Text Mining and Natural Language Processing Methods for extracting information from electronic health records and biomedical literature.
•Analyzes text analytic tools for new media such as online forums, social media posts, tweets and video sharing.
•Demonstrates how to use speech and audio technologies for improving access to online content for the visually impaired.

Text Mining of Web-Based Medical Content examines various approaches to deriving high-quality information from online biomedical literature, electronic health records, query search terms, social media posts and tweets. Using some of the latest empirical methods of knowledge extraction, the authors show how online content, generated by both professionals and laypersons, can be mined for valuable information about disease processes, adverse drug reactions not captured during clinical trials, and tropical fever outbreaks. Additionally, the authors show how to perform information extraction on a hospital intranet, how to build a social media search engine to glean information about patients' own experiences interacting with healthcare professionals, and how to improve access to online health information.

This volume provides a wealth of timely material for health informatics professionals and for researchers in machine learning, data mining, and natural language processing.

Topics in this book include:
•Mining Biomedical Literature and Clinical Narratives
•Medication Information Extraction
•Machine Learning Techniques for Mining Medical Search Queries
•Detecting the Level of Personal Health Information Revealed in Social Media
•Curating Laypersons’ Personal Experiences with Health Care from Social Media and Twitter
•Health Dialogue Systems for Improving Access to Online Content
•Crowd-based Audio Clips to Improve Online Video Access for the Visually Impaired
•Semantic-based Visual Information Retrieval for Mining Radiographic Image Data
•Evaluating the Importance of Medical Terminology in YouTube Video Titles and Descriptions

Table of Contents

  1. Speech Technology and Text Mining in Medicine and Health Care
  2. Title Page
  3. Copyright Page
  4. Preface
  5. Table of Contents
  6. List of authors
  7. Part I Methods and techniques for mining biomedical literature and electronic health records
    1. 1 Application of text mining to biomedical knowledge extraction: analyzing clinical narratives and medical literature
      1. 1.1 Introduction
      2. 1.2 Background
        1. 1.2.1 Clinical and biomedical text
        2. 1.2.2 Information retrieval
          1. 1.2.2.1 Information retrieval process
        3. 1.2.3 Information extraction
        4. 1.2.4 Challenges to biomedical information extraction systems
        5. 1.2.5 Applications of biomedical information extraction tools
      3. 1.3 Biomedical knowledge extraction using text mining
        1. 1.3.1 Unstructured text gathering and preprocessing
          1. 1.3.1.1 Text gathering
          2. 1.3.1.2 Text preprocessing
        2. 1.3.2 Extraction of features and semantic information
        3. 1.3.3 Analysis of annotated texts
          1. 1.3.3.1 Algorithms for text classification
          2. 1.3.3.2 Classification evaluation measures
        4. 1.3.4 Presentation
      4. 1.4 Text mining tools
      5. 1.5 Summary
      6. Appendix “A”
      7. References
    2. 2 Unlocking information in electronic health records using natural language processing: a case study in medication information extraction
      1. 2.1 Introduction to clinical natural language processing
      2. 2.2 Medication information in EHRs
      3. 2.3 Medication information extraction systems and methods
        1. 2.3.1 Relevant work
        2. 2.3.2 Summary of approaches
          1. 2.3.2.1 Rule-based methods
          2. 2.3.2.2 Machine learning-based methods
          3. 2.3.2.3 Hybrid methods
      4. 2.4 Uses of medication information extraction tools in clinical research
      5. 2.5 Challenges and future work
      6. References
    3. 3 Online health information semantic search and exploration: reporting on two prototypes for performing information extraction on both a hospital intranet and the world wide web
      1. 3.1 Introduction
      2. 3.2 Background
      3. 3.3 Related work
        1. 3.3.1 Semantic search
        2. 3.3.2 Health information search and exploration
        3. 3.3.3 Information extraction for health
        4. 3.3.4 Ontology-based information extraction – OBIE
      4. 3.4 A general architecture for health search: handling both private and public content
      5. 3.5 Two semantic search systems for health
        1. 3.5.1 MedInX
          1. 3.5.1.1 MedInX ontologies
          2. 3.5.1.2 MedInX system
          3. 3.5.1.3 Representative results
        2. 3.5.2 SPHInX – Semantic search of public health information in Portuguese
          1. 3.5.2.1 System architecture
          2. 3.5.2.2 Natural language processing
          3. 3.5.2.3 Semantic extraction models
          4. 3.5.2.4 Semantic extraction and integration
          5. 3.5.2.5 Search and exploration
      6. 3.6 Conclusion
      7. Acknowledgments
      8. References
  8. Part II Machine Learning Techniques for Mining Medical Search Queries and Health-Related Social Media Posts and Tweets
    1. 4 Predicting dengue incidence in Thailand from online search queries that include weather and climatic variables
      1. 4.1 Introduction
        1. 4.1.1 Dengue disease in the world
      2. 4.2 Epidemiology of dengue disease
        1. 4.2.1 Temperature change and the ecology of A. aegypti
      3. 4.3 Using online data to forecast incidence of dengue
        1. 4.3.1 Background and related work
        2. 4.3.2 Methodology for dengue cases prediction
          1. 4.3.2.1 Framework
          2. 4.3.2.2 Data sets
          3. 4.3.2.3 Predictive models
          4. 4.3.2.4 Validation
        3. 4.3.3 Prediction analysis
          1. 4.3.3.1 Multiple linear regression
          2. 4.3.3.2 Artificial neural network
          3. 4.3.3.3 Comparison of predictive models
        4. 4.3.4 Discussion
      4. 4.4 Conclusion
      5. References
    2. 5 A study of personal health information posted online: using machine learning to validate the importance of the terms detected by MedDRA and SNOMED in revealing health information in social media
      1. 5.1 Introduction
      2. 5.2 Related background
        1. 5.2.1 Personal health information in social networks
        2. 5.2.2 Protection of personal health information
        3. 5.2.3 Previous work
      3. 5.3 Technology
        1. 5.3.1 Data mining
        2. 5.3.2 Machine learning
        3. 5.3.3 Information extraction
        4. 5.3.4 Natural language processing
      4. 5.4 Electronic resources of medical terminology
        1. 5.4.1 MedDRA and its use in text data mining
        2. 5.4.2 SNOMED and its use in text data mining
        3. 5.4.3 Benefits of using MedDRA and SNOMED
      5. 5.5 Empirical study
        1. 5.5.1 MySpace data
        2. 5.5.2 Data annotation
        3. 5.5.3 MedDRA results
        4. 5.5.4 SNOMED results
      6. 5.6 Risk factor of personal information
        1. 5.6.1 Introducing RFPI
        2. 5.6.2 Results from MedDRA and SNOMED
        3. 5.6.3 Challenges in detecting PHI
      7. 5.7 Learning the profile of PHI disclosure
        1. 5.7.1 Part I – Standard bag of words model
        2. 5.7.2 Part II – Special treatment for medical terms
      8. 5.8 Conclusion and future work
      9. Acknowledgment
      10. References
    3. 6 Twitter for health – building a social media search engine to better understand and curate laypersons’ personal experiences
      1. 6.1 Introduction
      2. 6.2 Background
        1. 6.2.1 Social media as a source of health information
        2. 6.2.2 Information search on social media
      3. 6.3 Proposed solutions
        1. 6.3.1 Tools for information retrieval on Twitter
          1. 6.3.1.1 Basic recipe for building a search engine
          2. 6.3.1.2 Solutions
          3. 6.3.1.3 Health concerns, availability of clean water and food, and other information for crisis management knowledge from Twitter
      4. 6.4 Background
      5. 6.5 Some solutions
      6. 6.6 Tools for combining, comparing, and correlating tweets with other sources of health information
      7. 6.7 Discussion
      8. 6.8 Related solutions
        1. 6.8.1 Maps applications for disease monitoring
        2. 6.8.2 Maps applications in crisis situations
        3. 6.8.3 Extraction systems to monitor relationships between drugs and adverse events
        4. 6.8.4 An early warning system to discover unrecognized adverse drug events
      9. 6.9 Methods for information curation
      10. 6.10 Future work
      11. Acknowledgments
      12. References
  9. Part III Using speech and audio technologies for improving access to online content for the computer-illiterate and the visually impaired
    1. 7 An empirical study of user satisfaction with a health dialogue system designed for the Nigerian low-literate, computer-illiterate, and visually impaired
      1. 7.1 Introduction
      2. 7.2 Related work
      3. 7.3 Dialogue systems
      4. 7.4 Methods
        1. 7.4.1 Participants
        2. 7.4.2 Demographics of the participants
        3. 7.4.3 Data collection
        4. 7.4.4 Data analysis
      5. 7.5 Health dialogue system (HDS)
      6. 7.6 Results
        1. 7.6.1 Experiences with mobile/computing devices
        2. 7.6.2 User satisfaction and acceptability of HDS
      7. 7.7 Conclusion
      8. Acknowledgment
      9. References
    2. 8 DVX – the descriptive video exchange project: using crowd-based audio clips to improve online video access for the blind and the visually impaired
      1. 8.1 Current problems with video data
      2. 8.2 The description solution
        1. 8.2.1 What is description?
        2. 8.2.2 Description for the visually impaired
          1. 8.2.2.1 Current types
      3. 8.3 Architecture of DVX
        1. 8.3.1 The DVX server
          1. 8.3.1.1 Major data elements, attributes and actions
          2. 8.3.1.2 Current implementation
          3. 8.3.1.3 Tomcat servlet container
          4. 8.3.1.4 Applications
      4. 8.4 DVX solves description problems
      5. 8.5 DVX and video search
      6. 8.6 Conclusion
      7. Acknowledgment
  10. Part IV Visual data: new methods and approaches to mining radiographic image data and video metadata
    1. 9 Information extraction from medical images: evaluating a novel automatic image annotation system using semantic-based visual information retrieval
      1. 9.1 Introduction
      2. 9.2 Background
      3. 9.3 Related work
      4. 9.4 Architecture of system
      5. 9.5 The segmentation algorithm – graph-based object detection (GBOD)
      6. 9.6 Experimental results
      7. 9.7 Conclusions
      8. References
    2. 10 Helping patients in performing online video search: evaluating the importance of medical terminology extracted from MeSH and ICD-10 in health video title and description
      1. 10.1 Introduction
      2. 10.2 Data and methods
        1. 10.2.1 Obtaining video data
        2. 10.2.2 Detecting medical terms in video title and/or description
        3. 10.2.3 Medical vocabularies
      3. 10.3 Results
        1. 10.3.1 ICD-10 results
        2. 10.3.2 MeSH Results
        3. 10.3.3 Terms used in video titles and descriptions
        4. 10.3.4 Occurrences of terms – when discarding the most common terms
      4. 10.4 Discussion
        1. 10.4.1 Findings
        2. 10.4.2 How ICD-10 and MeSH terms can be useful
        3. 10.4.3 Discriminating power of terms
        4. 10.4.4 The uniqueness of our study when compared to other work
      5. 10.5 Conclusion
      6. Acknowledgments
      7. References
  11. Editor’s biography