MySQL Cookbook, 3rd Edition

by Paul DuBois

August 2014

Intermediate to advanced

866 pages

24h 4m

English

O'Reilly Media, Inc.

Who This Book Is For; What’s in This Book; MySQL APIs Used in This Book; Version and Platform Notes; Conventions Used in This Book; The MySQL Cookbook Companion Website; Obtaining MySQL and Related Software; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgments
Introduction; Setting Up a MySQL User Account; Creating a Database and a Sample Table; What to Do if mysql Cannot Be Found; Specifying mysql Command Options; Executing SQL Statements Interactively; Executing SQL Statements Read from a File or Program; Controlling mysql Output Destination and Format; Using User-Defined Variables in SQL Statements
Introduction; Connecting, Selecting a Database, and Disconnecting; Checking for Errors; Writing Library Files; Executing Statements and Retrieving Results; Handling Special Characters and NULL Values in Statements; Handling Special Characters in Identifiers; Identifying NULL Values in Result Sets; Techniques for Obtaining Connection Parameters; Conclusion and Words of Advice
Introduction; Specifying Which Columns and Rows to Select; Naming Query Result Columns; Sorting Query Results; Removing Duplicate Rows; Working with NULL Values; Writing Comparisons Involving NULL in Programs; Using Views to Simplify Table Access; Selecting Data from Multiple Tables; Selecting Rows from the Beginning, End, or Middle of Query Results; What to Do When LIMIT Requires the “Wrong” Sort Order; Calculating LIMIT Values from Expressions
Introduction; Cloning a Table; Saving a Query Result in a Table; Creating Temporary Tables; Generating Unique Table Names; Checking or Changing a Table Storage Engine; Copying a Table Using mysqldump
Introduction; String Properties; Choosing a String Data Type; Setting the Client Connection Character Set; Writing String Literals; Checking or Changing a String’s Character Set or Collation; Converting the Lettercase of a String; Controlling Case Sensitivity in String Comparisons; Pattern Matching with SQL Patterns; Pattern Matching with Regular Expressions; Breaking Apart or Combining Strings; Searching for Substrings; Using Full-Text Searches; Using a Full-Text Search with Short Words; Requiring or Prohibiting Full-Text Search Words; Performing Full-Text Phrase Searches
Introduction; Choosing a Temporal Data Type; Using Fractional Seconds Support; Changing MySQL’s Date Format; Setting the Client Time Zone; Shifting Temporal Values Between Time Zones; Determining the Current Date or Time; Using TIMESTAMP or DATETIME to Track Row-Modification Times; Extracting Parts of Dates or Times; Synthesizing Dates or Times from Component Values; Converting Between Temporal Values and Basic Units; Calculating Intervals Between Dates or Times; Adding Date or Time Values; Calculating Ages; Finding the First Day, Last Day, or Length of a Month; Calculating Dates by Substring Replacement; Finding the Day of the Week for a Date; Finding Dates for Any Weekday of a Given Week; Performing Leap-Year Calculations; Canonizing Not-Quite-ISO Date Strings; Selecting Rows Based on Temporal Characteristics
Introduction; Using ORDER BY to Sort Query Results; Using Expressions for Sorting; Displaying One Set of Values While Sorting by Another; Controlling Case Sensitivity of String Sorts; Date-Based Sorting; Sorting by Substrings of Column Values; Sorting by Fixed-Length Substrings; Sorting by Variable-Length Substrings; Sorting Hostnames in Domain Order; Sorting Dotted-Quad IP Values in Numeric Order; Floating Values to the Head or Tail of the Sort Order; Defining a Custom Sort Order; Sorting ENUM Values
Introduction; Basic Summary Techniques; Creating a View to Simplify Using a Summary; Finding Values Associated with Minimum and Maximum Values; Controlling String Case Sensitivity for MIN() and MAX(); Dividing a Summary into Subgroups; Summaries and NULL Values; Selecting Only Groups with Certain Characteristics; Using Counts to Determine Whether Values Are Unique; Grouping by Expression Results; Summarizing Noncategorical Data; Finding Smallest or Largest Summary Values; Date-Based Summaries; Working with Per-Group and Overall Summary Values Simultaneously; Generating a Report That Includes a Summary and a List
Introduction; Creating Compound-Statement Objects; Using Stored Functions to Encapsulate Calculations; Using Stored Procedures to “Return” Multiple Values; Using Triggers to Implement Dynamic Default Column Values; Using Triggers to Simulate Function-Based Indexes; Simulating TIMESTAMP Properties for Other Date and Time Types; Using Triggers to Log Changes to a Table; Using Events to Schedule Database Actions; Writing Helper Routines for Executing Dynamic SQL; Handling Errors Within Stored Programs; Using Triggers to Preprocess or Reject Data

Introduction; Determining the Number of Rows Affected by a Statement; Obtaining Result Set Metadata; Determining Whether a Statement Produced a Result Set; Using Metadata to Format Query Output; Listing or Checking Existence of Databases or Tables; Accessing Table Column Definitions; Getting ENUM and SET Column Information; Getting Server Metadata; Writing Applications That Adapt to the MySQL Server Version
Introduction; Importing Data with LOAD DATA and mysqlimport; Importing CSV Files; Exporting Query Results from MySQL; Importing and Exporting NULL Values; Writing Your Own Data Export Programs; Converting Datafiles from One Format to Another; Extracting and Rearranging Datafile Columns; Exchanging Data Between MySQL and Microsoft Excel; Exporting Query Results as XML; Importing XML into MySQL; Guessing Table Structure from a Datafile
Introduction; Using the SQL Mode to Reject Bad Input Values; Validating and Transforming Data; Using Pattern Matching to Validate Data; Using Patterns to Match Broad Content Types; Using Patterns to Match Numeric Values; Using Patterns to Match Dates or Times; Using Patterns to Match Email Addresses or URLs; Using Table Metadata to Validate Data; Using a Lookup Table to Validate Data; Converting Two-Digit Year Values to Four-Digit Form; Performing Validity Checking on Date or Time Subparts; Writing Date-Processing Utilities; Importing Non-ISO Date Values; Exporting Dates Using Non-ISO Formats; Epilogue
Introduction; Creating a Sequence Column and Generating Sequence Values; Choosing the Definition for a Sequence Column; The Effect of Row Deletions on Sequence Generation; Retrieving Sequence Values; Renumbering an Existing Sequence; Extending the Range of a Sequence Column; Reusing Values at the Top of a Sequence; Ensuring That Rows Are Renumbered in a Particular Order; Sequencing an Unsequenced Table; Managing Multiple Auto-Increment Values Simultaneously; Using Auto-Increment Values to Associate Tables; Using Sequence Generators as Counters; Generating Repeating Sequences
Introduction; Finding Matches Between Tables; Finding Mismatches Between Tables; Identifying and Removing Mismatched or Unattached Rows; Comparing a Table to Itself; Producing Master-Detail Lists and Summaries; Enumerating a Many-to-Many Relationship; Finding Per-Group Minimum or Maximum Values; Using a Join to Fill or Identify Holes in a List; Using a Join to Control Query Sort Order; Referring to Join Output Column Names in Programs
Introduction; Calculating Descriptive Statistics; Per-Group Descriptive Statistics; Generating Frequency Distributions; Counting Missing Values; Calculating Linear Regressions or Correlation Coefficients; Generating Random Numbers; Randomizing a Set of Rows; Selecting Random Items from a Set of Rows; Calculating Successive-Row Differences; Finding Cumulative Sums and Running Averages; Assigning Ranks; Computing Team Standings
Introduction; Preventing Duplicates from Occurring in a Table; Dealing with Duplicates When Loading Rows into a Table; Counting and Identifying Duplicates; Eliminating Duplicates from a Table
Introduction; Choosing a Transactional Storage Engine; Performing Transactions Using SQL; Performing Transactions from Within Programs; Using Transactions in Perl Programs; Using Transactions in Ruby Programs; Using Transactions in PHP Programs; Using Transactions in Python Programs; Using Transactions in Java Programs
Introduction; Basic Principles of Web Page Generation; Using Apache to Run Web Scripts; Using Tomcat to Run Web Scripts; Encoding Special Characters in Web Output
Introduction; Displaying Query Results as Paragraphs; Displaying Query Results as Lists; Displaying Query Results as Tables; Displaying Query Results as Hyperlinks; Creating Navigation Indexes from Database Content; Storing Images or Other Binary Data; Serving Images or Other Binary Data; Serving Banner Ads; Serving Query Results for Download
Introduction; Writing Scripts That Generate Web Forms; Creating Single-Pick Form Elements from Database Content; Creating Multiple-Pick Form Elements from Database Content; Loading Database Content into a Form; Collecting Web Input; Validating Web Input; Storing Web Input in a Database; Processing File Uploads; Performing Web-Based Database Searches; Generating Previous-Page and Next-Page Links; Generating “Click to Sort” Table Headings; Web Page Access Counting; Web Page Access Logging; Using MySQL for Apache Logging
Introduction; Using MySQL-Based Sessions in Perl Applications; Using MySQL-Based Storage in Ruby Applications; Using MySQL-Based Storage with the PHP Session Manager; Using MySQL for Session-Backing Store with Tomcat
Introduction; Configuring the Server; Managing the Plug-In Interface; Controlling Server Logging; Rotating or Expiring Logfiles; Rotating Log Tables or Expiring Log Table Rows; Monitoring the MySQL Server; Creating and Using Backups
Introduction; Understanding the mysql.user Table; Managing User Accounts; Implementing a Password Policy; Checking Password Strength; Expiring Passwords; Assigning Yourself a New Password; Resetting an Expired Password; Finding and Fixing Insecure Accounts; Disabling Use of Accounts with Pre-4.1 Passwords; Finding and Removing Anonymous Accounts; Modifying “Any Host” and “Many Host” Accounts

Content preview from MySQL Cookbook, 3rd Edition

Chapter 16. Handling Duplicates

Introduction

Tables or result sets sometimes contain duplicate rows. In some cases this is acceptable. For example, if you conduct a web poll that records the date and client IP address along with each vote, duplicate rows may be permitted: an Internet service that routes its customers’ traffic through a single proxy host can make large numbers of votes appear to originate from the same IP address. In other cases, duplicates are unacceptable, and you’ll want to take steps to avoid them. Operations involved in handling duplicate rows include the following:

Preventing duplicates from being created in the first place. If each row in a table is intended to represent a single entity (such as a person, an item in a catalog, or a specific observation in an experiment), duplicates make the table difficult to use that way. They make it impossible to refer to each row unambiguously, so it’s best to make sure they never occur.
Counting the number of duplicates to determine whether they are present and to what extent.
Identifying duplicated values (or the rows containing them) so you can see where they occur.
Eliminating duplicates to ensure that each row is unique. This may involve removing rows from a table to leave only unique rows or selecting a result set in such a way that no duplicates appear in the output. For example, to display a list of the states in which you have customers, you probably ...
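
The four operations listed above can be sketched in a few lines of Python. This is a minimal illustration and not the book's own example: it uses Python's built-in sqlite3 module as a stand-in so that it runs without a MySQL server, and the customer and customer_unique tables, their columns, and the demo_duplicates() function are hypothetical names invented for the sketch. The SQL itself (COUNT(*), GROUP BY ... HAVING, SELECT DISTINCT, and a PRIMARY KEY) should carry over to MySQL largely unchanged, though error classes and parameter placeholders differ between drivers.

```python
import sqlite3


def demo_duplicates():
    # Assumption: an in-memory SQLite database stands in for a MySQL server.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical table with no uniqueness constraint, so duplicates can occur.
    cur.execute(
        "CREATE TABLE customer (last_name TEXT, first_name TEXT, state TEXT)"
    )
    cur.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [
            ("Brown", "Bartholomew", "VT"),
            ("Isaacson", "Jim", "NY"),
            ("Baxter", "Wallace", "NY"),
            ("Isaacson", "Jim", "NY"),  # exact duplicate of an earlier row
        ],
    )

    # Counting: compare the total row count with the number of distinct rows.
    total = cur.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
    distinct = cur.execute(
        "SELECT COUNT(*) FROM "
        "(SELECT DISTINCT last_name, first_name, state FROM customer) AS t"
    ).fetchone()[0]

    # Identifying: list the name combinations that occur more than once.
    duplicated = cur.execute(
        "SELECT last_name, first_name, COUNT(*) FROM customer "
        "GROUP BY last_name, first_name HAVING COUNT(*) > 1"
    ).fetchall()

    # Eliminating (in a result set): each state appears only once.
    states = [
        row[0]
        for row in cur.execute(
            "SELECT DISTINCT state FROM customer ORDER BY state"
        )
    ]

    # Preventing: a primary key on the name columns rejects a second copy.
    cur.execute(
        "CREATE TABLE customer_unique ("
        "last_name TEXT NOT NULL, first_name TEXT NOT NULL, state TEXT, "
        "PRIMARY KEY (last_name, first_name))"
    )
    cur.execute("INSERT INTO customer_unique VALUES ('Isaacson', 'Jim', 'NY')")
    try:
        cur.execute(
            "INSERT INTO customer_unique VALUES ('Isaacson', 'Jim', 'NY')"
        )
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True

    conn.close()
    return {
        "duplicate_rows": total - distinct,
        "duplicated": duplicated,
        "states": states,
        "rejected": rejected,
    }


if __name__ == "__main__":
    print(demo_duplicates())
```

The counting step here only detects rows that are identical in every column. When uniqueness should apply to a subset of columns, as in the identifying step, the GROUP BY list names just those columns.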