Chapter 4. Getting Data out of Alteryx Designer
Just as there are so many different formats that you can use to get data into Designer, there are also just as many formats and methods to get data out of Designer. You could output your data to simple sources, like a CSV file or an Excel file, but you can also output directly to databases, web sources, and major data visualization tools like PowerBI and Tableau. Most users output data either to files, like Excel and CSV, or to databases, like Microsoft SQL Server. In this chapter, you’ll learn how to do both.
Like I’ve said in earlier chapters, Designer is extremely flexible and powerful, and can build well-designed reports and outputs in various formats that allow you to automate both reporting and analysis. This chapter will focus on the Output Data tool, but I will introduce many more tools for outputting data, which we will cover in detail in Chapters 9 and 10.
Files Versus Databases
Many of us in the analytics industry never took a formal class or received training on using databases. In fact, most of us probably received more training on using Excel for our projects than anything else. As we discuss outputting data, I want to use this opportunity to challenge you to consider if using databases might actually be better for you. There are two reasons I am posing this challenge.
First, Designer makes it so easy to interact with databases—both getting data in and getting data out. Second, there are so many features that databases provide. You may think that working with files is easier and quicker, but I would like you to challenge that assumption. With a little work, you can work more effectively and efficiently using databases.
More specifically, I see three key benefits of using databases over files:
- Central location
- Anyone who has used files to store data can attest to the mess that occurs over time in one’s file storage. Folders and folders of data are just spread out and, many times, are hard to get to unless you happen to be extremely organized at all times. Having a database in which to store data makes more sense in that it’s all in one central location. Databases don’t care if the data is one row or one billion rows; they can handle any size.
- Scale and flexibility
- You can’t beat the power of a database. Users who have tried to put 500,000 rows of data into an Excel sheet know that most file types were not built to scale well. That flexibility of being able to go from one record to one billion records is powerful. The flexibility of a database is also demonstrated in its ability to store all different data types in one place. Having a database that can store JSON formats, spatial formats, and standard table formats means users do not have to manage multiple file types based on the data.
- Access and control
- Having files that contain valuable data spread out all over a machine is a risk for companies and the user. If a user leaves their laptop logged in, by chance, and someone takes the opportunity to rummage through files, there is no way to stop them from grabbing a file sitting on the desktop and emailing it to themselves or someone else. It’s there, open and exposed. With a database, however, you can secure your data—it’s stored on servers that are within the walls of a company and can be properly looked after.
For as long as I can remember, databases have represented “real” analytics, at scale. I assure you that files are just as important for analytics, and we’ll discuss next.
Output to Files
The main way to output data to a file is by using the Output Data tool, which is the main topic of this chapter. This tool has many features that go above and beyond simply outputting data. Let’s start off by taking a look at how we would output to a single CSV file. Then, we will look at outputting to Excel as well as Alteryx’s proprietary database format YXDB.
Note
Alteryx has a few file types that are proprietary and only used with Alteryx Designer and Server. One of them is a database file format called YXDB. This file format is the most efficient file type for reading and writing in Designer because it has no size limit, is compressed for maximum speed, and includes additional metadata that references the source of the data and how the data was created. There is no row limit—no other file format, other than text, supports as many rows.
Output to a CSV File
To output to a CSV file using the Output Data tool, first drag and drop an Output Data tool to the canvas and ensure it’s connected to the tool that you want to output data from (see Figure 4-1).
To configure the Output Data tool, you will need to tell Designer where you want to save the output file. You can do this just like you would for the Input Data tool, by clicking on the down arrow on the right side of the Configuration window, which will open up the Data connections window (see Figure 4-2).
In the Data connections window, select the CSV option. Then, in the next window, choose a name for the file (see Figure 4-3).
Click Save. You have completed the configuration for outputting the data to a CSV file on your machine! Remember, you can also change the configuration, if needed (see Figure 4-4).
Let’s dig into some of the configuration options for CSV files.
Configuration options
In general, it’s important to understand the configuration options you have available to you for the Output Data tool, as there are many options that can determine whether you are able to properly read the file. There are also nice little tricks that will take 2–3 steps out of your workflow just by getting the right options set. Let’s take a look at those options:
- Max Records Per File
- This option allows you to specify how many records get written out. If you are in development mode and just testing, you could set this to, say, 10 records so that it’s fast until you need to run it fully.
- File Format
- This is the option that you use to choose your file type.
- Delimiters
- This is very important to pay attention to, as it is the actual character that separates your data into columns. If you try to output to a file and all the data is ending up in one column, then it’s likely your delimiter is incorrect.
- First Row Contains Field Names
- This option allows you to specify whether your data contains headers or not.
- Quote Output Fields
- This option is similar to the Input Data tool option, which ensures that values don’t get haphazardly cut off by quotes.
- Code Page
- This is the option that allows you to specify the code page. Most of the time, you’ll never need to modify this.
- Line Ending Style
- If you are working with files from different systems (Unix, Mac, or Windows), then you might need to configure this option in order to properly write the files.
- Write BOM
- This option allows you to include or exclude the byte order mark (BOM) in the output.
Once you have run your workflow with the configured Output Data tool, it’s very easy to open the file from Alteryx. Let’s talk about how to view the output file.
Viewing the output file
In the Results window, you will notice that any input or output that is listed is also set as a hyperlink that you can click on to open that file. You can also filter your logs quickly by clicking on the headers of your Results pane to see only the Warnings or Messages, for example (see Figure 4-5).
Output to an Excel File
Outputting to an Excel file is very similar to outputting to a CSV file, with just a couple of different options. You will still need to choose the file you want to write to, but instead of clicking the down arrow and selecting a file, you can just type (or copy and paste) the filename and location into the text box. If typed correctly, with the file extension .xlsx, then it will automatically recognize and set the default configuration. It’s important to note that if you go through the process of clicking on the down arrow and using the interface to select the file, some of the options (like Max Records Per File and File Format), as well as the sheet name selection, will be automatically set. If you type the file path and name, you will need to configure manually as well as use the “|||” (pipe delimiters) at the end with the sheet name to specify what exact sheet you want to use (see Figure 4-6).
An important item to keep in mind is that the sheets of an Excel file are best thought of as tables of a database. They operate in much the same way. Let’s talk more about how we can configure the output.
Configuration options
The most important option for outputting to Excel is the third option in Figure 4-6 (Output Options), which allows you to choose the criteria for output of the file. You need to ensure that you have set this correctly, or it could cause errors the second time you run the workflow. When outputting to Excel, you have four configuration options to choose from:
- Append to Existing Sheet
- Use this option if you have a file/sheet already created that you want to add to.
- Overwrite Sheet or Range
- Use this option if you want to replace a sheet or range.
- Overwrite File (Remove)
- Use this option if you want to replace an entire workbook.
- Create New Sheet
- Use this option if each time you run, you want a new sheet created. Caution: if that sheet is already created, Designer will throw an error telling you it can’t be written to because it already exists.
Output to a YXDB File
Outputting to YXDB is probably the easiest way to output. For many use cases, for example where a file is required and the data set is also quite large or where owners of a data set don’t want that data modified, it’s a really good option if you want to quickly put the data in an easy format and don’t have requirements to report or present the data.
To output your data into the YXDB format, simply follow the steps just like you did for CSV and Excel, and choose the location where you want the data to be stored. Remember, Designer is going to do as much as it can to help you by automatically setting configurations when you choose the file type (see Figure 4-7).
Spend some time getting comfortable using YXDB file types. It will definitely help you on your Alteryx journey. Now, as important as it is to be able to output to files such as YXDB, I would argue that outputting to databases is even more important. Let’s make sure you get a solid understanding of how Alteryx Designer allows you to output to databases.
Output to Databases
As stated in the last chapter, Alteryx Designer supports nearly all of the most widely used data sources in the world—Microsoft SQL Server, Amazon S3, MongoDB, MySQL, and even newer sources like Snowflake and Databricks. It’s fun to think of all the combinations of data sources one could read from and write to! This is one of the reasons why Alteryx is so flexible. If you need to move data from Excel to Amazon S3, you can. If you need to move data from Microsoft SQL Server to Snowflake, you can. To make this happen, you will need to understand how to use the Output Data tool for your respective data source. I will not cover every data source in this book, but I will give a couple of examples that should be helpful, as they operate in a similar manner.
In the last chapter, you learned how to connect to databases as well as how to use the Input Data tool. Here, you are going to connect to ODBC or OLEDB in the same way with the Output Data tool as you did with the Input Data tool. If you need a reminder of how to set up a connection or a credential, and connect them, then I recommend reviewing that part in the last chapter. There are a couple of nuances with the Output Data tool, however, that I will cover in the following examples.
Output to Microsoft SQL Server
First, you will bring in an Output Data tool from the In/Out tool palette and configure it. To configure the tool, select “Set Up a Connection” (see Figure 4-8).
Once the Data connections window is open, select the database with which you want to connect. In this example, choose ODBC for Microsoft SQL Server (see Figure 4-9).
Upon selecting the connection you want, you will see a window that will be different depending on the type of connection you make. Because you are connecting to ODBC, you will see the Connection Manager (see Figure 4-10).
As I’ve said, when you’re new to working with Designer, it’s important to play around a little. Give yourself time to explore and test different options so that: 1) you become more familiar with the different options, but more importantly 2) you learn which options give you what you want (ease of use, performance, or even a functioning connection).
In the SQL Server ODBC Connection window, choose your DSN and assign the appropriate credentials. There is one configuration step that is unique to the Output Data tool and that’s the Output Table selection. You will need to tell Designer what table you want to write out. Once you have entered a table name, click OK (see Figure 4-11).
You will now see the Configuration window populated with the base settings. Let’s go through the options so you know what you can change or update.
Configuration options
While we discussed Max Records Per File and File Format earlier in this chapter, Connection ID and Table were set as part of the connection, and for the Output Data tool there are quite a few important options that are different. These options will differ slightly depending on whether you connect using ODBC or OLEDB:
- Output Options
You have many options for how you want to write the data. You can do everything from Append, to Delete Data & Append, to completely dropping the table or deleting data before writing. You also have the option to conduct Updates. All of these options allow you to be very precise in your workflows, to update as much or as little as you need (see Figure 4-12).
Warning
To use the Update statements within the Output Options, you will need to ensure you have Primary Keys on your table.
- Append Field Map
- This option allows you to map different fields of your output to the table/sheets fields that exist in the output source.
- Key for Update
- If you are conducting a database update action, then you could use this in order to provide a key to the database.
- Pre Create SQL Statement
This option allows you to run a SQL statement prior to your main update/output. This is very useful if you need to create a temporary table, delete specific data, update a table, or even run a stored procedure prior to writing the data to the table specified in the Output Data tool configuration.
Note
You will likely need to use the OLEDB format in order to use stored procedures.
- Post Create SQL Statement
- This is the same option as the Pre Create SQL Statement that allows you to run a SQL statement, but in this case it is after writing the data to the table specified in the Output Data tool configuration.
- Ignore Pre/Post SQL DROP TABLE Errors
- If errors are passed back from the database, then they are ignored. This keeps the workflow from erroring out.
- Table/Field Name SQL Style
- Depending on the type of database you are connecting to, you might need to modify this if the SQL style of your table and/or field names need to include quotes or not.
- Transaction Size
- This is the number of rows that are written to the database at a time. If you are working with large database queries, you’ll want to determine an optimal size for your database, which depends on factors such as available memory.
- Show Transaction Messages
- This option allows you to display a message for each transaction in the results log of the workflow. Each message contains the sum of records written up to that transaction.
- Spatial Object Field
- This option allows you to set the spatial object to include in the output. Spatial files can only contain one spatial object per record.
- Spatial Object Field Size
- This option allows you to set the size of the spatial object.
- Projection
- The method used to portray a part of the spherical Earth on a flat surface is called a map projection. If you are working with spatial data, this option allows you to select the projection type to output. By default, Projection is blank and outputs to WGS 84. For in-depth information on Projection Support, visit the Alteryx help documentation.
Once you set the configuration options to your preferences, then simply run the workflow. If you got it right, you will see data in your database. Work through this cautiously and connect with someone if you need a little extra help. Once you get it, you’ll be empowered to make some amazing workflows!
Output to Multiple Sources
A helpful feature within Designer is its ability to write out to many different sources at one time. You can write out to both files and database tables, and split the data based on a field’s values. For example, let’s say you have a table that shows Sales Territory information (see Figure 4-13).
If you need to split each Sales Territory Group into its own sheet or table, you can output this data to four different tables—one for each group (North America, Europe, Pacific, and NA). To be clear, the number of tables, files, or sheets created depends on the number of distinct values in the column you choose.
To configure this option, check the box at the bottom of the Output Data tool Configuration window that says, “Take File/Table Name From Field” (see Figure 4-14).
You will then decide which option you want to use for the output option (see Figure 4-15).
Let’s walk through some examples for each of these output options.
Append Suffix to File/Table Name
This option allows you to output to the tables or files for each value of the field that you’ve selected. Let’s say you chose “Test” as the data source name. What you would see, if we use our SalesTerritoryGroup example, are the tables shown in Figure 4-16. Notice how, for each region, the region (aka the SalesTerritoryGroup) is appended to the Test table.
Prepend Prefix to File/Table Name
This option behaves the same as the Suffix option, except the names of the fields will be prepended to the beginning of the file/table names. For example, if we wrote out to Excel sheets, we would see something like Figure 4-17, where the region is prepended to the beginning of each table name (sheet name).
Change File/Table Name
With the previous two options, the output is adding to the names either by appending or prepending. This option changes the entire file/table name. Instead of seeing the “Test” name, if we use the Change File/Table Name option, we can make the output a bit cleaner (see Figure 4-18).
Change Entire File Path
There are numerous creative ways to use this option. One that is very useful is when you are working with CSV or TXT files and you want to write out to multiple files that happen to be in different directories. If you choose this option, to write out to the CSV format, you can also get very specific and dynamically build the file paths. Let’s say that for each Territory Group we have a corresponding folder, and in each folder we want to output the corresponding files by Territory Region. The formula could look something like Figure 4-19. (More on building formulas in Chapter 5)
If we look at what one of the Sales Territory Groups would look like, we would see something like Figure 4-20. Each Sales Territory Region became its own CSV file within its North America Sales Territory Group.
You have complete freedom to determine dynamically where files are placed and how they are named. As another example, you could push all the Europe Sales Territory Group files to a “European” folder on a completely different server or shared drive, if you wish. The important idea here is that Designer is flexible enough to allow you to build that logic into your workflow, so you do not have to do it manually.
Conclusion
Outputting your data from Alteryx is a powerful feature that allows you to get data into the format you need but also do it dynamically and with precision. It’s important to know there are many more tools that allow you to output data in a specific format. These tools include:
Write Data In DB
Render Tool
Amazon S3 Upload
Google Sheets Output
MongoDB Output
Publish to PowerBI
Publish to Tableau Server
Salesforce Output Meta
SharePoint List Output
API Output
Blob Output
Python Tool
R Tool
Run Command
You are now able to update the data you want as well as output data in batches across multiple files or tables. In this chapter, you learned how to write out to both files and databases. Again, I challenge you to break any assumptions you may have around always writing out to files. Maybe a database can be more advantageous for you. You have some specific options, like pre and post SQL statements, that can help you manage your output to databases. These options are important to understand, but I recommend spending time learning the options I covered in this chapter to help you make the most of the output tools.
So far, we have covered getting your data into and out of Alteryx Designer. I now want to show you how to get started using Alteryx Designer to clean your data. This is an important and critical step in your journey to becoming proficient, as well as understanding more complex topics in Alteryx Designer!
Get Alteryx Designer: The Definitive Guide now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.