Chapter 4. Managing Code and Testing
Three stages within the DevSecOps lifecycle focus on traditionally developer-related tasks. These include the development or coding itself, building the resulting code into an executable application, and testing the application. Full testing would involve other teams such as quality assurance (QA), but at least some tests are done at the developer level. The build step is significantly customized depending on the programming language in use. Some languages require compiling of artifacts, while others can be deployed in an uncompiled state. The chapter begins with a focus on development and wraps up with concepts and tools for testing software. Along the way, the chapter also introduces Git for source code management and two patterns for using Git when developing software.
Examining Development
With the number of programming languages available, it is impossible to provide a single section, a single chapter, or maybe even a single book that distills all of the knowledge needed to be a successful developer in that language. There are also numerous books covering high-level programming design and architectural concerns as well. Though it will seem self-serving, a general rule that I’ve followed in my career is to look for books published by O’Reilly, because the books have thorough coverage. In the area of software design and architecture, Martin Fowler has written several books that are canonical in their respective areas in the same way that the TCP/IP Illustrated series by W. Richard Stevens was the go-to source for many years. With respect given to those and other related works, there are a few ideas that I try to relate to my students working on production-style programming. Also noteworthy is that these ideas are themselves distillations of the ideas of the aforementioned and others, but I have found them eminently helpful and approachable for students.
Be Intentional and Deliberate
Even before artificial intelligence enabled people to receive viable-looking answers to coding problems, developers were borrowing code from others. Whether the code worked exactly correctly or fit the design was sometimes a distant second place to simply completing the task. This is where being intentional and deliberate is relevant. A developer could technically complete the task with nested loops and hardcoded values, but doing so would introduce technical debt and may not work correctly beyond the narrow focus of the current task and with limited testing. Consider this code that assumes there will always be 50 states in the United States and that their alphabetical order will always remain the same:
for i in range(50):
    if i == 48:  # Wisconsin is 49th alphabetically, which is index 48 when counting from 0
        state_name = "Wisconsin"
While the example may be somewhat extreme, this type of hardcoding exists when there are time pressures or other factors that cause a developer to consider the code to be complete when it may not be fully developed.
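As a contrast to the hardcoded loop, the following minimal Python sketch looks the value up from data instead of assuming a count and a position. The STATES list is deliberately truncated, and the function name find_state is hypothetical; a real application would load the complete list from a maintained data source.

```python
# Illustrative only: a truncated list standing in for a maintained data source.
STATES = ["Washington", "West Virginia", "Wisconsin", "Wyoming"]


def find_state(name, states=STATES):
    """Return the matching state name without assuming a count or a position."""
    for state in states:
        if state.casefold() == name.casefold():
            return state
    return None


print(find_state("wisconsin"))    # Wisconsin
print(find_state("Puerto Rico"))  # None
```

If a state were ever added or renamed, only the data would change; the lookup code would stay the same.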
Note
“Technical debt” is a term used to describe borrowing time from future development or advancement of an application or system. Hardcoding values in a program rather than abstracting the values to a variable or constant may save time for this one single task with a single value of test data, but the next time that value is needed, it will have to be hardcoded again. If the value ever changes in the future, then all of those locations where the value was hardcoded will need to be changed, potentially introducing errors. While the time was saved for the single instance in the one file, that time will be repaid later, just as a monetary debt would be repaid at a later date.
Don’t Repeat Yourself
Consider this code, which calculates the tax for an order by multiplying the subtotal by the tax rate (5.5%):
order_tax = subtotal * 0.055
If the tax rate never changes and does not need to be used anywhere else in the application, this code meets the criteria for a minimum viable product (MVP). However, another developer is also working on a portion of the application that needs the tax rate. Instead of using the decimal representation of the tax percentage, they choose to use the percentage itself. Their code looks like this:
order_tax = subtotal * 5.5
These two pieces of code will produce wildly different values: for a subtotal of 100, the first yields a tax of 5.50, while the second yields 550. The developer may not notice the problem because the multiplication itself is valid and raises no error; only the interpretation of the rate is wrong.
Instead of relying on hardcoded values, a constant could be used for the tax rate:
const TAX_RATE = 0.055
order_tax = subtotal * TAX_RATE
A constant removes the ambiguity about how the rate is expressed, and there is now only one place to update when the tax rate changes in the future.
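A minimal Python sketch of the same idea follows. The function name order_tax and the use of Decimal for currency are assumptions made for illustration, not something the application described here is known to use.

```python
from decimal import Decimal

# Single source of truth for the rate: 5.5% expressed as a decimal fraction.
TAX_RATE = Decimal("0.055")


def order_tax(subtotal):
    """Return the tax owed on a subtotal, rounded to cents."""
    return (Decimal(str(subtotal)) * TAX_RATE).quantize(Decimal("0.01"))


print(order_tax("100.00"))  # 5.50
```

Every part of the application that needs the tax calls order_tax or reads TAX_RATE, so two developers cannot silently disagree about whether the rate is 0.055 or 5.5.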
Managing Source Code with Git
Whether developing as part of a team or as a soloist, tracking changes to source code enables you to look back at the history of changes to the code. You can then revert to an older version of the code should something break with newly introduced code. Source code management (SCM) tools such as CVS, SVN, Mercurial, Git, and others provide the ability to track changes.
In an organizational setting, there’s a good chance that code from different parts of a project is shared and worked on by multiple developers simultaneously. Each developer makes changes, which are tracked by the SCM. When the code is uploaded to a common SCM server, the changes from each developer are merged with one another, producing a single coherent set of software files containing all of the changes from those developers. Git, a widely used open source SCM, was created by Linus Torvalds, creator of the Linux kernel. This section looks at two methods for managing source code with Git: the Gitflow pattern and the trunk-based pattern. But first, we’ll establish a baseline or minimal pattern.
A Simple Setup for Git
This section outlines a method for using Git on a private and independent server, such as a server housed on premises in an organization. The obvious advantages include privacy and cost. There would be no need for hosting the source code repository at a third party, and there is no cost for Git regardless of the number of developers who use it within an organization. The disadvantage is a slightly more difficult integration, depending on the number of users who need access.
This section assumes that you have a Linux server running and have installed Git and an SSH server. If you have not, then a Linux instance can be deployed on AWS or another cloud provider. Both Git and an SSH server are available through the package management tools of most major Linux distributions.
Note
Referring to a Git server is somewhat of a misnomer. A Git server does not run any special software other than the same Git-related commands that run on a client. The “server” is merely an agreement that you will use a central location from which source code will be uploaded and downloaded. For many, that central location is GitHub, but for others, it’s an internal server.
One of the protocols that you can use for communication between client and “server” with Git is SSH. Because SSH is a key technology behind many other DevSecOps processes, using SSH for Git also makes sense because the software has typically been installed for other reasons.
The Git usage patterns in this section both rely on role-based access control through groups. In other words, users are added to the Linux server, and those users are then added to groups. For example, a group called gitusers is created. Members of that group have access to the Git repositories. The following example demonstrates sharing of a repository by two users. The assumption is that the users already exist. Afterward, the users will both be able to commit to and fetch changes from the other user. The two usernames are suehring and rkonkol for the example, and they will both be added to the gitusers group. The repository in the example is named devsecops and is stored in the /opt/git/ directory on the server. More complex scenarios are available for sharing, whether with Git or with other software such as GitHub.
Add a user called gituser:
adduser gituser
Change the shell of the gituser account. When prompted, change the shell to /usr/bin/git-shell:
chsh gituser
Create a .ssh directory within the home directory of gituser:
cd /home/gituser && mkdir .ssh
Change ownership of the .ssh directory as well as its permissions:
chown gituser:gituser .ssh && chmod 700 .ssh
Add an authorized_keys file within the .ssh directory and change its permissions. Technically this step isn’t required right now but will save a step later:
touch .ssh/authorized_keys
chmod 600 .ssh/authorized_keys
Add a group called gitusers:
groupadd gitusers
Add the two accounts for your developers:
adduser suehring
adduser rkonkol
Add each user to the gitusers group:
usermod -a -G gitusers suehring
usermod -a -G gitusers rkonkol
Have each of the developers generate SSH keys using the ssh-keygen command. You can also do this for the developers by becoming them, or assuming their identity, by using the su command, such as:
su - suehring
For completeness, if you are logged in as (or have assumed) the suehring user:
mkdir .ssh
chmod 700 .ssh
cd .ssh
ssh-keygen
Accept the defaults for the filename and decide whether you would like to add a passphrase to the key.
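When an SSH key is generated, a pair of files will be created. With RSA keys, long the default, the files are called id_rsa and id_rsa.pub; recent OpenSSH releases generate Ed25519 keys by default and name the files id_ed25519 and id_ed25519.pub, in which case substitute those names in the commands that follow. The file without the .pub extension is the private key, and the .pub file is the public key. The private key should be kept private and not shared with anyone, while the public key can be shared.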
To that end, copy the contents of the public key for each user to a file called authorized_keys within the gituser home directory. This step enables both of the developers to SSH as gituser. Be sure to use two greater-than signs (>>) for this command; otherwise, the contents of authorized_keys will be overwritten.
Assuming that your current working directory contains the file id_rsa.pub, which it will if you followed the previous set of commands, run the following to add the key to the authorized_keys file for gituser. Writing to that file requires sufficient privileges, so you may need to run the command as root. This command should be run for each of the developers using the contents of their individual public-key file:
cat id_rsa.pub >> ~gituser/.ssh/authorized_keys
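The append-versus-overwrite distinction can also be sketched in Python. The helper name append_public_key is hypothetical, the key text is a placeholder rather than a real key, and the demonstration writes to a temporary directory instead of a real .ssh directory.

```python
import tempfile
from pathlib import Path


def append_public_key(pubkey_path, authorized_keys_path):
    """Append a public key to authorized_keys unless it is already listed."""
    key = Path(pubkey_path).read_text().strip()
    target = Path(authorized_keys_path)
    current = target.read_text() if target.exists() else ""
    if key in (line.strip() for line in current.splitlines()):
        return False  # already present; nothing is written
    # Mode "a" appends, like >> in the shell; mode "w" would overwrite, like >.
    with target.open("a") as handle:
        if current and not current.endswith("\n"):
            handle.write("\n")
        handle.write(key + "\n")
    return True


# Demonstration against throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    pub = Path(tmp, "id_rsa.pub")
    pub.write_text("ssh-rsa AAAAexample suehring@example\n")  # placeholder key
    auth = Path(tmp, "authorized_keys")
    print(append_public_key(pub, auth))  # True: key appended
    print(append_public_key(pub, auth))  # False: key already present
```

Checking for an existing entry is an addition beyond the cat command; it keeps the file free of duplicates if the step is repeated for the same developer.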
The steps completed thus far are one-time foundational steps that need to be completed to prepare the server. In the future, only the developer accounts will be created and an SSH key generated and added to the authorized_keys file. It gets easier after the initial setup!
With these steps complete, it’s time to create a Git repository. On the Git server, run this command to create the directory that will hold the repository:
mkdir -p /opt/git/devsecops.git && cd /opt/git/devsecops.git
As noted before, this server uses /opt/git as the base for Git repositories. You might store the repositories elsewhere based on your organizational standard.
Create the repository:
git init --bare --shared=group
Change ownership and permissions:
chown -R gituser:gitusers /opt/git/devsecops.git
chmod 770 /opt/git/devsecops.git
That’s it. The next time you need to add a repository, you can simply run the commands to initialize the repository and change its ownership and permissions, because the gituser account and the developer accounts were already created.
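Because the same few commands are repeated for each new repository, they lend themselves to a small helper. The Python sketch below only builds and prints the commands rather than running them; the function name repo_setup_commands is hypothetical, and the defaults reflect the /opt/git base, gituser account, and gitusers group used in this section.

```python
import shlex


def repo_setup_commands(name, base="/opt/git", owner="gituser", group="gitusers"):
    """Build the commands from this section for one new shared bare repository."""
    path = f"{base}/{name}.git"
    return [
        ["mkdir", "-p", path],
        ["git", "init", "--bare", "--shared=group", path],
        ["chown", "-R", f"{owner}:{group}", path],
        ["chmod", "770", path],
    ]


# Print the commands for review before running them by hand as root.
for command in repo_setup_commands("devsecops"):
    print(shlex.join(command))
```

Printing instead of executing is a deliberate choice for a sketch: the commands need root privileges, and reviewing them first avoids changing ownership on the wrong path.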
At this point, the developer should be able to clone the Git repository to their local development environment. This command assumes a server name of source.example.com. Change that according to your server naming convention:
git clone gituser@source.example.com:/opt/git/devsecops.git
If this is the first time that the developer has SSHed into the server, they will be prompted to accept the host key from the server. Assuming that the host key is valid and correct, typing “yes” will result in a clone of the Git repository being downloaded into a directory called devsecops in the current directory.
Now that the setup has been done, it’s time to look at using Git.
Using Git (Briefly)
There are a handful of commands that you will use frequently with Git. The basic idea is:
- Clone repository.
- Write code.
- Commit code.
- Push code.
If you are working with other developers, then you’d add an additional step:
- Merge code.
It is that final step, merge code, where all of life’s problems occur and which is a major contributing factor for why DevSecOps is needed. More on merging later.
You’ve already seen two Git commands, git init and git clone. Initializing a repository is done once per repository, so the git init command will be used infrequently. Cloning a repository will occur more often, every time you need to download a new copy of the repository. However, once the repository is cloned, you will use other commands to interact with it to send your code to the server and to retrieve code from others in that same repository.
There is no Git-specific command for the step labeled “write code” that I mentioned earlier. However, after the code is written, those changes should be committed to the repository. There are two primary paths through which code is tracked within a repository:
- Adding a new file with new code
- Adding code to a file that already exists in the repository
When a new file is added within a Git repository, that file needs to be tracked so that changes are noted and a history of those changes is maintained. The git add command is used to add a new file to be tracked by a Git repository. This command adds a file called README.md to an existing repository:
git add README.md
From that point forward, changes to the file README.md will be tracked by Git within this repository.
When a file already exists within the repository, which is equivalent to saying, “Git knows about the file,” then changes are tracked but need to be committed to the repository. Another way to think of a commit is as a checkpoint or a point-in-time snapshot of the contents of the file or files at that moment. An important conceptual difference between Git and other SCM tools is that the git add command must also be executed every time you want to commit changes to the file. This concept can be confusing and bears additional explanation.
Recall that a repository called devsecops was created in the previous section. That repository contains nothing; it is empty except for a .git directory that contains metadata managed by Git itself. When a file is added to the devsecops directory, the file is in an untracked state, meaning that Git is aware that the file exists within the devsecops directory but that the file will be ignored.
Untracked is one of two states in which files exist within a Git repository. Another state for a file within a repository is the tracked state. When files become known to Git within a repository, they are referred to as tracked. But those two states tell only part of the story. When a file becomes tracked, Git begins monitoring that file for changes. It’s here that conceptual problems begin around the terms “state” versus “status.” For practical purposes, untracked files are irrelevant to the repository, and therefore that’s where we will leave them and focus instead on tracked files.
Once a file is tracked, it remains tracked by Git unless it is explicitly removed from the repository. A tracked file exists in one of these statuses:
- Unmodified
- Modified
- Staged
An unmodified file is one that has not changed since the last commit. With existing repositories that were just cloned, all files are unmodified because they were just downloaded from the remote repository. However, when a tracked file is changed, Git refers to the file as having a modified status.
Simply being modified does not mean that the file will be committed or able to be seen by other developers. To be committed and eventually seen by other developers, the file needs to be staged. A staged file is one that will be included in the next commit.
The difference between modified and staged is as fundamental as the difference between tracked and untracked. Having modified versus staged files enables you to choose which specific files to commit. You might also make multiple commits so that each commit creates its own snapshot with its own files. You might never need to separate commits in such a way, but the flexibility of having modified versus staged is available, should you ever need it.
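To make the statuses concrete, the following Python sketch maps lines in the short format printed by git status --porcelain to the terms used in this section. In that format, the first character describes the staging area and the second describes the working directory. The sample lines and filenames are hypothetical, and the sketch simplifies by labeling any working-directory change (including a deletion) as modified and by not handling ignored files.

```python
def classify(porcelain_line):
    """Map one line of `git status --porcelain` (v1) output to this section's terms."""
    index_state, worktree_state = porcelain_line[0], porcelain_line[1]
    path = porcelain_line[3:]
    if porcelain_line.startswith("??"):
        return path, ["untracked"]
    labels = []
    if index_state != " ":
        labels.append("staged")    # change recorded for the next commit
    if worktree_state != " ":
        labels.append("modified")  # change not yet staged
    return path, labels


for line in ["?? notes.txt", " M index.php", "M  README.md", "MM app.php"]:
    print(classify(line))
```

The last sample line shows why the distinction matters: a file can be staged and then modified again, in which case only the staged portion is included in the next commit. Unmodified files do not appear in the output at all.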
Committing changes to the repository is accomplished with the git commit command. Assume that a file called index.php already exists in the repository.
If you make changes to the file, you still need to add the file to the staging for this commit using git add. After git add has been executed, the next step is to commit the staged changes:
git commit
The code itself is saved as a checkpoint and added to the history metadata tracked by Git. When you execute git commit, you are prompted to add a commit message. The commit message is a short message about the commit itself. For example, if you added a new title to index.php, then you might add the message “Added new title.” This message is then viewable within the commit history of the repository (more on this later).
If you don’t want to be prompted for a commit message and also don’t want to execute git add for every change to a tracked file, you can add a couple of command-line options that alleviate the need for both. The following command is the equivalent of executing git add and git commit and then adding the previous message through a text editor:
git commit -a -m "Added new title"
The -a option automatically stages files that are already tracked by the repository and have been modified or deleted; new, untracked files are not included. The -m option adds a message.
Even though the changes have been committed to the repository, those changes are only stored on your local machine. This means that the changes are not viewable by other developers and, importantly, are not being backed up in any way other than backups that you might have set up for your local development machine. The final step is to send the code back to the server. This is accomplished with the git push command, simply:
git push
Code is uploaded to the server from which the repository was first cloned.
Tip
If you aren’t sure where the code will be going, use the following command:
git remote show origin
Doing so will display the destination for the git push command.
You can view the commit history of the repository with the git log command. When you execute git log, the commits that are known within the repository are shown. It’s worth noting that git log does not communicate with the server, so the history shown is limited to what has been committed locally or downloaded from the server.
Branching and Merging
In an ideal world, a single developer is responsible for all of the code for an application. That developer never misunderstands a requirement, requirements are never missed, their code is perfect, and there are never bugs or other errors beyond their control.
In the real world, developers work in teams, requirements are missed and misinterpreted, bugs are introduced, and bugs are found that are outside of the control of the developers. When developers work in teams, the shared code can sometimes be a source of errors. If the developers implement a data structure differently or simply misspell a variable, errors occur when the code is merged. The later the merge occurs, the more impact the error will have on prior steps.
Restated, there is a greater chance of a bug impacting the release date when code is brought together and tested later. For example, fixing the bug means that previously tested code needs to be retested. If another part of the code relied on the bug or worked around the bug, then that other code may need to be rewritten.
Branching in SCM provides a means to work on code in parallel without affecting others. When a repository is cloned, the master or main branch is cloned. From this main branch, a developer may create a new branch to work on the code for a new feature. Simultaneously, other developers may be doing the same thing, each creating their own branch of the code, all separate from one another.
Creating a branch within a Git repository is accomplished with the git branch command. For example, if you wanted to create a branch called project5 to work on changes related to the new website title, you would use the following command:
git branch project5
The branch is then created, using the current code as its base. While the branch has been created, any changes that you make will remain within the current branch until you switch to the newly created branch. This is accomplished with the git checkout command, as in:
git checkout project5
Just like the options added onto the git commit command earlier, so too can you add a command-line option to the git checkout command that will create the branch and switch to it all at once:
git checkout -b project5
Changes to code and the commits related to those changes will now be sent to the project5 branch. The git merge command is used when code needs to be brought back into the main branch. Merging examines each object in the repository to determine if there are changes to be included between the two branches of code. Git does its best to determine which object is the latest version and to resolve conflicts between two files that have changed between the branches. For more information, see the Basic Branching and Merging section within the official Git documentation, where you can find further details on merging and what can be done if a merge conflict occurs.
While branching keeps code from multiple developers logically separate, it does not solve the issue of late merges introducing bugs and untested behaviors. Multiple methods exist for managing team-based development. One such method is the Gitflow pattern, which we’ll look at next.
Examining the Gitflow Pattern
Gitflow describes a process for sharing code through Git that uses separate development paths. Gitflow uses branching. Figure 4-1 shows the Gitflow pattern.
As Figure 4-1 shows, there are several swimlanes within which active development takes place, while other swimlanes are reserved for the main line of production or released code. A walk-through of code through Gitflow helps to illustrate how changes are applied and then brought back into a release before being sent to production. Consider some code for a website. On Day 1 when the code is released, a bug is found. The bug needs to be fixed immediately. Therefore, a developer creates a branch to apply a hotfix. In Figure 4-1, the hotfix appears within the Hotfix swimlane. However, the developer finds that the bug is somewhat larger than anticipated and thus continues to determine how to fix it.
Meanwhile, development begins on enhancements to the site. This is reflected by a Develop swimlane and corresponding branch. The development branch is further split into feature branches so that multiple developers can work together on this development iteration or sprint. As features are completed, the code is merged back into the Develop branch. Eventually, the features and hotfixes (just one Hotfix in Figure 4-1) are merged with a Release branch.
The Release branch contains code that is ready to be deployed. Various elements are brought together at this final stage, though the exact components and processes vary from organization to organization and from release to release. For example, some organizations will have a formal build process, where code is compiled and final tests are executed. Other organizations may have a longer beta testing phase with a release candidate.
During the merge process between each swimlane in the Gitflow architecture, there can be one or more layers of approval prior to the merge being allowed. These gatekeeping processes serve as a checkpoint to ensure that unwanted code and side effects for that code are not introduced into the release or main branches or out to the production environment. A larger problem is that development branches tend to remain active for a long time. Features and hotfixes are merged into the development branch, but sometimes a hotfix isn’t applied or is overwritten by later code that then reintroduces the issue for which the hotfix was originally applied.
With DevOps and then DevSecOps, an emphasis was placed on continuous integration/continuous deployment (CI/CD) and the automated testing that is necessary to deploy code to production with minimal checks. The premise is to move testing earlier and more often, essentially shifting the testing phase more to the left in a traditional waterfall SDLC model.
Note
When you read or hear “shift left,” the shift is referring to moving testing and other elements of software development earlier so that problems are captured and addressed before being compounded.
With an emphasis on automation and de-emphasis of formal and manual approval processes, a new method for branching was developed. This new method is simplified from the Gitflow pattern, and that is the focus of the next section.
Examining the Trunk-Based Pattern
The premise behind the trunk-based pattern is to do away with long-lived development branches in favor of committing and frequently pushing code to a main branch, often called the trunk, so that the code can be tested and deployed. Figure 4-2 shows an example of the trunk-based pattern.
Comparing the trunk-based pattern to the Gitflow pattern, you’ll notice fewer swimlanes: just Trunk, Features, and Hotfixes. The idea is to merge into and promote from the Trunk, keeping feature and hotfix branches short-lived in order to avoid large merges. It’s worth noting that both the Releases and Main branches can sometimes exist, but that’s more so a logical or process-related necessity instead of a requirement for trunk-based development.
“Promote early, promote often” is the main idea for trunk-based development. That sounds great in theory, but in practice the implication is that a mature and thorough testing infrastructure exists and is capable of testing the application. This is where the concept of code coverage comes in. Code coverage describes the proportion of the code that is exercised by tests. For example, consider this simple conditional statement:
if ($productType == "grocery") {
    $taxable = false;
}
A positive test for this code is to set the contents of the $productType variable to grocery and then examine whether $taxable is true, false, or unset afterward. Another test is to set $productType to anything other than the string of characters grocery and then examine the contents of the variable $taxable afterward. It would be tempting to indicate that the code coverage is 100% for this code. However, what if $productType is not set? The answer depends largely on the code that would be above the example shown here. Some languages will also not allow $productType to be unset and would provide an error during the compile process. Therefore, code coverage will depend on language and context.
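The three cases can be written out as checks. The sketch below translates the PHP conditional into Python, with None standing in for an unset variable; that substitution is an approximation, and the function name taxable_after_check is hypothetical.

```python
def taxable_after_check(product_type=None, taxable=None):
    """Mirror the conditional: only a grocery product type changes the flag."""
    if product_type == "grocery":
        taxable = False
    return taxable


# One check per case discussed in the text.
assert taxable_after_check("grocery", taxable=True) is False  # positive case
assert taxable_after_check("hardware", taxable=True) is True  # negative case
assert taxable_after_check() is None                          # product type never set
```

The third check is the one that is easy to miss: when the product type is never set, the flag is left in whatever state the earlier code put it in, which here is unset.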
The turn toward code coverage leads the chapter toward the concept of testing, which coincidentally is the next section. Choosing a branching strategy is not a permanent decision. I recommend trying (or continuing with) more formalized and deliberate patterns for code management to better understand where gaps exist in development, testing, and deployment. As you improve development practices, testing coverage, and deployment, simplify the code management processes to keep them from getting in the way of progress.
Testing Code
Examining an application for defects is accomplished at various stages throughout the SDLC. At the most basic level, a developer tests their own code. Consider the conditional statement from the previous section. A developer would likely test their code for the cases described in that section, testing both the positive case with the product type set to grocery and at least one negative case with the product type set to something other than grocery.
This section examines several aspects of testing, from basic developer tests to QA testing by end users. Included in the discussion of testing are both functional and nonfunctional requirements. Recall from Chapter 1 that a nonfunctional requirement is something like security or transaction speed. While it’s possible that these are highlighted as specific requirements, it’s more likely that requirements gathering will not include questions like “How fast would you like the application to load?” Instead, nonfunctional requirements may rely on service-level agreements.
Unit Testing
The conditional code shared earlier in this chapter would be part of a larger block of code that is tested by the developer as they write the code. When tested in small units, at the function level or similarly small pieces of code, this is called unit testing. There is no strict rule as to how small or how large a block of code can be and still be considered a unit test. However, as the number of dependencies increases, it becomes less likely that the test would be considered a unit test. Put another way, if the code being tested depends on several other files and preconditions, then the test is more akin to an integration test, where multiple elements are combined.
A basic goal of unit testing is 100% coverage of all conditions, such as the condition shared earlier. In addition, static analysis should be performed on the code. Static analysis describes a means to examine the code without executing it. Static analysis is frequently used for verifying adherence to coding standards and to validate basic application security.
Unit testing exists regardless of DevSecOps processes. However, moving toward DevSecOps means automating as many unit tests as possible. Ideally, all unit tests should be executed in an automated manner. Doing so facilitates the CI/CD processes needed to fully leverage DevSecOps.
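As one illustration of automated unit tests, the following sketch uses the unittest module from the Python standard library. The order_tax function and the 0.055 rate are assumptions carried over from the earlier tax example, and the run_suite helper is hypothetical; an automated pipeline would typically invoke a test runner in a similar way and act on the result.

```python
import unittest

TAX_RATE = 0.055  # assumed rate, matching the earlier tax example


def order_tax(subtotal):
    """Return the tax owed on a subtotal, rounded to cents."""
    return round(subtotal * TAX_RATE, 2)


class OrderTaxTests(unittest.TestCase):
    def test_typical_subtotal(self):
        self.assertEqual(order_tax(100), 5.5)

    def test_zero_subtotal(self):
        self.assertEqual(order_tax(0), 0)


def run_suite():
    """Run the tests without exiting so that a caller can inspect the outcome."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrderTaxTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
print("passed" if result.wasSuccessful() else "failed")
```

Because the outcome is available as a value, a build step can stop the process when any test fails instead of relying on a person to read the output.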
Integration Testing
Integration testing brings together units of code to verify that those units work together to achieve the functional requirements of the application. Reaching the level of integration testing implies that unit tests are complete and successful. Where connections between units of code exist, integration testing verifies that those connections are working as intended.
System Testing
The third level of testing is commonly referred to as system testing. The goal with system testing is to combine all components in an environment that is as close to a production environment as possible. Both functional and nonfunctional tests should be performed in system testing, and ideally the data used will be a de-identified version of production data or a subset thereof. The caveat around whether a subset of data is acceptable is that using only a fraction of the data may hide performance-related problems. For example, if the normal production dataset is a multiterabyte legacy database and one of the features of the application requires querying that data, then using only a few gigabytes may mask a problem with the query. In the testing environment, the query may return results with acceptable performance, but when the entire dataset is queried, the results take minutes to return.
Automating Tests
Automation is a key factor in determining the success of any DevSecOps efforts. There are numerous tools available that help automate testing of code. One such tool is Selenium. Selenium provides a full-featured test suite that can be scaled to distribute tests from multiple locations, and an IDE to help with creation of tests. There are also Python bindings for the underlying Selenium web driver.
Retrieving a page using Selenium and Firefox
You can execute Selenium tests from the command line using Python. Doing so enables you to build a simple means to test during development but also to create a sophisticated bot that can crawl the site as you create it, taking screenshots along the way to prove that a page exists and was rendered without error. This section shows Selenium with a headless Firefox browser running on Debian Linux. Later in the book, I’ll show you a more complex example using Docker. The simple example shown in Example 4-1 seems to be missing from many of the tutorials that exist online. While the example lacks some of the debugging options and other niceties that you might want, such as a try/except block, those can be added later.
Example 4-1. Basic Python code to retrieve a web page and capture the results
#!/usr/bin/env python
from selenium import webdriver
proto_scheme = "https://"
url = "www.braingia.org"
opts = webdriver.FirefoxOptions()
opts.add_argument('--headless')
driver = webdriver.Firefox(options=opts)
driver.implicitly_wait(10)
driver.get(proto_scheme + url)
driver.get_screenshot_as_file('screenshot.png')
result_file = 'page-source_' + url
with open(result_file, 'w') as f:
    f.write(driver.page_source)
driver.close()
driver.quit()
Within Example 4-1, the first line interrogates the environment for a Python executable and enables execution of the file as a normal command rather than needing to preface the filename with “python” on the command line. For example, you’ll see numerous examples online where programs written in Python are executed like this:
python3 program.py
Instead, by including the interpreter on the first line as shown, the file can be executed like this:
./program.py
Note
The assumption is that the file is executable; if not, then you can run chmod u+x program.py to add the executable bit.
Python 3 should be the default. If not, or if you receive errors regarding the version of Python in use on your system, you can remove this line completely and execute the file as shown earlier, by prefacing with the Python 3 interpreter.
The webdriver from Selenium is imported next, followed by two variables to establish both the protocol scheme and hostname to be tested. The next three lines set an option to execute Firefox in a headless manner. The headless option is used so that an X Window system or desktop environment does not need to be installed for this program to work. Firefox simply executes behind the scenes without need for the graphical interface.
The following line sets a wait time of 10 seconds. This can be adjusted as necessary for your environment. Ten seconds was chosen arbitrarily for this example. The web page is retrieved with the next line, and a screenshot is captured and named screenshot.png. The last section of the program opens a local file for writing and places the page source into that file. Finally, the session is closed and the browser quits executing.
It is worth noting that the program executes a copy of Firefox in the background. If the final call to quit() is not executed because of an earlier error, then there will be orphaned Firefox processes running on the system. A reboot of the computer would solve it, but because this is Linux, a reboot shouldn’t be necessary. You can find the leftover Firefox processes with this command:
ps auwx | grep -i firefox
The resulting output will look something like this, although the username and process IDs will be different:
suehring 1982868 51.9 16.3 2868572 330216 pts/1 Sl 12:12 25:43 firefox-esr --marionette --headless --remote
Issuing the kill command on the process ID will stop the process from running. In the example, the process ID is 1982868. Therefore, the following command should be issued in order to stop this process:
kill 1982868
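Cleaning up by hand can be avoided if the call to quit() is guaranteed to run. The following is a minimal sketch, not part of Example 4-1, that wraps driver creation in a context manager. The names managed_driver and make_firefox are hypothetical helpers introduced here for illustration; only the Selenium calls inside make_firefox come from Example 4-1.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Yield a driver built by factory() and always call quit() on exit.

    factory is any zero-argument callable that returns an object with a
    quit() method. This helper is hypothetical, not part of Selenium.
    """
    driver = factory()
    try:
        yield driver
    finally:
        # Runs on success and on error, so the browser is not orphaned.
        driver.quit()


def make_firefox():
    """Build the headless Firefox driver used in Example 4-1."""
    # Imported here so managed_driver can be used without Selenium installed.
    from selenium import webdriver

    opts = webdriver.FirefoxOptions()
    opts.add_argument('--headless')
    return webdriver.Firefox(options=opts)
```

With these in place, the retrieval steps from Example 4-1 could run inside a block that begins with "with managed_driver(make_firefox) as driver:", and an exception raised partway through would still shut down the browser.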
As noted earlier, an obvious improvement would be to wrap much of the processing in a try/except block, ideally with a finally clause that calls quit(), which reduces the chance of orphaned processes being left over after an error. Another improvement would be to accept the initial URL as a command-line option and to crawl the site, collecting the links found on each page and visiting them. Not everyone will consider those additions necessary, or even an improvement. Simple is better here, and this example shows the basics of retrieving a page.
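For readers who do want to take the URL from the command line and collect links, the following is a minimal sketch using only the Python standard library. The names LinkCollector, collect_links, and parse_args are hypothetical and are not part of Selenium or of Example 4-1.

```python
import argparse
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page that held them.
                self.links.append(urljoin(self.base_url, value))


def collect_links(page_source, base_url):
    """Return the absolute URLs linked from page_source, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(page_source)
    return collector.links


def parse_args(argv=None):
    """Read the starting URL from the command line."""
    parser = argparse.ArgumentParser(description="Retrieve a page with Selenium")
    parser.add_argument("url", help="full URL to retrieve, including the scheme")
    return parser.parse_args(argv)
```

In a program like Example 4-1, the value of driver.page_source would be passed to collect_links, and each returned URL could then be handed to driver.get. A real crawler would also need to remember which URLs it has already visited and to stay within the site's own hostname.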
Retrieving text with Selenium and Python
The previous example shows the use of Python, Selenium, and Firefox to retrieve the source from a web page and take a screenshot of that page. As part of testing, you may want to be alerted to a page not rendering correctly or with the correct elements or text within those elements. For example, if you’ve written tests to log in to a page and then expect a greeting on the next page that should contain your name, you can write a test to retrieve the specified HTML element and verify that the correct name is displayed within that element.
Text can be retrieved using a few methods. Example 4-2 shows a means to retrieve the copyright notice from a page if that notice is contained within a <p> element, as it is on my site currently.
Example 4-2. Retrieving a web page and displaying the copyright notice
#!/usr/bin/env python
from selenium import webdriver
proto_scheme = "https://"
url = "www.braingia.org"
opts = webdriver.FirefoxOptions()
opts.add_argument('--headless')
driver = webdriver.Firefox(options=opts)
driver.implicitly_wait(10)
driver.get(proto_scheme + url)
driver.get_screenshot_as_file('screenshot.png')
copyright = driver.find_element("xpath", "//p[contains(text(),'Copyright')]")
print(copyright.text)
result_file = 'page-source_' + url
with open(result_file, 'w') as f:
    f.write(driver.page_source)
driver.close()
driver.quit()
The two substantive changes in Example 4-2 are the line that calls find_element and the print line that follows it. You could also simply print the text on one line, like so:
print(driver.find_element("xpath","//p[contains(text(),'Copyright')]").text)
However, I lean toward the version shown in Example 4-2 because that version enables manipulation of the element for uses other than showing the text.
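Building on Example 4-2, the check described at the start of this section, verifying that an element contains the expected text, can be sketched as a pair of small functions. The names element_text and check_text are hypothetical. The only Selenium call used is find_element with the "xpath" locator, in the same form as Example 4-2.

```python
def element_text(driver, xpath):
    """Return the text of the first element matching xpath.

    driver is a Selenium WebDriver (or any object with a compatible
    find_element method).
    """
    return driver.find_element("xpath", xpath).text


def check_text(driver, xpath, expected):
    """Raise AssertionError unless expected appears in the element's text."""
    actual = element_text(driver, xpath)
    if expected not in actual:
        raise AssertionError(f"expected {expected!r} in {actual!r}")
    return actual
```

For the login scenario, a call such as check_text(driver, "//p[contains(text(),'Copyright')]", "Copyright") would pass silently when the notice is present and raise an error that a test runner can report when it is not.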
Summary
This chapter provided information on development and testing. Being intentional and deliberate is underappreciated among development paradigms and patterns. However, the phrase “intentional and deliberate” captures the essence of knowing why you’re using a pattern or even a line of code in a certain location. We also examined the Git SCM tool, along with the Gitflow and trunk-based patterns for managing code from creation through deployment. Finally, I discussed three levels of testing and provided test automation examples using Selenium and Python. My goal with the examples was to provide a simple baseline or foundation to which you can add complexity.
Chapter 5 continues to focus on DevSecOps practices with management of configuration as code. Developing with containerization techniques is frequently part of DevSecOps and modern development. Chapter 5 also demonstrates Docker.