For those DBAs using SQL for data discovery, the move to data science can involve a brand-new set of varied tools and technologies. This article is a walk-through of setting up the tooling to do some data discovery using Python. By setting up your workflow using GitHub, VSCode and Python, you will have the basic architecture set up to start your journey down the path to kicking the tires on data science on the Microsoft stack.
This demo uses GitHub and Windows; however, other platforms can be used. I selected these because they are the most common for the teams I work with.
This walk-through will show you how to set up:
- Visual Studio Code
- Integrate GitHub with VSCode
- Integrate Python with VSCode
Warning: This is a try-at-your-own-risk walk-through. Every machine is different, and these steps should not be used in a production environment. They are provided as an illustration and cover a base machine install. No guarantee or warranty is provided or implied.
Visual Studio Code (VSCode)
Using VSCode as your source code editor allows you to use the same editor across Windows, Linux and macOS while integrating directly with Git (GitHub in our examples). For Python, this gives you a full code editor with syntax highlighting, code completion, code refactoring, and snippets. Themes, keyboard shortcuts, and many configuration preferences allow you to create your own editing experience.
Note: This article assumes that this is not the first time you have installed an application. Some screenshots will be displayed, but steps with simple choices are described only briefly.
- Installing VSCode is as simple as going to the VSCode portal and clicking download on the main portal page.
- This option will take you to the Getting Started page. If you get the following popup, which could be different depending on which browser you use, you can save a copy or run the installer directly after download.
- Once the installation begins, the wizard will guide you through the process. The detailed steps are below.
- Welcome – Select Next to begin the installation.
- License Agreement – Accept the License Agreement.
- Select Destination Location – Either Accept the suggested location for the install or browse to another location.
- Select Start Menu Folder – Accept by hitting Next or Check the “Don’t Create” box if you do not want a startup folder.
- Select Additional Tasks – I normally select all these options as I use VSCode for most of my coding tasks. One item you may want to decide is the “Register Code as an editor for supported File types.” This makes VSCode the default editor for specific file types and will open when double-clicking on a supported file. There is a discussion on VSCode’s GitHub site that reviews this option in more detail. Select Next to continue.
- The final screen is then displayed. Select Install to begin the installation.
- The Completed notification will allow VSCode to launch and the setup to continue.
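Once the installer finishes, one quick way to confirm the result from Python is to check whether the `code` launcher is on your PATH. This is only a minimal sketch: it assumes the installer's Add to PATH task was selected, and the helper name `find_tool` is mine, not part of VSCode.

```python
import shutil


def find_tool(name):
    """Return the full path of a command-line tool, or None if it is not on PATH."""
    return shutil.which(name)


if __name__ == "__main__":
    # "code" is the launcher VSCode adds to PATH when that installer task is selected.
    path = find_tool("code")
    if path is None:
        print("code was not found on PATH; open a new terminal or re-check the installer options")
    else:
        print(f"VSCode launcher found at {path}")
```

The same helper can be reused later in the walk-through, for example `find_tool("git")` after the Git install.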
Configuration of VSCode
Rather than a full how-to on using VSCode, see the Getting Started page for more. This section will provide a brief interface walk-through in addition to reviewing the configuration of VSCode for our Python tutorial.
Walkthrough of the Interface
|1||Menu Bar||Menu bar that also includes the Command Palette, which is activated by pressing Ctrl+Shift+P.|
|2||Code Edit Pane||Multi tab edit window. Tag 11 shows the Split Pane which allows you to open multiple code windows.|
|3||File Explorer||This option will expand the left pane to display the files and directories in the current working folder.|
|4||Search||Includes Search and Replace.|
|5||Source Control||Shows the changes that have yet to be checked-in to Source Control.|
|6||Debug Window||Bring up the debug screen.|
|7||Extension window||This shows the Extension windows. Extensions are add-ins that provide additional functionality. Python is such an extension.|
|8||VSCode Settings||Various editor defaults and settings.|
|9||Terminal and Output message toggle||This brings up the Terminal split screen. Problems, Output and debug console are in this section.|
|10||Toast Style messages||Various messages will be displayed depending on the context you are in.|
|11||Open Tab Window||This allows you to open up multiple tabs / code files in the same display.|
|12||Ellipsis menu (Callout)||This brings up a callout menu that allows you to close all the currently open code windows or only those that are saved. (As an aside, when the three dots are vertical it is called a kebab menu. Who knew!)|
Linking to GitHub
Having the ability to place your Python files in source control is a major advantage of using VSCode. There are a couple of helper applications that we need to install: Git services and, as I normally do, GitHub Desktop.
- Let’s install GIT Services first. In a browser, go to the GIT project site. If you visit the site using Windows, the Latest Source Link, highlighted below, should be visible.
- The download should start automatically, and your browser should ask for a next step, select RUN.
- The installation wizard should now start. Its screens are listed below with the options to select.
- User Account Control – Select YES to authorize the install.
- Information – Select Next to accept the License and continue.
- Select Destination Location – Select Next to install in the default location.
- Select Components – Review the selected options, pictured below, and select Next. These could change depending on what development you are doing; however, I normally select the defaults, as this is really the service and drivers that other tools use. One option which is on by default is Large File Support. I keep this on just in case. More information is available in the Git documentation.
- Select Start Menu Folder – This creates a folder, select Next for the default and continue.
- Choosing the default editor used by Git – Choose Visual Studio Code as the default editor. Then select Next.
- Adjusting your PATH environment – Choose the Git from the command line and also from 3rd-party software option. Select Next to continue.
- Choosing the HTTPS transport back end – Choose the Use the OpenSSL library option. Select Next to continue.
- Configuring the line-ending conversions – This tells Git how to treat line endings. Since this install is on Windows, I choose the Checkout Windows-style, commit Unix-style line endings option. Select Next to continue.
- Configuring the terminal emulator to use with Git Bash – I change this to Use Windows default console window. This is a Windows install, after all.
- Configuring extra options – I keep the defaults and click Install.
Several steps, but this completes the Git installation. There are many options and uses for GIT and we will only scratch enough of the surface to get our demo running.
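The line-ending choice in the wizard is easier to picture with a small example. The sketch below only imitates in plain Python what that option asks Git to do (CRLF in your working copy, LF in the repository). It is an illustration, not how Git implements the conversion, and the function names are hypothetical.

```python
def to_repository(data: bytes) -> bytes:
    """Imitate 'commit Unix-style': CRLF line endings become LF."""
    return data.replace(b"\r\n", b"\n")


def to_working_copy(data: bytes) -> bytes:
    """Imitate 'checkout Windows-style': LF line endings become CRLF."""
    # Normalize first so existing CRLF pairs are not doubled into \r\r\n.
    return to_repository(data).replace(b"\n", b"\r\n")


if __name__ == "__main__":
    windows_text = b"msg = 'Hello World'\r\nprint(msg)\r\n"
    committed = to_repository(windows_text)
    print(committed)  # the line endings are now \n only
    print(to_working_copy(committed) == windows_text)
```

The practical effect is that teammates on Linux or macOS see the same committed bytes, while your Windows tools still see the line endings they expect.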
Installing GitHub Desktop
GitHub Desktop is not required, but I use GitHub as the repository for many projects. Using the Desktop client for cloning and setting up projects can give you a view and control of your GitHub repository that a code editor cannot. I recommend it if you use GitHub. We will walk through the install below.
- From the GitHub Desktop site, select the Download for Windows button.
And then Run from your browser popup
- The installation wizard should start.
- The first step is to sign into your GitHub account. This screen also allows you to create a new free account if you do not have one.
- Sign into your account using your email and password. Select Sign in when ready.
- Two Factor – I use and recommend setting up two-factor authentication with your account. See the documentation on GitHub for more information.
- Once you are signed in, Select Continue to move on.
- The final screen asks whether to send feedback to GitHub while you use the application. Make your selection and select Finish to complete the installation.
- You should then see GitHub Desktop when everything is completed. If you have repositories, they will be displayed on the left.
Setup and Connect VSCode to use GitHub
VSCode has numerous extensions, and integration with GitHub is one of them. By selecting the Extensions icon, you can search and browse the available extensions. We will step through the GitHub installation.
As shown above,
- Select the Extensions icon
- Enter “GitHub” in the search box
- And select the GitHub extension.
- Selecting Install will install the extension
- Clicking Repository will take you to the GitHub Extension repository website where more documentation can be found. Normally the documentation appears in the detail window below in Details.
- After the extension installs you will see the following message,
which informs you that you need a Personal Access Token. We will do this in a step below.
This extension allows us to set up VSCode with the authentication and permissions it needs to write back and forth to your GitHub account. The most important step is setting up the Personal Access Token. The token that the 'GitHub: Set Personal Access Token' command asks for is generated from your GitHub settings.
- Log into Github.com, under your account, select Settings.
- After Selecting Settings, at the bottom of the next page is a tab for Developer Settings, select this.
- On the Developer Settings menu, select Generate New Token.
- The screen now displays and allows us to provide the access VSCode needs by using this token. Select the repo check box. Also put in a name for this token as you can edit or delete this token later in addition to having multiple tokens.
- At the bottom of the list, you can then select Generate Token.
- You will now see the token. Copy this token as you cannot view it once you leave the page.
Now that we have a token, let's finish setting up our VSCode extension.
- To use this token, we have to execute the 'GitHub: Set Personal Access Token' command in the VSCode Command Palette. Press Ctrl+Shift+P in VSCode to open the Command Palette and type 'GitHub: Set Personal Access Token'. You will then be prompted to enter the token generated from GitHub.
- Enter the token in the prompt, Hit Enter.
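If you want to confirm that a token works outside VSCode, the following sketch asks the GitHub REST API which account the token belongs to. It is not part of the extension setup. It assumes the `https://api.github.com/user` endpoint with a Bearer authorization header, and the function names are hypothetical.

```python
import json
import urllib.request

API_URL = "https://api.github.com/user"  # GitHub REST endpoint for the authenticated user


def build_request(token):
    """Build a request that authenticates with a personal access token."""
    return urllib.request.Request(
        API_URL,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def fetch_login(token):
    """Return the account name the token belongs to (this makes a network call)."""
    with urllib.request.urlopen(build_request(token)) as response:
        return json.load(response)["login"]
```

A reasonable way to call it is `fetch_login(os.environ["GITHUB_TOKEN"])`, where `GITHUB_TOKEN` is an environment variable name I am assuming for illustration. Reading the token from the environment keeps it out of any file you might later commit.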
Open a Project
Now that we have VSCode connected, we can connect to a repository and begin. We will go to your GitHub account and create a repository.
- Go to your GitHub repository listing.
- Create a new repository using the New button.
- Set up your details for the repository by filling in the screen as shown in the below example. Select the Create Repository button to continue.
- Once you have the repository, get the clone link, by selecting the drop down shown below.
- This is the code that was supplied for my example.
- Open up GitHub Desktop which will bring up the following screen. Select the Clone a repository from the Internet option.
- This will bring up the following Clone a Repository screen. Select the URL tab and paste in the Git repository URL from step 4. Notice the default directory created in the Local Path text box. Select the Clone button to begin.
- This brings up the GitHub Desktop menu which will show the repository. You also see the Open in Visual Studio Code option, as we set that as default in an earlier step. Select this option.
- VSCode opens the project in the local directory. The files are displayed in the Explorer and the GitHub status is highlighted at the bottom.
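For readers who prefer to script the clone step rather than use GitHub Desktop, here is a rough sketch of the equivalent `git clone` call from Python. The URL and parent folder are hypothetical placeholders, the helper names are mine, and it assumes Git is on your PATH as configured earlier.

```python
import subprocess
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def repo_name(clone_url):
    """Derive the default folder name: the last URL segment without '.git'."""
    name = PurePosixPath(urlparse(clone_url).path).name
    return name[:-4] if name.endswith(".git") else name


def clone_command(clone_url, parent_dir):
    """Build the 'git clone' command that places the repository under parent_dir."""
    target = Path(parent_dir) / repo_name(clone_url)
    return ["git", "clone", clone_url, str(target)]


def clone(clone_url, parent_dir):
    """Run the clone (requires Git on PATH and access to the repository)."""
    subprocess.run(clone_command(clone_url, parent_dir), check=True)


if __name__ == "__main__":
    # Hypothetical URL and folder, for illustration only; nothing is cloned here.
    url = "https://github.com/example-user/example-repo.git"
    print(" ".join(clone_command(url, Path.home() / "Documents" / "GitHub")))
```

Printing the command first, rather than running it, makes it easy to check the target folder before anything is written to disk.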
NOTE: The example we are running through is very simple. I use this in a single-developer workflow, and I use this process to keep code versioning and to save the project off the computer in GitHub. If you work with a team of developers, you may check out and create your own branch. When you are finished with your changes, you would merge them back into the Master branch. Proper code hygiene and working with Git is beyond the scope of this article; however, using this process you can certainly keep your own code in check, if you will. A more detailed set of documentation is available on Git in the article Distributed Git – Distributed workflows. This link is part of a full set of documentation on Git.
Installing Python and pylint on Your Machine
Before we install the Python extension for VSCode, we need to install Python on your local machine. This will also install an application called pip, a Python package manager that we will use to install various packages on your machine. These packages provide functionality to Python through modules, such as NumPy, which provides scientific computing functions.
- Download Python – the download link takes us to the downloadable version of Python from the Python Software Foundation. Select the Download Python link highlighted below.
Select Run from the browser popup if displayed.
- A popup will display once the installation begins. Make sure the Add Python to PATH is selected. Click on the Install Now to continue. The install will ask permission, say OK.
- Once Complete, you will see links to documentation for that specific version and an online tutorial. Select Close when ready.
- Installing pylint. If you’re using a global environment, you need to run the commands at an elevated command prompt.
Type in pip install pylint in the prompt and hit Enter.
When complete you should see the following details.
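To double-check the result from Python itself, a small sketch like the one below reports whether pylint is importable by the interpreter you are running. The helper names are hypothetical. Calling pip through `sys.executable -m pip` is a common way to make sure the package lands in the same Python you are testing.

```python
import importlib.util
import subprocess
import sys


def is_installed(module_name):
    """Return True if the module can be imported by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


def pip_install(package):
    """Install a package with the pip that belongs to the running interpreter."""
    subprocess.run([sys.executable, "-m", "pip", "install", package], check=True)


if __name__ == "__main__":
    if is_installed("pylint"):
        print("pylint is available to", sys.executable)
    else:
        print("pylint not found; run: pip install pylint")
```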
Setting up Python for VSCode
Now that we have integrated VSCode with GitHub, created a repository with a project, and installed Python, let's continue to set up the Python extension.
- Now, let’s add a document and make some changes. Simply Right click on the Explorer window and select New File. Name this file Apple.py.
- We now have the first Python files in the work area(1), the Change file (2) showing the new files that we have not checked in, and a Toast(3) message saying that the Python extension is recommended for this file type. As you add different file types, VSCode will display a message if there is an extension that can be used with that file type. Select the Install button. This will bring up some items we want to review.
- A Reload is required to activate. If you select this, VSCode will reopen with the same project entered.
After installation, you will see the following detail on the Extension. Selecting Repository, will bring you to the documentation.
- We can now select a linter, which is a tool that analyzes source code. This is handy, as it will flag various errors, bugs and other coding issues as you type. With a document open from the Explorer, open the Command Palette (Ctrl+Shift+P) and select the Python: Select Linter command by starting to type its name.
- In the drop-down list, select pylint which we installed in a previous step.
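To see the linter at work, you could paste something like the following into a scratch file. It runs fine, but it contains two issues that pylint normally reports: an unused import and an unused variable. Depending on your pylint version and configuration, you may also see messages about missing docstrings. The file contents are purely illustrative.

```python
import os  # never used below; pylint reports this as unused-import (W0611)


def area(width, height):
    total = width * height
    leftover = 0  # assigned but never read; pylint reports unused-variable (W0612)
    return total


if __name__ == "__main__":
    print(area(3, 4))  # prints 12
```

Removing the `import os` line and the `leftover` assignment should clear both warnings in the Problems panel.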
Running our first script
We should now have everything installed and setup for our first test.
- Enter the following code segment into the Apple.py window.
msg = "Hello World"
print(msg)
Right click on the code window and select Run Python File in Terminal.
If all goes well, you should see the following results in the terminal window.
Depending on your setup, you may see a message at the bottom of the screen asking you to select a Python interpreter.
This can happen if you have multiple Python installations. To select an interpreter, with a document open from the Explorer, open the Command Palette (Ctrl+Shift+P) and select the Python: Select Interpreter command by starting to type its name.
Select the one you wish to run.
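If you are unsure which interpreter actually ran your script, a short check such as this one prints its path and version. It is a minimal sketch, and the function name `interpreter_info` is just an illustrative choice.

```python
import platform
import sys


def interpreter_info():
    """Describe the interpreter that is running this script."""
    return {
        "executable": sys.executable,
        "version": platform.python_version(),
    }


if __name__ == "__main__":
    info = interpreter_info()
    print("Running on", info["executable"])
    print("Python version", info["version"])
```

Run it with Run Python File in Terminal and compare the path with the interpreter shown in the VSCode status bar.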
Writing your files to GitHub
Now that we have some changes, let's commit and write the changes back to GitHub.
- Make sure all your files are saved. File / Save All from the tool bar.
- The Source Control (1) icon shows 3 files changed and ready to commit. Enter a note to be saved about the Commit (2). Once ready, press Ctrl+Enter to commit.
The first time, you will get a message that there are no staged changes, asking whether to stage all changes and commit them directly. I choose Yes.
- The changes will commit, and the status bar will show that there is a commit that has not been synced. Select the highlighted SYNC button to synchronize your changes to your GitHub repository.
This will bring up a message that you are doing a synchronization from a specific branch, select OK.
- The first time, it will ask you to log into your GitHub repository. You will also be asked for your two-factor authentication code if you set that up.
- We should now see our files with the commit message.
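It can help to know roughly what those buttons do underneath. The sketch below lists Git commands that approximate the Commit (with all changes staged) and Sync steps above. This is my approximation of the behavior, not a record of what VSCode runs internally, and the function names and commit message are hypothetical.

```python
import subprocess


def commit_and_sync_commands(message):
    """Return Git commands that approximate Commit (stage all) followed by Sync."""
    return [
        ["git", "add", "--all"],           # stage every change, like answering Yes to the prompt
        ["git", "commit", "-m", message],  # record the commit with your note
        ["git", "pull"],                   # Sync first brings down remote changes
        ["git", "push"],                   # then sends your commit to GitHub
    ]


def run_all(commands, repo_dir):
    """Run each command inside the repository folder, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_all(...) yourself to execute them.
    for command in commit_and_sync_commands("Add Apple.py"):
        print(" ".join(command))
```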
Now that we have all the parts installed, we can use Python in VSCode. More tutorials will be added as time goes on. The next tutorial will show how to use Python in data discovery and initial profiling.
Python in Visual Studio Code – https://code.visualstudio.com/docs/languages/python
Python – Visual Studio Marketplace – https://marketplace.visualstudio.com/items?itemName=ms-python.python
Linting Python in Visual Studio Code – https://code.visualstudio.com/docs/python/linting
Get Started Tutorial for Python in Visual Studio Code – https://code.visualstudio.com/docs/python/python-tutorial
SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering. – https://www.scipy.org/install.html