Stata

This article is part of the Stata for Students series. If you are new to Stata, we strongly recommend reading all the articles in the Stata Basics section.

  • Stata is a general-purpose statistical software package developed by StataCorp for data manipulation, visualization, statistics, and automated reporting. It is used by researchers in many fields, including economics, sociology, political science, biomedicine, and epidemiology.
  • Stata was first released in 1985. It provides the tools needed for data management, statistical inference, and data science, and it is designed to be fast and accurate. Stata 16 added features ranging from lasso to Python integration.

Stata tries very hard to make all its commands work the same way. Spending a little time learning the syntax itself will make it much easier to use commands later.

To carry out the examples in this section, you'll need to have created an SFS folder and downloaded the gss_sample data set as described in Managing Stata Files. Create a new do file in that folder called syntax.do, as described in Doing Your Work Using Do Files. To start with it should contain:

capture log close
log using syntax.log, replace
clear all
set more off
use gss_sample
// work will go here
log close


The example commands will go after use gss_sample and before log close. Add the example commands to this do file as you go, and run it frequently to see the results.

Commands

Most Stata commands are verbs. They tell Stata to do something: summarize, tabulate, regress, etc. Normally the command itself comes first, followed by the details of what you want Stata to do.

Many commands can be abbreviated: sum instead of summarize, tab instead of tabulate, reg instead of regress. Commands that can destroy data, like replace, cannot be abbreviated.
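A minimal sketch of the abbreviation rule, assuming Stata's bundled example data set auto (loaded with sysuse) and its mpg variable rather than gss_sample. Because it replaces the data in memory, run it on its own rather than inside syntax.do. It runs summarize under its full name and its abbreviation and checks that the stored mean is identical; the scalar name full_mean is chosen just for this illustration.

```stata
* Assumption: Stata's bundled auto data, not gss_sample
sysuse auto, clear

* Full command name; keep the mean it stores in r(mean)
summarize mpg
scalar full_mean = r(mean)

* The abbreviation runs exactly the same command
sum mpg
display r(mean) == scalar(full_mean)   // 1 means the results match
```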

Variable Lists

A list of variables after a command tells the command which variables to act on. First try sum (summarize) by itself, and then followed by age:

sum
sum age

If you don't specify which variables sum should act on, it will give you summary statistics for all the variables in the data set. In this case that's a pretty long list. Putting age after sum tells it to only give you summary statistics for the age variable.

If you list more than one variable, the command will act on all of them:

sum age yearsjob prestg10

This gives you summary statistics for age, years on the job, and a rating of the respondent's job's prestige.

If Conditions

An if condition tells a command which observations it should act on. It will only act on those observations where the condition is true. This allows you to do things with subsets of the data. An if condition comes after a variable list:

sum yearsjob if sex==1

This gives you summary statistics for years on the job for just the male respondents (in the GSS 1 is male and 2 is female).

Note the two equals signs! In Stata you use one equals sign when you're setting something equal to something else (see Creating Variables) and two equals signs when you're asking if two things are equal. Other operators you can use are:

  • == Equal to
  • > Greater than
  • < Less than
  • >= Greater than or equal to
  • <= Less than or equal to
  • != Not equal to

! all by itself means 'not' and reverses whatever condition follows it.
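The difference between one and two equals signs, and the two ways of saying "not", can be seen in a small sketch. It assumes Stata's bundled auto data set (so run it on its own, since it replaces the data in memory), and heavy is a hypothetical variable created only for this example.

```stata
* Assumption: Stata's bundled auto data, not gss_sample
sysuse auto, clear

* One equals sign sets heavy equal to the result of weight > 3000 (1 or 0)
generate heavy = weight > 3000

* Two equals signs ask whether heavy is equal to 1
count if heavy == 1

* != and ! both express "not": these two counts are the same
count if heavy != 1
count if !(heavy == 1)
```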

Combining Conditions

You can combine conditions with & (logical and) or | (logical or). The character used for logical or is called the 'pipe' character and you type it by pressing Shift-Backslash, the key right above Enter. Try:

sum yearsjob if sex==1 & income>=9
sum yearsjob if sex==1 | income>=9

The first gives you summary statistics for years on the job for respondents who are male and have a household income of $10,000 or more. The second gives you summary statistics for years on the job for respondents who are male or have a household income of $10,000 or more, a very different group.
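To see how much & and | change the group, this sketch counts observations in Stata's bundled auto data set (an assumption; there foreign is coded 0/1 and mpg is miles per gallon). Run it on its own, since it replaces the data in memory. The | count can never be smaller than the & count.

```stata
* Assumption: Stata's bundled auto data, not gss_sample
sysuse auto, clear

* Foreign AND at least 25 mpg: both conditions must be true
count if foreign == 1 & mpg >= 25

* Foreign OR at least 25 mpg: either condition is enough
count if foreign == 1 | mpg >= 25
```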

Any conditions you combine must be complete. If you want summary statistics for years on the job for respondents who are either black (race==2) or 'other' (race==3), you cannot use:

sum yearsjob if race==2 | 3 // don't do this

(What this does and why is left as an exercise for the reader, but it's not what you want.) Instead you should use:

sum yearsjob if race==2 | race==3 // do this instead
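If you want to check your answer to the exercise, this sketch runs both versions of the condition with count on Stata's bundled auto data set (an assumption; its rep78 variable, a repair record coded 1 to 5 with some missing values, stands in for race). Run it on its own, since it replaces the data in memory. As a hint, compare the first count with _N, the total number of observations.

```stata
* Assumption: Stata's bundled auto data; rep78 stands in for race
sysuse auto, clear

* Don't do this: compare the result with the number of observations
count if rep78 == 2 | 3
display _N

* Do this instead: each side of | is a complete condition
count if rep78 == 2 | rep78 == 3
```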

Missing Values

If you have missing values in your data, you need to keep them in mind when writing if conditions. Recall that the generic missing value (.) acts like positive infinity, and the extended missing values (.a, .b, etc.) are even bigger. So if you type:

sum yearsjob if age>65

you are not just getting summary statistics for years on the job for respondents who are older than 65. Anyone with a missing value for age is also included. Assuming you're interested in people who are known to be older than 65, you should exclude the people with missing values for age with a second condition:

sum yearsjob if age>65 & age<.

It makes a difference!

Why age<. rather than age!=.? For the age variable, the GSS uses .c for missing and age!=. would not exclude .c. Other variables use different extended missing values, and some use more than one. Using age<. guarantees you're excluding all missing values, even if you don't know ahead of time which ones the data set uses.
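The ordering of missing values can be checked directly with display, which prints 1 for true and 0 for false. The second half is a sketch on Stata's bundled auto data set (an assumption; run it on its own, since it replaces the data in memory), whose rep78 variable has a few missing values, so the first count is larger than the second.

```stata
* Missing values are larger than any number: . < .a < .b < ... < .z
display . > 1e300     // 1: generic missing is bigger than any number
display .c > .        // 1: extended missing values are bigger still
display .c != .       // 1: so != . does not exclude .c
display .c < .        // 0: but < . excludes every missing value

* Assumption: Stata's bundled auto data, not gss_sample
sysuse auto, clear
count if rep78 > 3               // also counts cars with missing rep78
count if rep78 > 3 & rep78 < .   // only cars known to have rep78 above 3
```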

Binary Variables

If you have a binary variable coded as 0 or 1, you can take advantage of the fact that to Stata 1 is true and 0 is false. Imagine that instead of a variable called sex coded 1/2, you had a variable called female coded 0/1. Then you could do things like:

sum yearsjob if female
sum yearsjob if !female // meaning 'not female'

Just one thing to be careful of: to Stata everything except 0 is true, including missing. If female had missing values you would need to use:

sum yearsjob if female & female<. // exclude missing values

or:

sum yearsjob if female==1 // automatically excludes missing values

Unfortunately, the GSS does not code its binary variables 0/1, so you can't actually run these four commands. But many data sets do, and if you have to create your own binary variables, you can make them easy to use by coding them 0/1.
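If you do create your own binary variable, one common pattern is sketched below on a hypothetical four-observation toy data set (entered with input, so it does not need gss_sample; run it on its own, since clear empties the data in memory). It codes female 0/1 from a sex variable coded 1/2, keeps female missing where sex is missing, and shows why that missing value matters.

```stata
* Hypothetical toy data: sex is 1 = male, 2 = female, one missing
clear
input id sex yearsjob
1 1 5
2 2 10
3 2 3
4 . 7
end

* female is 1 if sex == 2, 0 if sex == 1, and missing if sex is missing
generate female = (sex == 2) if sex < .

sum yearsjob if female              // includes id 4: missing counts as true
sum yearsjob if female & female < . // female respondents only (ids 2 and 3)
sum yearsjob if female == 1         // same group as the line above
sum yearsjob if !female             // male respondents only (id 1)
```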

Options

Options change how a command works. They go after any variable list or if condition, following a comma. The comma means 'everything after this is options' so you only type one comma no matter how many options you're using.

The detail option tells summarize to calculate percentiles (including the 50th percentile, or median) and some additional moments.

sum yearsjob, detail

Many options can be abbreviated just as commands can; in this case just d would do.

Some options require additional information, like the name of a variable or a number. Any additional information an option needs goes in parentheses directly after the option itself.


Recall that when we ran sum by itself, it gave us summary statistics for all the variables and put a separator line after every five variables. You can change that with the separator (or just sep) option:


sum, sep(10)

The (10) in parentheses tells the separator option to put a separator between every ten variables. You'll learn more useful options that need additional information in the articles on statistical commands.
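Putting the pieces together, a full command line reads: command, variable list, if condition, one comma, then options. The sketch below assumes Stata's bundled auto data set (run it on its own, since it replaces the data in memory). row and missing are tabulate options (row percentages, and missing values shown as their own category), and separator() is the summarize option described above.

```stata
* Assumption: Stata's bundled auto data, not gss_sample
sysuse auto, clear

* command, variable list, if condition, one comma, then every option
tabulate rep78 foreign if price < 10000, row missing

* An option that needs more information gets it in parentheses
summarize mpg weight price, separator(2)
```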

By

By allows you to execute a command separately for subgroups within your data. Try:

bysort sex: sum yearsjob

This gives you summary statistics for years on the job for both males and females, calculated separately.

By is a prefix, so it comes before the command itself. It's followed by the variable (or variables) that identifies the subgroups of interest, then a colon. The data must be sorted for by to work, so bysort is a shortcut that first sorts the data and then executes the by command. Now that the data set is sorted by sex, you can just use by in subsequent commands:

by sex: sum prestg10
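To see why the data must be sorted, this sketch uses Stata's bundled auto data set (an assumption; run it on its own, since it replaces the data in memory), which is not sorted by rep78. capture lets the do file keep running after an error, and _rc holds the error code; 5 is Stata's "not sorted" error.

```stata
* Assumption: Stata's bundled auto data, not gss_sample
sysuse auto, clear

* by alone fails because the data are not sorted by rep78
capture by rep78: summarize mpg
display _rc                        // 5 means "not sorted"

* bysort sorts first, then summarizes mpg for each rep78 group
bysort rep78: summarize mpg

* The data are now sorted by rep78, so plain by works
by rep78: summarize price
```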

Complete Do File

The following is a do file containing all the example commands in this section:


capture log close
log using syntax.log, replace
clear all
set more off
use gss_sample
sum
sum age
sum age yearsjob prestg10
sum yearsjob if sex==1
sum yearsjob if sex==1 & income>=9
sum yearsjob if sex==1 | income>=9
sum yearsjob if race==2 | 3 // don't do this
sum yearsjob if race==2 | race==3 // do this instead
sum yearsjob if age>65
sum yearsjob if age>65 & age<. // exclude missing values
/* Things you could do if you had female coded 0/1
instead of sex coded 1/2:
sum yearsjob if female
sum yearsjob if !female // meaning 'not female'
sum yearsjob if female & female<. // exclude missing values
sum yearsjob if female==1 // automatically excludes missing values
*/
sum yearsjob, detail
sum, sep(10)
bysort sex: sum yearsjob
by sex: sum prestg10
log close

Last Revised: 6/24/2016

A data analytics tool for researchers

Stata is paid data analysis and statistical software, available for Windows as well as macOS and Linux. Developed by StataCorp LLC, the application offers features for data management, statistical analysis, forecasting, and visualization. StataCorp also offers tutorials, documentation, and webinars to help users learn its many functions.

Stata gives researchers accurate data analysis and statistics tools for work in their own fields. It is widely used in business, policymaking, education, medical science, economics, and more. Other tools sometimes listed as alternatives include Power BI Desktop and Statcounter Web Analytics, although these focus on business dashboards and website traffic rather than statistical analysis.

What is Stata used for?

As mentioned, Stata is a data analysis and statistical tool that helps researchers and students manage, understand, and present data. You can explore your data, run analyses, and record your findings as you go. Stata also creates graphs and other visualizations that you can print or publish.

What are the features of Stata?

One of Stata's strengths is that it provides both standard and advanced statistical methods that make data analysis easier. Its interface lets users type commands and see the results immediately, which makes it easy to compare changes and try different analyses.

In addition, users working with large data sets can quickly add and modify properties such as variable names, storage types, and labels. Stata also lets programmers and developers write and run their own commands.

The commands you run can be saved in do files and their output recorded in log files, so you can easily share your analyses with supervisors, teachers, and colleagues, for example by email.

Log files record the commands you ran and their results, so you can see all the changes you made. With them, you can review your work, tweak or re-run commands, and check that your data are accurate. This also makes it easier to replicate your data work, statistics, and analyses when needed.

Stata also helps users create graphs and other visualizations. You write and run a command (or use the Graphics menu) to generate graphs suitable for publication and printing. You can then edit these graphs and save them in formats such as JPEG, PNG, EPS, and TIF.

How to get started with Stata?

Once the Stata download completes, the installer starts the installation process. No other products are downloaded along with this statistical software. After the installation is complete, Stata asks you for a few details, including the license information you received when you purchased the application.

Is Stata difficult to learn?

Stata is comprehensive software that performs complex tasks such as data management and statistical analysis. Even so, it is not especially difficult to learn. Its commands follow a consistent syntax, and new users can draw on documentation, tutorials, and webinars. It also meets the needs of researchers, skilled users, and developers.

Our take

As statistical and data analysis software, Stata is an excellent choice. Its interface suits both novice users and skilled researchers, and anyone struggling can turn to its documentation, tutorials, and webinars. Stata offers analysis, forecasting, visualization, and data management features. It also lets users create graphs, write and record commands, and re-run analyses, and it provides both standard and advanced statistical methods.

Should you download it?

If you’re looking for accurate, fast, and user-friendly data analysis software, download Stata. The application has a command-line interface that lets users easily record, understand, and share data analysis and statistics. It also allows users to see changes made to the data in real-time and visualize information for better understanding.

Highs

  • Suitable for beginners and skilled users
  • Option to record logs and re-run commands
  • Has an intuitive interface
  • Provides tutorials and documentation
