US SOC and ONET Occupational Codes Dataset

Please subscribe to the product from AWS Marketplace

Overview

A combined and curated repository of all US occupational codes in both SOC and ONET formats, with Title, Description, Wages, Skills, and Tools information, plus Task Details (available only for ONET codes).

SOC stands for Standard Occupational Classification. It is a federal statistical standard used by federal agencies to classify workers into occupational categories. SOC occupational codes are used to collect, calculate, or distribute data about occupations. The SOC system is organized using codes, which generally consist of six numerical digits. For example, the SOC code for a stonemason is 47-2022. The first two digits, “47”, represent the major group, which includes all construction and extraction occupations.
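As an illustration of the code structure described above, here is a minimal Python sketch that extracts the major group from a SOC code. The function name `major_group` and the hyphenated input format (two digits, hyphen, four digits, as in the stonemason example) are assumptions made for this example, not part of the dataset.

```python
import re

# Assumed layout: two digits, a hyphen, then four digits (e.g. "47-2022").
SOC_PATTERN = re.compile(r"(\d{2})-(\d{4})")


def major_group(soc_code: str) -> str:
    """Return the two-digit major group prefix of a SOC code."""
    match = SOC_PATTERN.fullmatch(soc_code.strip())
    if match is None:
        raise ValueError(f"not a SOC code: {soc_code!r}")
    return match.group(1)


print(major_group("47-2022"))  # prints 47
```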

O*NET is the United States’ primary source of occupational information. The Occupational Information Network (O*NET) is a system based on the Standard Occupational Classification (SOC). The first six digits of an occupational code in the O*NET system match the SOC code of the corresponding occupation. The main difference is that the O*NET system uses an extra two digits to break occupations down into more specialized categories.
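The relationship between the two code systems can be sketched in Python as follows. This example assumes the two extra O*NET digits follow a period (for example "47-2022.00"); the helper name `split_onet_code` is hypothetical, and the exact formatting used in the dataset files should be checked before relying on it.

```python
import re
from typing import Tuple

# Assumed layout: a six-digit SOC code, a period, then two extra digits.
ONET_PATTERN = re.compile(r"(\d{2}-\d{4})\.(\d{2})")


def split_onet_code(onet_code: str) -> Tuple[str, str]:
    """Split an O*NET code into its SOC part and its two-digit extension."""
    match = ONET_PATTERN.fullmatch(onet_code.strip())
    if match is None:
        raise ValueError(f"not an O*NET code: {onet_code!r}")
    return match.group(1), match.group(2)


soc_part, extension = split_onet_code("47-2022.00")
print(soc_part)   # prints 47-2022
print(extension)  # prints 00
```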

The complete dataset is organized in two parts, primary and supplementary. It will be updated monthly and made available in three formats: CSV, JSON (Nested and Single line), and XML.

Primary Dataset

The primary dataset comprises both SOC and ONET codes and provides the Code, Type, Title, and Description of each code. It has 2,245 rows of data. All formats provide four data points:

Code: Occupational code in either SOC or ONET format.

Type: Indicates whether the code is SOC or ONET. This field has only two possible values.

Title: Provides the Title of the Occupational Code.

Description: The Detailed Description of the Occupational Code.

The Nested JSON will allow for direct querying on the Code and will provide the other fields in response.
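A lookup by Code against the Nested JSON might look like the sketch below. The layout shown here (a top-level object keyed by Code, with Type, Title, and Description as fields) is an assumption for illustration only, and the description text is a placeholder. Inspect the delivered file for its actual structure.

```python
import json

# Hypothetical sample mimicking an assumed nested layout keyed by Code.
SAMPLE_NESTED_JSON = """
{
  "47-2022": {
    "Type": "SOC",
    "Title": "Stonemasons",
    "Description": "(placeholder description text)"
  }
}
"""


def lookup(nested: dict, code: str):
    """Return the record stored under a Code, or None if it is absent."""
    return nested.get(code)


nested = json.loads(SAMPLE_NESTED_JSON)
record = lookup(nested, "47-2022")
if record is not None:
    print(record["Type"], record["Title"])
```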

Supplementary Dataset

The supplementary dataset comprises only the ONET codes and their Task details. It provides the Code, Title, Task ID, Task Details, and Task Type, which indicates whether a task is core or supplementary to the occupation. It's important to note that one occupational code can have multiple core and supplementary Task IDs assigned to it. The supplementary dataset has 19,281 rows of data. The Nested JSON will allow for direct querying on the Code and will provide the task details in response, noting whether each task is core or supplementary. The following five data points are provided:

Code: Occupational code in ONET format

Title: Provides the Title of the ONET Code

Task-ID: Provides Task ID

Task: Provides Task Details

Task-Type: Indicates whether the Task is Core or Supplementary.
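Because one code can carry many tasks, a common first step is to group the rows by Code and Task-Type. The Python sketch below does this for CSV input, using the five field names listed above as column headers. The sample rows, the Task-ID values, and the exact spelling of the Task-Type values ("Core", "Supplementary") are placeholders assumed for this example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; real task text and IDs come from the dataset.
SAMPLE_CSV = """Code,Title,Task-ID,Task,Task-Type
47-2022.00,Stonemasons,1,Placeholder core task,Core
47-2022.00,Stonemasons,2,Placeholder supplementary task,Supplementary
"""


def tasks_by_code(rows):
    """Group task descriptions by Code, then by Task-Type."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        grouped[row["Code"]][row["Task-Type"]].append(row["Task"])
    return grouped


grouped = tasks_by_code(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(grouped["47-2022.00"]["Core"])
```

For the real file, the `io.StringIO` wrapper would be replaced with an opened CSV file handle.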

Prize Dataset

The prize dataset is a complimentary collection of various US macro data aggregated per SOC or ONET code in Excel or CSV format. The data shared in this dataset is refreshed frequently and older data may be purged in new dataset revisions. Some examples of data in the prize dataset include:

Mean and median annual wage data in the US per SOC Occupational Code (currently May 2022)

Emerging new Tasks for ONET Occupational Codes (additional to supplementary dataset)

Technology skills needed or nice to have per ONET Occupational Code

Equipment used per ONET Occupational Code

For product support please use the form below.
