Working with data

Aims and objectives

This module will:

  • explain what data is
  • examine how data is used
  • explore ways to analyse data.

After completing this module, you will be able to:

  • find, clean and use data
  • evaluate and select tools to analyse and display data.

3. Sources of data

Plan your analysis first

The question you want to answer, or the problem you want to solve, determines the type of data you need and the analysis you will run on it. You can then use existing data or gather the data yourself.

Who, what, when, where?

Think about:

  • Who or what is the subject of the data? e.g. a group of people, food or animals
  • Where: is the location important? e.g. local or international, or a particular suburb
  • When: what is the appropriate time period to look at?

There may be an existing dataset that you can analyse to answer your question. If not, you can gather your own data.

Open data

Open data is publicly available for reuse and can be accessed from a range of sources. High-quality data can be found on government websites and in institutional repositories:

  • Research data: this guide lists a range of high-quality public research data sources
  • Text data: this guide lists a variety of sources for open text data
  • Spatial data: this guide lists Queensland, Australian and global spatial data sources
  • UQ eSpace: the repository for UQ research publications and research datasets
  • Google Dataset Search: searches across multiple repositories. You can filter results by download format and usage rights.

GovHack is an annual event where people are invited to apply their creative skills to open government data.

Do you want to use Census data from the Australian Bureau of Statistics (ABS) in your assignment or project?

TableBuilder is a free online data tool from the ABS for creating tables, graphs and maps of Census data. The video Basic tables (YouTube, 4m44s) shows how to select datasets and create basic tables in TableBuilder.

Dataset quality

Evaluate the quality of the dataset, just as you would evaluate any information you find, before you use it in your assignments or projects.

The metadata or description of the data should include information to help you evaluate the dataset.

Check how the Additional information for this dataset helps you evaluate its quality: Australia's threatened species, life history characteristics, and threatening processes.

Collecting data

You may need to collect your own data to answer your research question, if existing data is not available or suitable.

Storing data

If you collect the data yourself, you will need to think about where to store it.

For a small project or assignment, local or online storage such as Google Drive or OneDrive is suitable, as long as the data does not include identifiable personal data. The Working with files module gives an overview of local and online storage options and how to back up your files.

If you are conducting research as part of your Winter/Summer Research project, Honours, Masters, or Higher Degree by Research degree:

  • See Manage research data to find out how to manage, store and secure your project’s data.
  • Discuss with your supervisor/coordinator the suitability of using the UQ Research Data Manager to manage your research data.

Sample size

Decide how many responses or observations you need. Larger samples generally give more precise estimates than smaller ones, provided the sample is representative of the group you are studying. Get information on sample design from the Australian Bureau of Statistics.
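As a rough illustration of how sample size relates to precision, the sketch below applies Cochran's formula for estimating a proportion, with an optional finite population correction. The function name, the 95% confidence level (z = 1.96), the 5% margin of error and the worst-case proportion of 0.5 are assumptions chosen for this example. They are not recommendations from the ABS.

```python
import math


def sample_size(margin_of_error=0.05, z=1.96, proportion=0.5, population=None):
    """Estimate how many responses are needed to estimate a proportion.

    Uses Cochran's formula. All default values are illustrative assumptions.
    """
    # Sample size for a very large (effectively infinite) population
    n0 = (z ** 2) * proportion * (1 - proportion) / (margin_of_error ** 2)

    # Finite population correction: a small population needs fewer responses
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)

    # Round up, because you cannot collect a fraction of a response
    return math.ceil(n0)


print(sample_size())                 # 385
print(sample_size(population=1000))  # 278
```

Under these assumptions, a survey of a very large population needs about 385 responses, while a population of 1,000 needs about 278. A tighter margin of error increases the number quickly.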

Methods for collecting data

Observation

In the observational method, you watch processes, activities or behaviours as they occur. The subjects may or may not be aware that they are being observed. You record the observation as a description of what occurs, or with a checklist that looks for a particular event.

Find out more about the observational method.

Surveys or polls

A survey is a method of collecting data on behaviour, attitudes and opinions. Plan your survey questions carefully so the information you get from participants is useful to answer your research question.

A poll is a short type of survey, often with only one multiple-choice question.

You can conduct your surveys or polls face-to-face or you can use online tools.

Tool | Free account available | Guides
Google Forms | Yes | Get started with Google Forms
SurveyMonkey | Yes | Help Centre
Crowdsignal | Yes | Help page

Interviews or focus groups

An interview typically involves asking either structured or unstructured questions of a single participant. Usually, the questions will be open-ended to allow for more in-depth insight on a topic than a survey can give.

A focus group involves a group of selected people (usually 6 to 12 individuals) participating in a group interview, guided by a moderator. It is a good way to understand the social context of a topic.

With both techniques, you may need to make an audio or video recording of the discussion, or have an observer record the details.

See some factors to consider when deciding whether to use focus groups or interviews.

Scraping

Scraping is a method of getting text and images from websites and social media. The practice can be problematic, depending on how much material is reproduced and how it will be used. Researchers may use automated methods to access publicly available information on the web, provided they are engaged in legitimate research and study that makes use of the data and do not further publish the material. Get more information or advice on copyright.

The Finding and using media module provides information on how to comply with copyright.
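To give a sense of what scraping involves, here is a minimal sketch that uses only the Python standard library to pull the visible text and link addresses out of a page. The class name, function name and sample HTML are made up for this illustration, and the example parses an inline string rather than downloading a real page. Before scraping a real site, check its terms of use and the copyright guidance above.

```python
from html.parser import HTMLParser


class TextAndLinkExtractor(HTMLParser):
    """Collects visible text and link targets from an HTML document."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when it is outside script/style and not just whitespace
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


# Hypothetical page content used in place of a downloaded page
SAMPLE_HTML = """
<html><head><title>Example page</title><style>p { color: red; }</style></head>
<body><h1>Open data</h1>
<p>See the <a href="https://example.org/datasets">dataset list</a>.</p>
</body></html>
"""


def extract(html):
    parser = TextAndLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.text_parts, parser.links


text, links = extract(SAMPLE_HTML)
print(text)
print(links)
```

Dedicated tools such as Scrapy handle downloading, following links and rate limiting for you. This sketch only shows the extraction step.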

Web APIs

A web API (Application Programming Interface) lets you request data from a site using a URL. Using an API usually requires some programming knowledge. Web APIs for non-programmers explains how web APIs work, gives tips on using them and lists some popular free APIs.
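The idea of requesting data with a URL can be sketched as follows. The endpoint https://api.example.org/v1/datasets, the parameter names (q, format, limit) and the JSON response are all hypothetical, invented for this example. Real APIs document their own URLs and parameters. No network request is made here: the code only builds the request URL and reads a sample response.

```python
import json
from urllib.parse import urlencode

# Hypothetical API endpoint, used for illustration only
BASE_URL = "https://api.example.org/v1/datasets"


def build_request_url(base_url, **params):
    """Attach query parameters to a base URL, encoding spaces and symbols."""
    return f"{base_url}?{urlencode(params)}"


url = build_request_url(BASE_URL, q="threatened species", format="json", limit=2)
print(url)

# A made-up response in JSON, the format many web APIs return
SAMPLE_RESPONSE = (
    '{"count": 2, "results": ['
    '{"title": "Dataset A", "year": 2019}, '
    '{"title": "Dataset B", "year": 2021}]}'
)


def titles_from_response(body):
    """Parse a JSON response body and return the title of each result."""
    data = json.loads(body)
    return [item["title"] for item in data["results"]]


titles = titles_from_response(SAMPLE_RESPONSE)
print(titles)
```

With a real API, you would send the URL using a function such as urllib.request.urlopen and pass the body of the reply to a parser like the one above. Many APIs also require you to register for a key.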

More tools for scraping

Tool | Freely available | Available on UQ Library computers | Guides
Python | Yes | Yes | Downloading webpages with Python
Scrapy | Yes | No | Scrapy documentation
Reaper (for social media) | Yes | No | Help information is available on the Reaper site.
NCapture (a Chrome extension used with NVivo) | Yes | NVivo is available on computers in selected locations; see Additional software in Zenworks. Find information about NVivo in the Analyse and display data section. | What is NCapture? Find out more about NVivo in the next section.

Interested in Instagram data? See a comparison of Instagram scraping tools.

See more methods for gathering social media data in our Text mining & text analysis guide. Text mining is a way of extracting text data from documents, for analysis. See Text mining 101 for a quick overview.

Duration: approximately 30 minutes


Graduate attributes

Knowledge and skills you can gain to contribute to your Graduate attributes:

  • Independence and creativity
  • Critical judgement
  • Ethical and social understanding

