Introduction to Galaxy
Trainer: Pablo Moreno
Overview: This session includes a lecture and practicals that introduce the Galaxy user interface.
Learning outcomes:
By the end of the session you will be able to:
- Upload files, use Galaxy tools and view histories
- Connect with the Galaxy community
Materials:
- Presentation slides
- Recorded lecture:
Further Tutorials: Galaxy in General Transcriptomics
- Screencast for Galaxy practical 1:
- Screencast for Galaxy practical 2:
Questions & Answers
1. If you are proficient in scRNA-seq analysis using Seurat or Scanpy, what would be the pros/cons of using Galaxy?
Pablo: Reproducibility of your analysis and the ability to discuss it better with your peers; a persistent research object (data + analysis methods) that can be used for publication purposes; and better accountability, as people can easily inspect and challenge your analysis (which in turn can help you improve it for future datasets). Galaxy also takes care of dependency resolution for tools; otherwise you need to deal with compilation, conda, containers or other dependency resolution mechanisms, which can be a lot of work. Galaxy also provides separation of concerns between the workflow logic (which tools we use, and when) and the execution logic (whether you run on a PBS cluster, a Kubernetes cluster, LSF or your own machine). These concerns are normally intermingled in a bash script, where the way you submit to your cluster is mixed with the tools that you are submitting, so running the script elsewhere becomes very difficult.
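The separation of concerns described in this answer can be illustrated with a minimal Python sketch. This is not how Galaxy is implemented; the tool names, parameters and the `submit` command are hypothetical placeholders, chosen only to show the same workflow definition being paired with two different execution backends.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One workflow step: which tool to run and with which parameters."""
    tool: str
    params: Dict[str, str]


# Workflow logic: what runs, and in what order. Nothing here says where it runs.
# Tool names and parameters are hypothetical.
WORKFLOW: List[Step] = [
    Step("filter_cells", {"min_genes": "200"}),
    Step("normalise", {"method": "log"}),
    Step("cluster", {"resolution": "1.0"}),
]


def local_backend(step: Step) -> str:
    """Execution logic for a single machine: build a plain command line."""
    args = " ".join(f"--{key} {value}" for key, value in step.params.items())
    return f"{step.tool} {args}"


def cluster_backend(step: Step) -> str:
    """Execution logic for a cluster: wrap the same command in a submission.

    `submit` is a hypothetical placeholder, not a real scheduler command.
    """
    return f"submit --job-name {step.tool} -- {local_backend(step)}"


def plan(workflow: List[Step], backend: Callable[[Step], str]) -> List[str]:
    """Pair an unchanged workflow with any backend to get the commands to run."""
    return [backend(step) for step in workflow]
```

Calling `plan(WORKFLOW, local_backend)` and `plan(WORKFLOW, cluster_backend)` yields different commands from the same `WORKFLOW`, which is the property that a bash script mixing both concerns loses.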
2. But do you have the same capacity to personalize the workflow/settings compared to “classic” R/Python coding?
Pablo: If you can write R/Python code, you can certainly modify the XML required by Galaxy to wrap a tool, and add anything that the wrapper might not cover (through a PR here for instance). Of course, there will always be additional use cases for a tool that the original wrapper author did not intend to cover, but if we all work on them together as a community, that coverage increases. Eventually, with mature tool wrappers, making workflows becomes very quick. In contrast, when making workflows with, say, a bash script, the cost of building the workflow is always high at the beginning (you need to do the plumbing every time). Maybe building the 1st pipeline in bash is faster, but by the time you are building your 5th one in bash, you would be on your 15th in Galaxy. I would say that in the beginning Galaxy might be less flexible, but in the long run it pays off.
You can generate a workflow and apply that same workflow to multiple datasets. Workflows are also good for scheduling: there is no waiting time between steps as the workflow progresses. It is a bit like a ready-made R “script”, but with scheduling added.
Going from Scanpy to Seurat requires that the input files be in the same format.
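Applying the same workflow to multiple datasets, as noted above, can also be scripted. The following is a minimal sketch assuming the BioBlend Python client; the function name `run_workflow_on_files` is made up for illustration, and it assumes the workflow has a single data input at step index "0", which will not hold for every workflow.

```python
from typing import Any, Dict, List


def run_workflow_on_files(gi: Any, workflow_id: str, paths: List[str],
                          input_step: str = "0") -> List[Dict[str, Any]]:
    """Invoke one workflow once per input file, each in its own history.

    `gi` is expected to behave like bioblend.galaxy.GalaxyInstance.
    `input_step` is an assumption: a single data input at step index "0".
    """
    invocations = []
    for path in paths:
        # A separate history per file keeps the runs easy to tell apart.
        history = gi.histories.create_history(name=f"run: {path}")
        upload = gi.tools.upload_file(path, history["id"])
        dataset_id = upload["outputs"][0]["id"]
        # The same workflow_id is reused; only the input dataset changes.
        invocation = gi.workflows.invoke_workflow(
            workflow_id,
            inputs={input_step: {"id": dataset_id, "src": "hda"}},
            history_id=history["id"],
        )
        invocations.append(invocation)
    return invocations
```

With BioBlend installed, `gi` would typically be created with `GalaxyInstance(url=..., key=...)` from `bioblend.galaxy`, using the address of your Galaxy instance and your own API key.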
3. Are the histories automatically saved on my account? Are these backed up on servers?
Pablo: The backup policy will depend on the IT department running the infrastructure for you; it doesn’t really depend on Galaxy. Galaxy will keep your history and won’t try to delete it unless there are admin policies in place. For instance, in large public Galaxy instances, histories that are not shared and belong to users who don’t log in for long periods of time (6 months or a year) might be deleted. That depends on each instance’s administration and should be made clear in its policies. To be certain, ask the administrators of the Galaxy instance that you are using (but I doubt that publicly shared objects would be deleted).
4. Is there a space limit for the size of the datasets I have in my history?
Pablo: This again will depend on the administrators of the Galaxy instance that you are using. Normally there is an overall space quota per user on large instances, but if you require more space and you argue for it, it could be granted (again, you need to talk to whoever runs the Galaxy instance you are using). Per se there is no size limit for a single file, only an overall user disk quota.
5. Can anyone have access to the data we upload (when working with unpublished data, what’s the “safety” we have using Galaxy)?
Pablo: If you don’t actively share your data, other users or people external to the instance won’t be able to see it. Administrators could of course access it, but then you are somewhat hidden by obscurity among hundreds or millions of files from tens or hundreds of users.
6. Does it generate journal quality figures and how easily can these be edited?
Pablo: It will depend on the tool producing the plot. Some will produce PDF or SVG (highly customizable later) or PNG (not so much), with or without all the features that you want.