The content and structure of this website are licensed under the Creative Commons License (CC BY-NC-ND, Lebrand C., BiUM library, 2016) unless otherwise noted.
In order to build on previous findings, improve transparency and increase the reproducibility of results, researchers need to be able to re-use research data. For all these reasons, the notion of publication has evolved over the last ten years and today includes not only the results but also the essential research data needed to validate them.
Service & Tool
Throughout the data life cycle, the BiUM publication management service provides information, advice and help to researchers for publishing and sharing their data. We can provide you with guidance on how to prepare a Data Management Plan and how to share your data through journal publications and selected repositories to increase the visibility of your work. We will help you find a suitable data repository that meets funding agency and journal requirements for publishing the research data underlying your publication, and determine the applicable policies on confidentiality and intellectual property. Our unit is familiar with metadata standards for datasets, file formats for long-term storage and re-use, data copyright, licences and self-archiving rules, and will help you address these issues. Training on these topics is also provided by our service on a regular basis (check our calendar).
Use the “VitalIT DMP Canvas Generator tool” adapted for FBM-UNIL/CHUV researchers to make your own DMP template.
Contact Cecile.firstname.lastname@example.org to get access to our model for filling in the Data Management Plan required by the SNSF when applying for funding.
See the other sections for the metadata standards used to document datasets and the file formats required for long-term preservation.
- Definition of research data
What are research data?
Research data consist of the original datasets generated by the research. They can be simple or complex, qualitative or quantitative and are generated in various formats.
For example, research data include:
- micro-arrays and sequencing data
- statistical analyses
- experimental results
- audio and video recordings
- life sciences imaging
- medical imaging
- computational-based models and simulations
Data life cycle
It is important to start managing your datasets as soon as your research project begins, and to ensure that your research data can be used, shared and re-used effectively by you and other researchers throughout the data life cycle:
- evaluate your data needs
- build a data management plan (DMP). A DMP is an evolving document reporting how the collected or generated research data will be managed during and after a research project. It should describe: the dataset, standards and metadata, data sharing, archiving and preservation.
- collect and describe your data
- store and secure (backup plan) your data
- process, visualize and analyse your data
- publish and share your data
- preserve and archive your data to re-use them in the future
Modified figure, CC-BY Kramer, Bianca; Bosman, Jeroen (2015) and CC-BY Liz Lyon (2015). The BiUM publication management service provides information and help to researchers at multiple steps of the data life cycle (in green), such as DMP preparation, reporting standards, metadata annotation, preparation of files in standard formats, licensing, publishing, sharing and preserving.
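As a minimal sketch, assuming a project wants to track its DMP as structured data, the four topics a DMP should describe (the dataset, standards and metadata, data sharing, archiving and preservation) can be kept as a checklist that is updated as the project evolves. The class, field and section names below are hypothetical and are not taken from the SNSF template or any BiUM tool.

```python
from dataclasses import dataclass, field

# Hypothetical section keys mirroring the four DMP topics listed above.
REQUIRED_SECTIONS = (
    "dataset",
    "standards_and_metadata",
    "data_sharing",
    "archiving_and_preservation",
)


@dataclass
class DataManagementPlan:
    project: str
    sections: dict = field(default_factory=dict)  # section key -> free-text answer
    version: int = 1

    def update(self, section: str, text: str) -> None:
        """Record an answer and bump the version: a DMP evolves during the project."""
        if section not in REQUIRED_SECTIONS:
            raise KeyError(f"unknown DMP section: {section}")
        self.sections[section] = text
        self.version += 1

    def missing_sections(self) -> list:
        """Return the required sections that are still empty, in checklist order."""
        return [s for s in REQUIRED_SECTIONS if not self.sections.get(s, "").strip()]


dmp = DataManagementPlan("Example imaging project")
dmp.update("dataset", "Confocal microscopy images, TIFF")
print(dmp.version, dmp.missing_sections())
# prints: 2 ['standards_and_metadata', 'data_sharing', 'archiving_and_preservation']
```

Keeping a version counter reflects the idea that a DMP is a living document rather than a one-off form.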
- Source of information
Journal and funding agencies Open Data policies
- What are the advantages for researchers and the scientific community to make datasets freely accessible and reusable?
- What are the journal guidelines concerning reporting standards and data sharing?
- What are the funding agencies policies concerning data sharing?
Research funding agencies, publishers and institutions increasingly require shared standards for open practices in research. The list below includes top funders and journal guidelines.
We can provide you with guidance on how to prepare a Data Management Plan and how to share your data through journal publications and selected repositories to meet funding agency and journal requirements. Training on these topics is also provided by our service on a regular basis (check our calendar).
- Irreproducibility of published studies in biomedical research
Irreproducibility of published studies in preclinical and clinical research.
Recent studies have shown that, worldwide, between 51% and 89% of published preclinical and clinical research is not reproducible, with consequent losses estimated at around US$100 billion per year in biomedical research (Chalmers and Glasziou, 2009; Freedman et al., 2015; Begley and Ioannidis, 2015). In particular, these studies have made clear that the research data associated with a publication are fundamental to validating the published analyses and results. Preclinical studies are essential, since they are a potential basis for the discovery of new drugs and therapies, as well as for the use of specific biomarkers in clinical analyses (Begley and Ellis, 2012; Freedman et al., 2015; Begley and Ioannidis, 2015). The low reproducibility rate of research in the life sciences is therefore alarming, since it causes delays and major costs in therapeutic development.
This irreproducibility is not exclusive to preclinical studies but is observed across the whole biomedical research spectrum. Indeed, similar problems have been identified in observational research, where none of 52 predictions from observational studies was confirmed in randomized clinical trials (Begley and Ioannidis, 2015).
Many factors contribute to this lack of reproducibility in life science studies. For example, researchers often do not conduct their preclinical experiments in a blinded manner and therefore tend to find the results they anticipated (Howells et al., 2014; Begley and Ioannidis, 2015). This confirmation bias is inevitable in scientific research: even the best scientists are inclined to unconsciously find results or interpretations that fit their preconceived ideas and theories. A series of recurring problems has also been highlighted, including an insufficient number of experimental repetitions, the absence of adequate controls, the lack of reagent validation, a lack of transparency and standards when reporting research results, and the use of inappropriate statistical tests (Begley and Ellis, 2012; Howells et al., 2014; Begley and Ioannidis, 2015; Holman et al., 2016). In addition, researchers often report their best experiment rather than all of them, and negative results are rarely published. This absence of standards and best practices has led not only to a lack of reproducibility of individual experiments but also to the fact that the main conclusions of articles are often not correctly documented.
The irreproducibility of preclinical research is attributed to a lack of rigor and of adherence to good experimental practices at various stages of the research cycle: i) biological reagents and reference materials, ii) improper design of preliminary studies, iii) lack of rigor in data analysis and in reporting research results, and iv) inconsistent laboratory protocols (Freedman et al., 2015).
Sources of information
- Begley CG, Ellis LM. Drug development: Raise standards for preclinical cancer research. Nature. 2012 Mar 28;483(7391):531-3.
- Begley CG, Ioannidis JPA. Reproducibility in science: improving the standard for basic and preclinical research. Circ Res. 2015;116(1):116-26.
- Chalmers I, Glasziou P. Avoidable waste in the production and reporting of research evidence. Lancet. 2009;374(9683):86-9.
- Freedman LP, Cockburn IM, Simcoe TS. The economics of reproducibility in preclinical research. PLoS Biol. 2015;13(6):e1002165.
- Holman C, Piper SK, Grittner U, Diamantaras AA, Kimmelman J, Siegerink B, Dirnagl U. Where have all the rodents gone? The effects of attrition in experimental research on cancer and stroke. PLoS Biol. 2016 Jan 4;14(1):e1002331.
- Howells DW, Sena ES, Macleod MR. Bringing rigour to translational medicine. Nat Rev Neurol. 2014 Jan;10(1):37-43.
- Iqbal SA, Wallach JD, Khoury MJ, Schully SD, Ioannidis JPA. Reproducible research practices and transparency across the biomedical literature. PLoS Biol. 2016;14(1):e1002333.
Benefits of data sharing
Data reuse and citation advantage: articles that share their data (e.g., gene microarray data) receive more citations than those that do not, independently of journal impact factor, date of publication and authors' country of origin (Piwowar H et al. (2007) PLoS ONE).
Point of view: How open science helps researchers succeed (McKiernan et al. (2016) eLife)
- Open data policies from publishers
Policies for open research data from publishers
Too often, publication requirements discourage transparency, openness and reproducible science. For example, both null and significant results must be made available to establish accurately the evidence base for a phenomenon. However, null results are still only rarely published and therefore remain inaccessible. In addition, to maintain high standards of research reproducibility and to promote the reuse of new findings, the major research data associated with a publication should be made openly accessible (OA) according to reporting standards. For this reason, publishers now often include data-sharing policies in their instructions for authors. This requirement reflects the fact that research data are fundamental to validating the analyses and results published in a research article; from this point of view, research data are a crucial part of the publication.
In November 2014, the Transparency and Openness Promotion (TOP) Committee met at the Center for Open Science in the USA to address journals’ procedures and policies for publication. The committee comprised researchers, journal editors and funding agency representatives. By developing shared standards for open practices across journals, they aimed to change the current research incentive system and move researchers toward greater openness. They created eight standards in the TOP guidelines, which encourage scientific communication to move toward greater openness. The TOP guidelines acknowledge barriers to openness by accepting exceptions to sharing for ethical reasons, intellectual property issues or lack of the necessary resources. The guidelines were published in Science (B. A. Nosek et al. Science 2015;348:1422-1425) and are available at http://cos.io/top, along with a list of 510 leading journals and 49 organizations that had already agreed to them (as of September 14, 2015).
“Eight standards and three levels of the TOP guidelines” provides a summary of the guidelines. “The three levels of the TOP guidelines are increasingly stringent for each standard. Level 0 offers a comparison that does not meet the standard. Two standards (Citation and replication standards) reward researchers for the time and effort they have spent engaging in open practices. Four standards describe what openness means across the scientific process so that research can be reproduced and evaluated. Finally, two standards address the values resulting from preregistration.” Making the distinction between confirmatory and exploratory methods transparent can enhance reproducibility. The “Design and Analysis Transparency” standard is intended to maximize transparency about the research process and minimize the potential for incomplete reporting of the methodology. This standard is discipline-specific: journals should identify whether existing reporting guidelines apply (standards for many research applications are available from http://www.equator-network.org/) and select the most relevant ones.
- 02.2015: NIH adopted the Principles and Guidelines for Reporting Preclinical Research
- 11.2014: Nature published Journals unite for reproducibility
- 12.2014: Science published Journals unite for reproducibility
- 06.2014: the US National Institute of Neurological Disorders and Stroke organized a meeting with major stakeholders in order to discuss how to improve the methodological reporting of animal studies in funding applications and publications. The main workshop recommendation Principles and Guidelines in Reporting Preclinical Research is that at a minimum “studies should report on sample-size estimation, whether and how animals were randomized, whether investigators were blind to the treatment, and the handling of data. Journals should recommend the deposit of data in open access repositories where available and link data bidirectionally to the published paper. Journals should also strongly encourage, as appropriate, that all materials used in the experiment be shared with those who wish to replicate the experiment.”
- 06.2010: PLoS Biology: Improving Bioscience Research Reporting: The ARRIVE Guidelines for Reporting Animal Research. “The Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines were introduced to help improve reporting standards in 2010. To maximise their utility, the ARRIVE guidelines have been prepared in consultation with scientists, statisticians, journal editors, and research funders. They were published in PLoS Biology and endorsed by funding agencies and publishers and their journals, including PLoS, Nature research journals, and other top-tier journals”.
- DMP and Open data policies from SNSF
Funding agencies (SNSF and H2020) and institutions (UNIL/CHUV) also strongly encourage authors to provide OA to research data, unless there are strong reasons to restrict access, for example in the case of medical or commercial data. Sensitive personal and private information must be handled carefully, especially in the biomedical field (see our section on confidentiality and intellectual property). Indeed, the disclosure and open access of sensitive data require the explicit consent of the individuals concerned as well as privacy protection through data anonymization. In addition, where commercial or patenting issues arise, access to research data may have to be restricted and protected.
SNSF policy on DMP and Open Research Data
Research data should be freely accessible to everyone – for scientists as well as for the general public.
The SNSF agrees with this principle and will introduce new requirements in its project funding scheme as of October 2017. Researchers will have to include a data management plan (DMP) in their funding application. At the same time, the SNSF expects that data generated by funded projects will be publicly accessible in non-commercial and FAIR digital databases provided there are no legal, ethical, copyright or other issues.
For more detailed information regarding the implementation of the SNSF policy on Open Research Data (ORD), please refer to this webpage.
We can provide you with guidance on how to prepare a Data Management Plan and how to share your data through selected repositories to meet SNSF requirements. Training on these topics is provided by our service on a regular basis (check our calendar).
SNSF guidelines for researchers concerning the Data Management Plans (DMPs)
SNSF policy on Open Research Data
- “Research data are the evidence that underpins the answer to the research question, and can be used to validate findings regardless of its form (e.g. print, digital, or physical).”
Concordat on Open Research Data, published on 28 July 2016
- The SNSF values research data sharing as a fundamental contribution to the impact, transparency and reproducibility of scientific research. In addition to being carefully curated and stored, the SNSF believes research data should be shared as openly as possible.
- The SNSF therefore expects all its funded researchers
- to store the research data they have worked on and produced during the course of their research work, to share these data with other researchers, unless they are bound by legal, ethical, copyright, confidentiality or other clauses, and
- to deposit their data and metadata onto existing public repositories in formats that anyone can find, access and reuse without restriction.
- Research data is collected, observed or generated factual material that is commonly accepted in the scientific community as necessary to document and validate research findings.
The regulations related to the SNSF policy on Open Research Data can be found in the Funding Regulations and in the General Implementation Regulations.
SNSF-compliant data repositories
Finding the “perfect” repository providing all necessary features to host FAIR data is challenging. To make the transition towards FAIR research data easier, the SNSF decided to fix a set of minimal criteria that repositories have to fulfil to conform with the FAIR data principles.
Costs for granting access to research data (Open Research Data) – Article 28 paragraph 2 letter c of the Funding Regulations:
The costs of enabling access to research data that was collected, observed or generated under an SNSF grant are eligible if the following requirements are met: a. the research data is deposited in recognised scientific, digital data archives (data repositories) that meet the FAIR principles and do not serve any commercial purpose; b. the costs are specifically related to the preparation of research data in view of its archiving, and to the archiving itself in data repositories pursuant to letter a.
- Open data policies from H2020
Horizon 2020: Open Data Policy
Since January 2017, all researchers submitting a project proposal in the context of Horizon 2020 have automatically been included in the Open Data pilot.
- Open data policies and repositories list
Where to publish your datasets: Data repositories
- Where should researchers working at FBM/CHUV deposit their datasets accompanying their publication?
- Which kind of documents can be self-archived?
- Which document format to use for long-term storage?
- What are the copyright and licence legal aspects for datasets?
- When can Open Access to a dataset underlying an article be provided?
The BiUM publication management unit helps all FBM/CHUV researchers meet the requirements of reporting standards and data-sharing policies. Our unit strongly recommends that FBM/CHUV researchers make the supplementary files and key datasets accompanying their publications (life science and medical images, audio-video recordings, blots, ….) openly available in an appropriate data repository. The preferred way to share large datasets is via public repositories for specific data types, or via an unstructured (general-purpose) public repository such as the H2020-supported Zenodo, through the Zenodo-FBM/CHUV community.
FBM/CHUV researchers who would like to deposit, and give free Open Access to, the unstructured data underlying their publication via the Zenodo-FBM/CHUV community, Dryad or figshare can contact the BiUM publication management unit. We will provide you with guidance on how to share your data through data repositories to increase the visibility of your work. Our unit is familiar with metadata standards for datasets, file formats for long-term storage and re-use, data copyright, licences and self-archiving rules, and will help you address these issues. Training on these topics is also provided by our service on a regular basis (check our calendar).
Sensitive personal and private information must be handled carefully, especially in the biomedical field (see our section on confidentiality and intellectual property). Indeed, the disclosure and open access of sensitive data require the explicit consent of the individuals concerned as well as privacy protection through data anonymization. In addition, where commercial or patenting issues arise, access to research data may have to be restricted and protected. Our service is not responsible for the proper anonymization of datasets before deposit and asks researchers to consult the services in charge of these aspects at UNIL/CHUV.
Authors may be invited by the publisher to submit the key research data supporting the figures and tables, in long-term access formats and with appropriate metadata, to data repositories when submitting the article to the journal or after manuscript acceptance. During the review process, researchers can post data files with restricted access and share them only with journal editors and reviewers. Data repositories assign a DOI to make research data uniquely citable. Citing the dataset in the reference list links it directly to the publication and ensures that it can be found in the future. Once the publication is accepted, open access to and sharing of the accompanying data remain subject to the approval of the depositor of the original file.
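Because a deposited dataset should be cited in the reference list, here is a minimal sketch of assembling such an entry from repository metadata. It loosely follows the elements DataCite recommends for data citation (creator, publication year, title, publisher, identifier); the exact punctuation, the `DatasetRecord` type and the DOI shown are assumptions for illustration, so follow your journal's reference style in practice.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    creators: list      # names in "Family, Given" form
    year: int           # publication year of the dataset
    title: str
    publisher: str      # the repository that holds the data
    doi: str            # bare DOI, without the https://doi.org/ prefix


def format_data_citation(rec: DatasetRecord) -> str:
    """Build a reference-list entry: Creators (Year). Title. Publisher. DOI link."""
    authors = "; ".join(rec.creators)
    return f"{authors} ({rec.year}). {rec.title}. {rec.publisher}. https://doi.org/{rec.doi}"


# Hypothetical record; the DOI is a placeholder, not a real deposit.
example = DatasetRecord(["Doe, Jane", "Smith, John"], 2017,
                        "Confocal images of cortical neurons", "Zenodo",
                        "10.5281/zenodo.0000000")
print(format_data_citation(example))
# Doe, Jane; Smith, John (2017). Confocal images of cortical neurons. Zenodo. https://doi.org/10.5281/zenodo.0000000
```

Writing the DOI as a resolvable https://doi.org/ link lets readers reach the dataset directly from the reference list.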
For discipline-specific data repositories, consult the lists provided by PLoS journals or by the journal Scientific Data. For example, for genomic, proteomic and metabolomic datasets, authors should use domain-specific public repositories (contact VITAL-IT for further information).
Have a look at our comparison for positive and negative aspects concerning Dryad, Zenodo and figshare.
See also the Dataverse comparative review of data repositories.
Zenodo is supported by the H2020 programme.
Zenodo repository key characteristics:
- compatible with the SNSF policy on Open Research Data that expects all its funded researchers to deposit and share their data and metadata onto existing public repositories.
- share research data for free in a wide variety of formats including text, spreadsheets, audio, video, and images across all fields of science
- get credited by making the research data citable (stored data get a DOI to make them easily and uniquely citeable).
- easily access and reuse shared research data
- flexible licensing
- preserve data (research data are stored safely in the same cloud infrastructure as research data from CERN’s Large Hadron Collider).
- integrate research data into reporting lines for research funded by the European Commission via OpenAIRE.
The BiUM publication management unit strongly recommends that FBM/CHUV researchers make the supplementary files and key datasets accompanying their publications openly available on the Zenodo-FBM/CHUV community.
Services & tools
For help on documenting your data before depositing it on Zenodo, contact the BiUM publication management unit. For help on depositing your data on the Zenodo-FBM/CHUV community, have a look at our Zenodo scenarios.
Deposit a dataset on Zenodo
Deposit software on Zenodo from GitHub
Dryad is a nonprofit organization that provides long-term access to its contents at no cost to researchers, educators or students, irrespective of nationality or institutional affiliation. Dryad’s Data Publishing Charges (DPCs) are designed to sustain its core functions by recovering the basic costs of curating and preserving data. New innovations are enabled by research and development grants and by support from donors.
Dryad repository key characteristics:
- Flexible about data format, while encouraging the use and further development of community standards.
- compatible with the SNSF policy on Open Research Data that expects all its funded researchers to deposit and share their data and metadata onto existing public repositories.
- Only CC0 licensing
- Fits into the manuscript submission workflow of its partner journals, making data submission easy.
- Gives journals the option of making data privately available during peer review and of allowing submitters to set limited-term embargoes post-publication.
- Data are linked both to and from the corresponding publication and, where appropriate, to and from select specialized data repositories (e.g. GenBank).
- Assigns Digital Object Identifiers (DOIs) to data so that researchers can gain professional credit through data citation.
- Promotes data visibility by allowing content to be indexed, searched and retrieved through interfaces designed for both humans and computers.
- Contents are free to download and have no legal barriers to reuse.
- Contents are curated to ensure the validity of the files and metadata.
- Submitters may update datafiles when corrections or additions are desired, without overwriting the original version linked from the article.
- Long-term preservation … by migrating common file formats when older versions become obsolete, and partnering with DataONE to guarantee access to its contents indefinitely.
Costs to deposit data for individual researchers:
The base DPC per data package is $120. DPCs are collected upon data publication. The submitter is asked to commit to the charge at the time of submission, and is charged if the accompanying publication is accepted, unless the associated journal has already contracted with Dryad to sponsor the DPC.
To determine whether your DPC will be covered, look up your journal. If there is no payment plan in place, the DPC is $120.
Services & tool
For help on documenting your data before depositing it on Dryad contact the BiUM publication management unit for more information.
figshare allows users to easily upload files in any format and view them in the browser, so that figures, datasets, media, papers, posters, presentations and filesets can be disseminated in a way that the current scholarly publishing model does not allow. It lets you manage your research in the cloud and control whom you share it with, or make it publicly available and citable.
figshare is supported by the commercial Macmillan-NPG group and is therefore not compatible with the SNSF policy on Open Research Data, which expects all its funded researchers to deposit and share their data and metadata in existing non-commercial public repositories.
Figshare repository key characteristics:
- Make your data more discoverable and open to all your readers
- Secure hosting and visualization in the browser of all file types
- Authors can easily upload files with no concerns about file size or format
- All data is citable and has a DOI
- Manage and measure the impact of your digital files
- Become the solution for your authors to satisfy funder data mandates
- Host files from 2 to 200 GB
- 20 GB of free private space
- Unlimited public space
- Provide your readers with the full story, allowing them access to the data behind the figures
- Publish your dataset under a CC BY licence
- Persistent hosting of all data, guaranteed.
- Easy integration with existing author submission processes
For help on depositing your data on figshare, have a look at our figshare scenario.
To ensure long-term access to and re-use of your data by others, the BiUM publication management unit can help you describe your datasets precisely. Practical courses on these aspects are also provided by our service on a regular basis (check the CHUV calendar).
Metadata (data documentation) are absolutely necessary for a complete understanding of the research data content and to allow other researchers to find and re-use your data.
Many metadata standards are available for particular file formats and disciplines. Based on European recommendations, the BiUM recommends using the Dublin Core Metadata Element Set for describing publications and the DataCite Metadata Schema for describing general research data.
Metadata should be as complete as possible, using the standards and conventions of a discipline, and should be machine readable. Metadata should always accompany a dataset, no matter where it is stored.
For help on documenting your data before depositing it on data repository, have a look at DataCite Metadata Schema.
In addition, the DataCite Metadata Schema for Publication and Citation of Research Data can be used to generate a Readme XML file describing your datasets. This schema allows data to be understood and reused by other members of the research group and adds contextual value to the datasets for future publishing and data sharing. We will generate the Readme XML file automatically with the DataCite Metadata Generator once you have filled in the form requesting intrinsic metadata. The Readme XML file ensures compatibility with international standards and is both human- and machine-readable.
- Mandatory elements include the file name of the results (field Title), creator names (field Creator), affiliation (field Creator affiliation) and type of data (field Resource Type).
- Recommended elements include keywords (field Subject), date of data creation (field Date), link to the electronic notebook (field Related Identifier), and details on the methodology used, analytical and procedural information, definitions of variables, vocabularies and units of measurement (field Description).
- Optional elements include information on size, format, version, access and funding.
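To illustrate what a machine-readable Readme XML file can look like, the sketch below writes the mandatory and recommended fields listed above as DataCite-style XML with Python's standard `xml.etree.ElementTree`. It is not the DataCite Metadata Generator used by the BiUM; the element names follow DataCite kernel 4, and the choices `resourceTypeGeneral="Dataset"`, `dateType="Created"` and `descriptionType="Methods"` are assumptions. Identifier, publisher and publication year are omitted because the repository typically supplies them at deposit, so the output is not a complete, schema-valid DataCite record.

```python
import xml.etree.ElementTree as ET

NS = "http://datacite.org/schema/kernel-4"
ET.register_namespace("", NS)  # serialize with a default namespace instead of "ns0:"


def _sub(parent, tag, text=None, **attrs):
    """Append a namespaced child element, optionally with text and attributes."""
    el = ET.SubElement(parent, f"{{{NS}}}{tag}", attrs)
    if text is not None:
        el.text = text
    return el


def build_readme_xml(title, creator, affiliation, resource_type,
                     subjects=(), created=None, methods=None):
    root = ET.Element(f"{{{NS}}}resource")
    # Mandatory fields named above: Creator (+ affiliation), Title, Resource Type.
    person = _sub(_sub(root, "creators"), "creator")
    _sub(person, "creatorName", creator)
    _sub(person, "affiliation", affiliation)
    _sub(_sub(root, "titles"), "title", title)
    _sub(root, "resourceType", resource_type, resourceTypeGeneral="Dataset")
    # Recommended fields: Subject, Date, Description (only written when provided).
    if subjects:
        subject_list = _sub(root, "subjects")
        for keyword in subjects:
            _sub(subject_list, "subject", keyword)
    if created:
        _sub(_sub(root, "dates"), "date", created, dateType="Created")
    if methods:
        _sub(_sub(root, "descriptions"), "description", methods,
             descriptionType="Methods")
    return ET.tostring(root, encoding="unicode")


print(build_readme_xml("cell_counts_2017.csv", "Doe, Jane", "University of Lausanne",
                       "Cell counts", subjects=["apoptosis"], created="2017-03-01",
                       methods="Counts from confocal images, manual annotation."))
```

Optional fields such as size, format or version could be added the same way; check the element names against the DataCite documentation for the schema version you target.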
“Documentation, metadata, citation”, online tutorial, Mantra.
- Research data need to be documented at various levels:
- Project level: what the study set out to do, how it contributes new knowledge to the field, what the research questions/hypotheses were, what methodologies were used, what sampling frames were used, what instruments and measures were used, etc. A complete academic thesis normally contains this information in detail, but a published article may not. If a dataset is shared, a detailed technical report will need to be included for the user to understand how the data were collected and processed. You should also provide a sample bibliographic citation to indicate how you would like secondary users of your data to cite it in any publications, etc.
- File or database level: how all the files (or tables in a database) that make up the dataset relate to each other; what format they are in; whether they supersede or are superseded by previous files. A readme.txt file is the classic way of accounting for all the files and folders in a project.
- Variable or item level: the key to understanding research results is knowing exactly how an object of analysis came about. Not just, for example, a variable name at the top of a spreadsheet file, but the full label explaining the meaning of that variable in terms of how it was operationalised.
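At the file level, part of a readme.txt can be generated rather than typed by hand. The sketch below, a hypothetical helper using only the Python standard library, lists every file in a dataset folder with its size and SHA-256 checksum, so that users can see which files make up the dataset and verify that their copies are intact. The tab-separated layout and column names are assumptions; how the files relate to each other still has to be described in words.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large datasets need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def file_manifest(root: str) -> list:
    """Return (relative path, size in bytes, SHA-256) for every file under root."""
    base = Path(root)
    rows = [(p.relative_to(base).as_posix(), p.stat().st_size, sha256_of(p))
            for p in base.rglob("*") if p.is_file()]
    return sorted(rows)  # stable, platform-independent order by relative path


def manifest_text(rows) -> str:
    """Render the manifest as a tab-separated block for a readme.txt."""
    lines = ["path\tsize_bytes\tsha256"]
    lines += [f"{path}\t{size}\t{digest}" for path, size, digest in rows]
    return "\n".join(lines) + "\n"
```

For example, `print(manifest_text(file_manifest("my_dataset")))` would print the inventory of a folder named `my_dataset` (a hypothetical name), ready to paste into the readme.txt.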
- Some examples of data documentation are:
- laboratory notebooks & experimental protocols
- questionnaires, codebooks, data dictionaries
- software syntax and output files
- information about equipment settings & instrument calibration
- database schema
- methodology reports
- provenance information about sources of derived or digitised data
- The term metadata is commonly defined as “data about data”, that is, information that describes or contextualises the data.
- The difference between documentation and metadata is that the first is meant to be read by humans and the second implies computer-processing (though metadata may also be human-readable).
- Documentation is sometimes considered a form of metadata, because it is information about data, and when it is very structured it can be. The importance of metadata lies in the potential for machine-to-machine interoperability, providing the user with added functionality, or ‘actionable’ information.
Readme file for data sharing “What is a README file, and how do I make mine as useful as possible?”
A README file is intended to help ensure that your data can be correctly interpreted and reanalyzed by others.
There are two ways to include a README with your Dryad data submission:
- Provide a separate README for each individual data file (view an example).
- Submit one README for the data package as a whole (view an example).
Dryad recommends that a README be a plain-text file containing the following:
- for each filename, a short description of what data it includes, optionally describing the relationship to the tables, figures, or sections within the accompanying publication
- for tabular data: definitions of column headings and row labels; data codes (including missing data); and measurement units
- any data processing steps, especially if not described in the publication, that may affect interpretation of results
- a description of what associated datasets are stored elsewhere, if applicable
- whom to contact with questions
- If text formatting is important for your README, PDF format is also acceptable.
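As a minimal sketch, assuming your dataset sits in a single local folder, the Python function below drafts a plain-text README that lists every file and leaves placeholder lines for the items Dryad recommends. The function name `draft_readme` and the exact wording of the placeholders are hypothetical illustrations, not an official Dryad template; the TODO lines still have to be filled in by hand.

```python
from pathlib import Path


def draft_readme(data_dir: str, contact: str) -> str:
    """Return a plain-text README skeleton for every file under data_dir.

    Hypothetical helper: the placeholders mirror Dryad's recommended
    README contents, but the layout is only an illustration.
    """
    root = Path(data_dir)
    lines = [f"README for dataset in: {root.name}", ""]
    # One entry per file, in a stable (sorted) order, including subfolders.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        lines += [
            f"File: {rel}",
            "  Description: TODO (relation to tables/figures/sections of the publication)",
        ]
        if path.suffix.lower() in {".csv", ".tsv"}:
            # Tabular data: column definitions, data codes and units are recommended.
            lines += [
                "  Columns: TODO (name, definition, unit)",
                "  Data codes (incl. missing values): TODO",
            ]
        lines.append("")
    # Dataset-level items recommended by Dryad.
    lines += [
        "Processing steps not described in the publication: TODO",
        "Associated datasets stored elsewhere: TODO",
        f"Contact: {contact}",
    ]
    return "\n".join(lines) + "\n"
```

For example, `print(draft_readme("my_dataset", contact="name@example.org"))` prints a skeleton you can save as README.txt and complete; the folder name and address here are placeholders.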
The DataCite Metadata Schema for Publication and Citation of Research Data distinguishes between 3 different levels of obligation for the metadata properties:
- Mandatory (M) properties must be provided,
- Recommended (R) properties are optional, but strongly recommended and
- Optional (O) properties are optional and provide richer description.
Tables 1 and 2 list the items you should document about your dataset for each of the three levels of obligation. For more details, read the full schema documentation provided by DataCite.
Table 1: DataCite Mandatory Properties
Table 2: DataCite Recommended and Optional Properties
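Before depositing, it can help to check a draft metadata record against the mandatory properties. The Python sketch below assumes DataCite Metadata Schema version 4.x, in which Identifier, Creator, Title, Publisher, PublicationYear and ResourceType are mandatory (earlier versions such as 3.1 did not make ResourceType mandatory). The record is modelled as a plain dictionary purely for illustration; it is not DataCite's XML or JSON format, and the function name is hypothetical.

```python
from __future__ import annotations

# Mandatory properties, assuming DataCite Metadata Schema 4.x.
MANDATORY = (
    "Identifier",
    "Creator",
    "Title",
    "Publisher",
    "PublicationYear",
    "ResourceType",
)


def missing_mandatory(record: dict) -> list[str]:
    """Return the mandatory properties that are absent or empty in record.

    Hypothetical helper: record is a simple dict keyed by property name,
    not an official DataCite serialisation.
    """
    return [name for name in MANDATORY if not record.get(name)]
```

For a draft record that has everything except a resource type, `missing_mandatory(record)` returns `["ResourceType"]`, telling you which property still has to be filled in.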
Many academic disciplines have formalized specific metadata standards.
You can consult them on:
File formats for long-term preservation and re-use
To ensure long-term access and usability of your data, the BiUM publication management unit encourages you to deposit documents in the standard preservation file formats most likely to remain accessible in the future. We can provide you with guidance on which format to use for long-term preservation of your data. Practical courses on these aspects are also provided by our service on a regular basis (check the CHUV calendar).
As technology evolves, it is important to consider which file formats you will use for preserving files in the long run.
File formats most likely to be accessible in the future have the following characteristics:
- Open, documented standard
- Popular format
- Standard representation
For help choosing long-term preservation formats, have a look at the FBM Recommended File Formats.
Citation for a dataset
Data should be considered legitimate, citable products of research and be given the same importance in the scholarly record as citations of other research objects, such as publications (see Joint Declaration of Data Citation Principles).
Proper citation helps make research data easily accessible and re-usable, while giving researchers due credit for their work. Because the citation contains the name of the creator, the author receives proper credit. Moreover, the impact of the research dataset can easily be tracked through its unique DOI.
The data citation should be included in the reference list of the article.
- Minimum recommended format
The minimum recommended format for research data citation is as follows:
- Creator (PublicationYear): Title. Publisher. Identifier
where Publisher is the data archive that holds the data and Identifier is given as a linkable, permanent URL.
« Irino, T; Tada, R (2009): Chemical and mineral compositions of sediments from ODP Site 127‐797. Geological Institute, University of Tokyo. http://dx.doi.org/10.1594/PANGAEA.726855 »
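The minimum format above is simple enough to generate automatically, which keeps citations consistent across a reference list. The Python function below is a hypothetical sketch of that pattern: it joins creators with semicolons as in the example and renders the DOI as a URL using https://doi.org/, the current DOI resolver (the older http://dx.doi.org/ prefix in the example still redirects to it).

```python
def format_data_citation(creators, year, title, publisher, doi):
    """Build 'Creator (PublicationYear): Title. Publisher. Identifier'.

    Hypothetical helper following the minimum recommended format;
    doi is the bare DOI, e.g. '10.1594/PANGAEA.726855'.
    """
    names = "; ".join(creators)
    # Strip a trailing period so the title is not followed by '..'.
    clean_title = title.rstrip(".")
    # Render the identifier as a linkable, permanent URL.
    return f"{names} ({year}): {clean_title}. {publisher}. https://doi.org/{doi}"
```

Called with the creators, year, title, publisher and DOI of the example above, it returns the same citation with the identifier written as https://doi.org/10.1594/PANGAEA.726855.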
Citation for a subject archive entry:
GenBank accession number, available at: http://www.ncbi.nlm.nih.gov.
Data citation templates while using citation software:
- In EndNote, use the Dataset reference type.
- In Mendeley or Zotero, use a generic reference type and fill it in with the information for your dataset.
- Tool DataCite website
Guide « How to Cite Datasets and Link to Publications »
« This guide is very helpful to create links between your academic publications and the underlying datasets, so that anyone viewing the publication will be able to locate the dataset and vice versa. It provides a working knowledge of the issues and challenges involved, and of how current approaches seek to address them. This guide should interest researchers and principal investigators working on data-led research, as well as the data repositories with which they work. »
Research data confidentiality
FBM/CHUV researchers conducting research on human subjects should consult the Commission cantonale d’éthique de la recherche sur l’être humain before planning research data use and sharing.
For questions related to clinical trials, consult the CRC (Centre de Recherche Clinique).
Medical Director of the CRC: Prof. Marc Froissart
Tel. +41 (0)21 314 61 84
Intellectual property for datasets
When talking about databases, we first need to distinguish between the structure and the content of a database. The structural elements of a database that involve originality will generally be covered by copyright. Concerning the content, individual data items are not copyrightable, while in most jurisdictions a collection of data whose selection or arrangement involves creativity can be.
Ask us about using open licence tools to make your document freely accessible while protecting your copyright.
Open licenses for data
The best way to promote sharing and unrestricted use of data you have produced yourself is to apply an explicit licence. For open data, it is recommended that you use one of the open-compliant licences marked as suitable for data.
The CC BY license lets others distribute, remix, tweak, and build upon a work, even commercially, as long as they credit the author for the original creation.
Sources of information
EU Database Directive (Directive 96/9/EC).
The EU directive provides for both copyright and the sui generis database right, though with some restrictions on when copyright applies:
« (i) Copyright in the Compilation. … First, it [the DB directive] defines what is meant by a “database”: “a collection of independent works, data or other materials arranged in a systematic or methodical way and individually accessible by electronic or other means.” [DB Dir Art 3] Then it allows copyright in a database (as distinct from its contents), but only on the basis of authorship involving personal intellectual creativity. This is a new limitation, so far as common law countries are concerned, and one which must presage a raising of the standard of originality throughout British Copyright law. Intellectual judgment which is in some sense the author’s own must go either into choosing contents or into the method of arrangement. The selective dictionary will doubtless be a clearer case than the classificatory telephone directory but each may have some hope; the merely comprehensive will be precluded – that is the silliness of the whole construct. » …
« (ii) Database right. In addition there is a separate sui generis right given to the maker of a database (the investing initiator) against extraction or reutilisation of the database. Four essential points may be highlighted:
1. The right applies to databases whether or not their arrangement justifies copyright and whatever position may be regarding copyright in individual items in its contents.
2. The focus upon contents, rather than organisational structure, is intended to give a right where the contents have been wholly or substantially taken out and re-arranged (generally by a computer) so as to provide a quite different organisation to essentially the same material – a re-organisation which would not necessarily amount to infringement of copyright in the original arrangement. …
3. The database has to be the produce of substantial investment. …
4. The right lasts for 15 years from completion of the database, or 15 years from its becoming available to the public during initial period. However, further substantial investment in additions, deletions or alterations starts time running afresh. … »
Swiss and UNIL/CHUV directives
Publish your dataset in data journals
Data papers are articles describing a set of data. This new type of paper is designed to make your data more discoverable, interpretable and reusable. Importantly, these papers can be found through well-established literature search databases.
- Scientific Data
Scientific Data is a peer-reviewed open access scientific journal published by the Nature Publishing Group since 2014. Its article type, the Data Descriptor, focuses on descriptions of datasets relevant to the natural sciences. The journal is abstracted and indexed in PubMed.
- GigaScience
GigaScience is an online open-access, open-data journal that aims to revolutionize data dissemination, organization, understanding and use. The journal publishes ‘big-data’ studies from the entire spectrum of life and biomedical sciences. To achieve its goals, the journal uses a novel publication format that links standard manuscript publication with an extensive database that hosts all associated data and provides data analysis tools and cloud-computing resources.
For more information, see the following list of data journals.
Find datasets on Open Data repositories
- How to find a dataset associated with a published article?
The BiUM publication management unit helps all FBM/CHUV researchers find repositories and datasets. We will help you find a suitable repository in which to deposit your published articles and the data underpinning them.
The BiUM has selected several aggregator tools to help researchers find Open Access repositories from which they can retrieve articles and datasets.
Datasets are currently dispersed around the world across a multitude of repositories. Authoritative directories such as re3data.org, OAD, OpenAIRE and OpenDOAR help researchers explore more than 2,600 indexed data repositories through search engines for repository contents. Otherwise, researchers looking for published data related to their research question would need to visit each repository individually and search it manually.
BioSharing is a curated, informative and educational resource on inter-related data standards, databases and policies in the life, environmental and biomedical sciences.
re3data (Registry of Research Data Repositories)
The re3data.org website is a directory of more than 1,200 indexed and discoverable data repositories. The recent merger of re3data with DataCite, which provides and assigns permanent Digital Object Identifiers (DOIs) for datasets, will make it easier to explore data repositories and will make publications and data more accessible.
OpenDOAR (Directory of Open Access Repositories)
OpenDOAR is an authoritative directory of academic OA repositories developed by SHERPA (Securing a Hybrid Environment for Research Preservation and Access). This service provides a search for repositories or repository contents.
ROARMAP (Registry of Open Access Repository Mandates and Policies)
ROARMAP is a searchable international registry charting the growth of open access mandates and policies adopted by universities, research institutions and research funders that require or request their researchers to provide open access to their peer-reviewed research article output by depositing it in an open access repository.
OAD (Open Access Directory)
OAD is a list of repositories and databases for open data.
Data management services at UNIL/CHUV
Other services have been launched at UNIL and CHUV to help researchers manage active research data during data collection, storage, analysis, preservation and archiving:
- The Archive Service of UNIL
UNIRIS (The Archive Service of UNIL)
In order to provide UNIL researchers with a framework for the effective management of their active data, this service is developing a five-year roadmap for the management of research data in collaboration with UNIL faculties and internal experts. The service already provides resources (good practice guides, DMP templates, etc.) via the web and organizes training (workshops, etc.) for researchers as well as conferences on data management at UNIL. For more information, feel free to contact the service at the following address: https://uniris.unil.ch/researchdata/#contact
Vital-IT is a Competency Centre in Bioinformatics and Computational Biology that provides infrastructure, support and technological R&D for life science and clinical research in Switzerland and internationally. It is a platform that helps Swiss researchers with the management, storage, analysis and publication of large genomic, proteomic and metabolomic datasets. For more information, feel free to contact the service at the following address: http://www.vital-it.ch/about#contact
- BioInformatics Competence center
BioInformatics Competence Center
The center has the expertise to offer customized data analysis that goes beyond standard protocols, in response to scientists' requests. Support is available in the following areas: genomics, proteomics, FACS and CyTOF, data processing, web, and image analysis.
- Service de soutien à la recherche clinique
Centre de Recherche Clinique de Lausanne (CRC). FBM-UNIL/CHUV researchers conducting clinical research should consult the CRC as early as possible when planning a prospective clinical study, whether interventional (trial) or observational. The CRC can provide services spanning from concept and design to publication, including solutions for electronic data capture, data management and statistical analysis. For more information, feel free to contact the service.
- Unité de valorisation des données et des échantillons biologiques (VDE)
Unité de valorisation des données et des échantillons biologiques (VDE). FBM/CHUV researchers conducting research on human subjects and using samples from the Biobanque Institutionnelle de Lausanne should consult the VDE before planning research data use, especially to make sure that data are coded and correctly de-identified. For more information, feel free to contact the service.
PACTT (Powering Academia-industry Collaborations and Technology Transfer). PACTT is the joint technology transfer office of the University of Lausanne (UNIL) and the University Hospital of Lausanne (CHUV). Contact us for commercialisation of research results, protection and management of intellectual property, negotiation and management of collaboration contracts with industry and other institutions, or if you need advice with the creation of a start-up company. For more information, feel free to contact the service at the following address: email@example.com
- CHUV/UNIL IT services
The IT services of UNIL, FBM and CHUV ensure the storage, back up and preservation of FBM/CHUV data.
- Data Life Cycle Management
Data Life Cycle Management (DLCM): the Swiss DLCM project provides Swiss researchers, information professionals and anyone else interested in research data management with good practices, practical resources and news on the topic.
The BiUM publication management unit works in close collaboration with the UNIL services UNIRIS and PACTT; the FBM/CHUV services uDDSP, CRC and VDE; the Swiss university services Vital-IT (SIB) and DLCM; and the IT services of UNIL and CHUV to help you manage your research data.