CucurQTL Documentation

Welcome to the documentation for CucurQTL, a specialized database for Quantitative Trait Loci (QTL) data across cucurbit species. This documentation walks through the features and functionality of the database.

Quick Start: Navigate to QTL Search to start exploring QTL data, or visit Genomes to download reference sequences.

1.1 What are QTLs?

A Quantitative Trait Locus (QTL) is a region of DNA associated with a particular phenotypic trait that varies in degree. Unlike simple Mendelian traits controlled by single genes, quantitative traits (like fruit size, yield, or disease resistance) are influenced by multiple genes and environmental factors.

QTL analysis helps researchers:

  • Identify genomic regions controlling important agricultural traits
  • Understand the genetic architecture of complex traits
  • Develop markers for marker-assisted selection (MAS)
  • Accelerate crop breeding programs

1.2 Database Purpose

CucurQTL serves as a comprehensive repository for QTL data across multiple cucurbit species, including cucumber, watermelon, melon, pumpkin, and other economically important gourds. The database aims to:

📊 Centralize Data

Aggregate QTL information from diverse genetic studies into a single, searchable resource

🔬 Standardize Format

Provide consistent formatting for QTL data across species and studies

🧬 Enable Research

Facilitate comparative genomics and support breeding applications

📥 Share Resources

Provide downloadable genome sequences and annotation files

1.3 Key Features

  • QTL Search: Filter and search QTLs by species, trait, sub-trait, and parameters
  • JBrowse2 Integration: Interactive genome browser for visualizing genomic features
  • Functional Annotation: Search gene annotations within specific genomic regions
  • Genome Downloads: Access reference genomes, GFF annotations, and protein sequences
  • Data Export: Download search results in CSV and Excel formats
  • Contributor System: Submit QTL data for review and approval by administrators
  • Admin Dashboard: Comprehensive admin panel for data management and user approvals

2. Application Features

2.1 QTL Search

The QTL search interface provides powerful filtering capabilities to find relevant QTL data:

Search Filters

  • Species: Select from 10 cucurbit species (e.g., Cucumber, Watermelon, Melon)
  • Trait Category: Major trait categories (e.g., Fruit Quality, Disease Resistance, Vegetative)
  • Sub-trait: Specific traits within a category (e.g., Fruit Length, Powdery Mildew Resistance)
  • Parameter: Measured parameters (e.g., Brix content, Length in cm)

Results Display

Search results include:

  • QTL name and associated linkage group
  • Position interval on the chromosome
  • LOD score and phenotypic variance explained (PVE/R²)
  • Associated markers and mapping method
  • Reference publication with DOI links

Tip: Use the "Download CSV" or "Download Excel" button to export your search results for further analysis.

2.2 JBrowse2 Genome Browser

CucurQTL integrates JBrowse2, a modern genome browser that allows interactive visualization of genomic data:

Available Tracks

  • Reference Sequence: View nucleotide sequences at any zoom level
  • Gene Annotations: Browse gene models, exons, and UTRs
  • GFF Tracks: Visualize genomic features from annotation files

Navigation Features

  • Search by gene name or genomic coordinates
  • Zoom in/out with mouse scroll or controls
  • Pan by clicking and dragging
  • Switch between species assemblies

2.3 Functional Annotation

The Functional Annotation tool allows you to search for genes and genomic features within specific regions:

Search Parameters

  • Species: Select the species of interest
  • Chromosome: Specify the chromosome number
  • Start Position: Beginning of the genomic region (bp)
  • End Position: End of the genomic region (bp)

Available Annotation Fields

Category | Fields
Basic Info | Sequence ID, Feature type, Gene name, Symbol
Position | Chromosome, Start position, End position, Strand
Gene Ontology | GO IDs, GO names, GO Cellular Component, GO Molecular Function
Functional | Enzyme codes, KEGG pathways, InterPro IDs
BLAST Results | Hit description, E-value, Similarity, Bit score

2.4 Reference Genome Resources

Download reference genomes and annotation files for all 10 cucurbit species:

Available Downloads

  • Genome FASTA: Complete reference genome sequences (.fa.gz)
  • GFF Annotations: Gene and feature annotations (.gff3.gz)
  • Protein Sequences: Predicted protein sequences (.fa.gz)

Note: All genome files are compressed with gzip. Use tools like gunzip or 7-Zip to decompress.
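
For scripted workflows, the compressed files can also be read without first decompressing them to disk. The following is a minimal sketch using Python's standard gzip module; it assumes standard FASTA layout (header lines start with ">"), and the function names are illustrative rather than part of CucurQTL.

```python
import gzip
from typing import Dict, Iterator, TextIO, Tuple


def iter_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (sequence_id, sequence) pairs from a FASTA text stream."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            fields = line[1:].split()
            seq_id, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def sequence_lengths(path: str) -> Dict[str, int]:
    """Return {sequence_id: length_in_bp} for a gzip-compressed FASTA file."""
    # Mode "rt" decompresses on the fly and yields text lines (str, not bytes).
    with gzip.open(path, "rt") as handle:
        return {seq_id: len(seq) for seq_id, seq in iter_fasta(handle)}
```

With a downloaded file (the name here is hypothetical), sequence_lengths("cucumber_genome.fa.gz") returns a dict mapping each sequence ID, such as a chromosome, to its length in bp.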

2.5 Contributor System

CucurQTL features a comprehensive data submission system allowing researchers to contribute QTL data:

For Contributors

  • Registration: Create an account with your institutional details and, optionally, your ORCID iD
  • CSV Upload: Submit QTL data in standardized CSV format
  • Upload Tracking: Monitor the status of your submissions
  • Email Notifications: Receive updates when your data is approved
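
ORCID iDs end in a check character computed with ISO 7064 MOD 11-2, so a mistyped iD can be caught before it is submitted. A minimal validation sketch follows; whether CucurQTL's registration form performs this check is not documented, and the function names are illustrative. Because the ORCID field is optional, a blank value should be accepted before this check is applied.

```python
import re

# Four hyphen-separated groups of four; the final character (0-9 or X)
# is the ISO 7064 MOD 11-2 check character over the first 15 digits.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the MOD 11-2 check character for the 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if `orcid` is in hyphenated form and its check character matches."""
    orcid = orcid.strip().upper()
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]
```

For example, ORCID's documented sample iD 0000-0002-1825-0097 passes, while changing its last digit makes is_valid_orcid return False.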

CSV Format Requirements

Upload files must include these columns:

Species, Trait, Sub-Trait, Parameter, Cross, Population type,
Method/Model, Qtl Name, Linkage Group (LG), Position/interval (cM/Mb),
Associated Marker, LOD, PVE/R2, Reference, Doi

Tip: Required fields are Species, Trait, and Qtl Name. Authors and publication year are auto-extracted from the Reference column.
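
Before uploading, a file can be checked locally for the required columns and values. This sketch assumes headers are matched case-insensitively with spaces and punctuation ignored (so "QTL_name" maps to "Qtl Name"); CucurQTL's actual header-mapping rules are not specified here, and check_upload and its helpers are hypothetical.

```python
import csv
import io
import re
from typing import Dict, List

# Column headers listed under "CSV Format Requirements".
EXPECTED_COLUMNS = [
    "Species", "Trait", "Sub-Trait", "Parameter", "Cross", "Population type",
    "Method/Model", "Qtl Name", "Linkage Group (LG)",
    "Position/interval (cM/Mb)", "Associated Marker", "LOD", "PVE/R2",
    "Reference", "Doi",
]
REQUIRED_COLUMNS = ["Species", "Trait", "Qtl Name"]


def normalize(header: str) -> str:
    """Lower-case a header and keep only letters and digits (assumed rule)."""
    return re.sub(r"[^a-z0-9]", "", header.lower())


def map_headers(headers: List[str]) -> Dict[str, str]:
    """Map each uploaded header to its canonical column name, if recognized."""
    canonical = {normalize(c): c for c in EXPECTED_COLUMNS}
    return {h: canonical[normalize(h)] for h in headers if normalize(h) in canonical}


def check_upload(text: str) -> List[str]:
    """Return the problems found in an uploaded CSV; an empty list means OK."""
    reader = csv.DictReader(io.StringIO(text))
    mapping = map_headers(reader.fieldnames or [])
    missing = [c for c in REQUIRED_COLUMNS if c not in mapping.values()]
    if missing:
        return [f"missing required column: {c}" for c in missing]
    problems = []
    for row_number, row in enumerate(reader, start=1):  # data rows only
        # Re-key the row by canonical names; absent trailing cells arrive as None.
        record = {mapping[h]: (v or "").strip() for h, v in row.items() if h in mapping}
        problems += [f"row {row_number}: empty {c}"
                     for c in REQUIRED_COLUMNS if not record.get(c)]
    return problems
```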

Admin Review Process

  • New registrations require admin approval before access is granted
  • Uploaded data is staged for review before being added to the main database
  • Admins can approve, reject, or request revisions
  • Approved data is automatically imported with proper foreign key relationships
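
Staged rows hold species and trait names as raw text (see the staged_qtl_data table in section 4.2), so importing them means resolving each name to a foreign key ID. The sketch below uses an in-memory SQLite database as a stand-in for MySQL and assumes a get-or-create policy for unrecognized names; the real import may instead reject such rows. All table subsets and helper names here are hypothetical.

```python
import sqlite3
from typing import Dict, List


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for a subset of the species, traits, and qtls tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript("""
        CREATE TABLE species (species_id INTEGER PRIMARY KEY,
                              species_name TEXT UNIQUE NOT NULL);
        CREATE TABLE traits (trait_id INTEGER PRIMARY KEY,
                             trait_name TEXT UNIQUE NOT NULL);
        CREATE TABLE qtls (qtl_id INTEGER PRIMARY KEY,
                           species_id INTEGER NOT NULL REFERENCES species(species_id),
                           trait_id INTEGER NOT NULL REFERENCES traits(trait_id),
                           qtl_name TEXT NOT NULL);
    """)
    return conn


def get_or_create(conn: sqlite3.Connection, table: str, id_col: str,
                  name_col: str, name: str) -> int:
    """Return the ID for `name` in a lookup table, inserting it if absent."""
    # Table and column names come from the fixed calls below, never from user input.
    row = conn.execute(
        f"SELECT {id_col} FROM {table} WHERE {name_col} = ?", (name,)).fetchone()
    if row is not None:
        return row[0]
    return conn.execute(
        f"INSERT INTO {table} ({name_col}) VALUES (?)", (name,)).lastrowid


def import_staged(conn: sqlite3.Connection, rows: List[Dict[str, str]]) -> None:
    """Resolve raw names to foreign keys and insert the rows into qtls."""
    with conn:  # one transaction: commit if every row succeeds, else roll back
        for r in rows:
            species_id = get_or_create(conn, "species", "species_id",
                                       "species_name", r["species_name"].strip())
            trait_id = get_or_create(conn, "traits", "trait_id",
                                     "trait_name", r["trait_name"].strip())
            conn.execute(
                "INSERT INTO qtls (species_id, trait_id, qtl_name) VALUES (?, ?, ?)",
                (species_id, trait_id, r["qtl_name"].strip()))
```

Running the whole batch in one transaction means a failure partway through leaves the main tables unchanged, which suits an approve-then-import workflow.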

3. User Guide

3.1 Searching QTLs

  1. Navigate to QTL from the main navigation menu
  2. Use the cascading dropdown filters:
    • First, select a Species (e.g., Cucumber)
    • Choose a Trait Category (e.g., Fruit Quality)
    • Select a Sub-trait (e.g., Fruit Length)
    • Optionally filter by Parameter
  3. Click "Search" to retrieve matching QTLs
  4. Browse results in the interactive data table
  5. Click "Download CSV" or "Download Excel" to export results
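
The cascading behaviour can be illustrated in memory: each dropdown offers only the values consistent with the selections above it. This is a sketch of the idea, not the site's implementation (which presumably queries the database at each step); the record keys and function names are assumptions.

```python
from typing import Dict, List

# Dropdown order on the QTL search page: each level narrows the next.
LEVELS = ["species", "trait", "sub_trait", "parameter"]
Record = Dict[str, str]


def options_for(records: List[Record], selected: Dict[str, str], level: str) -> List[str]:
    """Distinct values for `level`, given the selections made above it."""
    upstream = [k for k in LEVELS[:LEVELS.index(level)] if k in selected]
    return sorted({r[level] for r in records
                   if all(r[k] == selected[k] for k in upstream)})


def search(records: List[Record], selected: Dict[str, str]) -> List[Record]:
    """Records matching every chosen filter; an unchosen filter matches anything."""
    return [r for r in records if all(r[k] == v for k, v in selected.items())]
```

For instance, after choosing Cucumber, options_for(records, {"species": "Cucumber"}, "trait") lists only the trait categories that have cucumber QTLs, and leaving Parameter unselected in search() matches every parameter.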

3.2 Using JBrowse2 Genome Browser

  1. Navigate to Tools → JBrowse2 from the menu
  2. Select a species/assembly from the dropdown
  3. Enter a genomic location in the search box (e.g., Chr1:1000000-2000000)
  4. Use the track selector to enable/disable annotation tracks
  5. Click on features to view detailed information
  6. Use mouse scroll to zoom, click-drag to pan
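
Locations can also be assembled programmatically, for example to jump from a QTL's physical interval to the browser. Below is a minimal parser and formatter sketch for the "Chr1:1000000-2000000" form shown above; it assumes chromosome names like "Chr3" (naming may differ between assemblies) and 1-based inclusive coordinates, and it tolerates thousands separators only as a convenience. Both function names are hypothetical.

```python
import re
from typing import NamedTuple


class Locus(NamedTuple):
    ref_name: str  # e.g. "Chr1"; naming conventions may differ between assemblies
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive


# "Chr1:1000000-2000000"; commas inside the numbers are tolerated.
LOCUS_RE = re.compile(r"^\s*([^:\s]+):([\d,]+)-([\d,]+)\s*$")


def parse_locus(text: str) -> Locus:
    """Parse a 'ref:start-end' location string into a Locus."""
    m = LOCUS_RE.match(text)
    if m is None:
        raise ValueError(f"not a location string: {text!r}")
    start, end = (int(g.replace(",", "")) for g in m.group(2, 3))
    if start < 1 or start > end:
        raise ValueError(f"invalid coordinates: {text!r}")
    return Locus(m.group(1), start, end)


def format_locus(ref_name: str, start_mb: float, end_mb: float) -> str:
    """Turn a physical interval given in Mb into a 'ref:start-end' string."""
    start_bp = max(1, round(start_mb * 1_000_000))
    end_bp = round(end_mb * 1_000_000)
    return f"{ref_name}:{start_bp}-{end_bp}"
```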

3.3 Searching Functional Annotations

  1. Navigate to Tools → Functional Annotation
  2. Select a Species from the dropdown
  3. Enter genomic coordinates:
    • Chromosome: Enter chromosome number (e.g., 3)
    • Start Position: Start of region in base pairs
    • End Position: End of region in base pairs
  4. Click "Add More" to search multiple regions simultaneously
  5. Select which annotation columns to display using checkboxes
  6. Click "Search" to retrieve annotations
  7. Export results using the download button

Tip: For better performance, limit your search region to 1-2 Mb at a time.
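
When several regions are entered with "Add More", one feature can match more than one region, and there is a choice between keeping features that merely overlap a region and keeping only those lying entirely inside it (the behaviour of Example 2 in section 5.2). The sketch below supports both; the field names follow the genomic_sequences table, while the function name and the default of overlap matching are assumptions, since the tool's own rule is not documented.

```python
from typing import Dict, Iterable, List, Tuple

Region = Tuple[int, int, int]  # (chromosome, start_bp, end_bp), inclusive
Feature = Dict[str, object]


def search_regions(features: List[Feature], regions: Iterable[Region],
                   contained: bool = False) -> List[Feature]:
    """Features hitting any region, without duplicates, sorted by position."""
    hits: Dict[int, Feature] = {}
    for chrom, start, end in regions:
        for i, f in enumerate(features):
            if f["chromosome"] != chrom:
                continue
            if contained:  # feature lies entirely inside the region
                ok = f["start_position"] >= start and f["end_position"] <= end
            else:          # feature shares at least one base with the region
                ok = f["start_position"] <= end and f["end_position"] >= start
            if ok:
                hits[i] = f  # keyed by index, so overlapping regions add no duplicates
    return sorted(hits.values(), key=lambda f: (f["chromosome"], f["start_position"]))
```

Overlap matching keeps genes that straddle a region boundary, which matters when the region is a QTL interval and a candidate gene sits at its edge.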

3.4 Downloading Reference Genomes

  1. Navigate to Tools → Genomes
  2. Browse the species cards to find your organism of interest
  3. Each species card provides three download options:
    • Download Genome: Reference FASTA sequence
    • Download GFF: Gene annotations
    • Download Protein: Predicted protein sequences
  4. Click a button to start the download (files are gzip-compressed)

4. Database Schema

4.1 Core QTL Tables

The database uses a normalized relational structure to organize QTL data efficiently:

Entity Relationship Overview

View the interactive schema diagram for a visual representation of all database tables and their relationships:

📊 Open Interactive Schema Diagram

species

Stores cucurbit species information.

Field | Type | Description
species_id | INT (PK) | Primary key
species_name | VARCHAR(100) | Scientific name (e.g., "Cucumis sativus")

traits

Major trait categories.

Field | Type | Description
trait_id | INT (PK) | Primary key
trait_name | VARCHAR(100) | Category name (e.g., "Fruit Quality")

sub_traits

Specific traits within categories.

Field | Type | Description
sub_trait_id | INT (PK) | Primary key
trait_id | INT (FK) | Foreign key to traits
sub_trait_name | VARCHAR(100) | Specific trait (e.g., "Flowering time")

parameters

Measurable parameters for each sub-trait.

Field | Type | Description
parameter_id | INT (PK) | Primary key
sub_trait_id | INT (FK) | Foreign key to sub_traits
parameter_name | VARCHAR(255) | Parameter (e.g., "Days to anthesis")

qtls

Main QTL data table containing all QTL records.

Field | Type | Description
qtl_id | INT (PK) | Primary key
species_id | INT (FK) | Foreign key to species
trait_id | INT (FK) | Foreign key to traits
sub_trait_id | INT (FK) | Foreign key to sub_traits
parameter_id | INT (FK) | Foreign key to parameters
cross_id | INT (FK) | Foreign key to crosses
population_type_id | INT (FK) | Foreign key to population_types
reference_id | INT (FK) | Foreign key to reference_list
method | VARCHAR(50) | Mapping method (e.g., "CIM")
qtl_name | VARCHAR(100) | QTL identifier (e.g., "fl3.1")
linkage_group | VARCHAR(20) | Chromosome/linkage group
position_interval | VARCHAR(50) | Map position (cM or bp)
associated_marker | TEXT | Flanking or peak markers
lod | VARCHAR(50) | LOD score
pve_r2 | VARCHAR(50) | Phenotypic variance explained

4.2 Contributor System Tables

Tables supporting the data submission and approval workflow:

contributors

Registered data contributors/curators.

Field | Type | Description
contributor_id | INT (PK) | Primary key
email | VARCHAR(100) | Unique email address
password_hash | VARCHAR(255) | Hashed password
full_name | VARCHAR(100) | Contributor's full name
institution | VARCHAR(255) | Affiliated institution
orcid_id | VARCHAR(50) | ORCID identifier (optional)
status | ENUM | pending, approved, rejected, suspended
approved_by | INT (FK) | Admin who approved the account

data_uploads

Tracks all data upload submissions.

Field | Type | Description
upload_id | INT (PK) | Primary key
contributor_id | INT (FK) | Who uploaded the data
file_name | VARCHAR(255) | Stored file name
original_file_name | VARCHAR(255) | Original uploaded filename
upload_type | ENUM | new_data or update_data
row_count | INT | Number of data rows
status | ENUM | pending, approved, rejected, processing
reviewed_by | INT (FK) | Admin who reviewed

staged_qtl_data

Temporary storage for uploaded QTL data awaiting approval.

Field | Type | Description
staged_id | INT (PK) | Primary key
upload_id | INT (FK) | Related upload record
row_number | INT | Row number in original CSV
species_name | VARCHAR(100) | Species (raw text)
trait_name | VARCHAR(100) | Trait category (raw text)
qtl_name | VARCHAR(100) | QTL identifier
authors | TEXT | Extracted from Reference
publication_year | INT | Extracted from Reference
validation_status | ENUM | valid, warning, error
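
The exact rules used to fill authors, publication_year, and validation_status are not documented. One plausible sketch, assuming the Reference text places the author list before the first four-digit year, treats an empty required field as "error" and an unparseable year as "warning". The "reference" key and both function names are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# First plausible publication year (1900-2099) in the citation text.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def split_reference(reference: str) -> Tuple[str, Optional[int]]:
    """Split a citation into (authors, year) at the first four-digit year."""
    m = YEAR_RE.search(reference)
    if m is None:
        return reference.strip(), None
    # Drop separators left between the author list and the year, e.g. " (" or ", ".
    authors = reference[:m.start()].strip(" ,(")
    return authors, int(m.group(0))


def validation_status(row: Dict[str, str]) -> str:
    """Classify a staged row as 'valid', 'warning', or 'error' (assumed rules)."""
    required = ("species_name", "trait_name", "qtl_name")
    if any(not (row.get(k) or "").strip() for k in required):
        return "error"
    _, year = split_reference(row.get("reference") or "")
    return "valid" if year is not None else "warning"
```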

admin_users

Administrative users with approval permissions.

Field | Type | Description
admin_id | INT (PK) | Primary key
username | VARCHAR(50) | Login username
password_hash | VARCHAR(255) | Hashed password
email | VARCHAR(100) | Admin email
full_name | VARCHAR(100) | Display name
is_active | TINYINT | Account status

4.3 Genomic Annotation Table

genomic_sequences

Stores functional annotation data for genes and genomic features.

Field | Type | Description
id | INT (PK) | Primary key
genomic_species | VARCHAR(255) | Species name
sequence | VARCHAR(255) | Sequence identifier
chromosome | INT | Chromosome number
start_position | BIGINT | Start position (bp)
end_position | BIGINT | End position (bp)
strand | VARCHAR(10) | Strand (+/-)
feature | VARCHAR(255) | Feature type (gene, mRNA, etc.)
name | VARCHAR(255) | Gene/feature name
symbol | VARCHAR(255) | Gene symbol
go_ids | TEXT | Gene Ontology IDs
go_names | TEXT | Gene Ontology terms
hit_desc | TEXT | BLAST hit description
e_value_numeric | VARCHAR(50) | E-value
similarity | VARCHAR(50) | Sequence similarity %
pathways_ids | TEXT | KEGG pathway IDs

5. Technical Implementation

5.1 Technologies Used

Component | Technology | Purpose
Backend | PHP 8.x | Server-side logic and API endpoints
Database | MySQL 8.x | Data storage and queries
Frontend | HTML5, CSS3, JavaScript | User interface
CSS Framework | Bootstrap 5, Tailwind CSS | Responsive styling
Genome Browser | JBrowse2 | Interactive genome visualization
Data Export | PhpSpreadsheet | Excel file generation
Data Import | Python (pandas, SQLAlchemy) | CSV parsing and database loading

Database Indexes

Optimized indexes for common query patterns:

-- QTL table indexes
CREATE INDEX idx_species_trait ON qtls(species_id, trait_id);
CREATE INDEX idx_qtl_name ON qtls(qtl_name);
CREATE INDEX idx_linkage_group ON qtls(linkage_group);

-- Reference table indexes  
CREATE INDEX idx_reference_year ON reference_list(publication_year);
CREATE INDEX idx_reference_authors ON reference_list(authors(255));

-- Contributor system indexes
CREATE INDEX idx_contributor_status ON contributors(status);
CREATE INDEX idx_contributor_email ON contributors(email);
CREATE INDEX idx_upload_status ON data_uploads(status);
CREATE INDEX idx_upload_contributor ON data_uploads(contributor_id);
CREATE INDEX idx_staged_upload ON staged_qtl_data(upload_id);

-- Genomic sequences indexes
CREATE INDEX idx_genomic_species ON genomic_sequences(genomic_species);
CREATE INDEX idx_genomic_chromosome ON genomic_sequences(chromosome);
CREATE INDEX idx_genomic_position ON genomic_sequences(start_position, end_position);
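
Whether a query actually uses these indexes can be checked with the database's query planner (EXPLAIN in MySQL). As a self-contained illustration, the sketch below rebuilds two of the QTL indexes in an in-memory SQLite database and returns SQLite's EXPLAIN QUERY PLAN description; SQLite is only a stand-in here, so MySQL's plans and wording will differ, and plan_for is a hypothetical helper.

```python
import sqlite3
from typing import Sequence


def plan_for(query: str, params: Sequence = ()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN description for `query`."""
    conn = sqlite3.connect(":memory:")
    try:
        # Subset of the qtls table plus two of the indexes listed above.
        conn.executescript("""
            CREATE TABLE qtls (qtl_id INTEGER PRIMARY KEY, species_id INTEGER,
                               trait_id INTEGER, qtl_name TEXT, linkage_group TEXT);
            CREATE INDEX idx_species_trait ON qtls(species_id, trait_id);
            CREATE INDEX idx_qtl_name ON qtls(qtl_name);
        """)
        rows = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
    finally:
        conn.close()
    # The last column of each plan row holds the human-readable step description.
    return "\n".join(str(row[-1]) for row in rows)
```

For example, print(plan_for("SELECT qtl_name FROM qtls WHERE species_id = ?", (1,))) shows that a filter on species_id alone can still use idx_species_trait, because species_id is the index's leading column.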

5.2 Query Examples

Example 1: Find QTLs for Cucumber Fruit Length

SELECT q.qtl_name, q.linkage_group, q.position_interval, 
       q.lod, q.pve_r2, r.authors, r.publication_year
FROM qtls q
JOIN species s ON q.species_id = s.species_id
JOIN sub_traits st ON q.sub_trait_id = st.sub_trait_id
JOIN reference_list r ON q.reference_id = r.reference_id
WHERE s.species_name = 'Cucumis sativus'
  AND st.sub_trait_name LIKE '%fruit length%';

Example 2: Get Annotations Contained Within a Genomic Region

SELECT sequence, feature, name, symbol, 
       start_position, end_position, strand,
       go_ids, go_names, hit_desc
FROM genomic_sequences
WHERE genomic_species = 'Cucumis sativus'
  AND chromosome = 3
  AND start_position >= 10000000
  AND end_position <= 15000000
ORDER BY start_position;

Example 3: Count QTLs by Species

SELECT s.species_name, COUNT(*) as qtl_count
FROM qtls q
JOIN species s ON q.species_id = s.species_id
GROUP BY s.species_name
ORDER BY qtl_count DESC;
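
To experiment with these queries without a MySQL server, Examples 1 and 3 can be run unchanged against a minimal SQLite copy of the schema. This sketch creates only the columns the queries touch; the reference_list columns (reference_id, authors, publication_year) are inferred from the queries and indexes above, since that table's full definition is not listed in section 4. The helper name open_demo_db is hypothetical.

```python
import sqlite3

# Minimal subset of the section 4.1 schema (SQLite types stand in for MySQL's).
SCHEMA = """
CREATE TABLE species (species_id INTEGER PRIMARY KEY, species_name TEXT NOT NULL);
CREATE TABLE traits (trait_id INTEGER PRIMARY KEY, trait_name TEXT NOT NULL);
CREATE TABLE sub_traits (sub_trait_id INTEGER PRIMARY KEY,
                         trait_id INTEGER REFERENCES traits(trait_id),
                         sub_trait_name TEXT NOT NULL);
CREATE TABLE reference_list (reference_id INTEGER PRIMARY KEY,
                             authors TEXT, publication_year INTEGER);
CREATE TABLE qtls (qtl_id INTEGER PRIMARY KEY,
                   species_id INTEGER REFERENCES species(species_id),
                   trait_id INTEGER REFERENCES traits(trait_id),
                   sub_trait_id INTEGER REFERENCES sub_traits(sub_trait_id),
                   reference_id INTEGER REFERENCES reference_list(reference_id),
                   qtl_name TEXT, linkage_group TEXT, position_interval TEXT,
                   lod TEXT, pve_r2 TEXT);
"""

# Examples 1 and 3 from this section, copied verbatim.
EXAMPLE_1 = """
SELECT q.qtl_name, q.linkage_group, q.position_interval,
       q.lod, q.pve_r2, r.authors, r.publication_year
FROM qtls q
JOIN species s ON q.species_id = s.species_id
JOIN sub_traits st ON q.sub_trait_id = st.sub_trait_id
JOIN reference_list r ON q.reference_id = r.reference_id
WHERE s.species_name = 'Cucumis sativus'
  AND st.sub_trait_name LIKE '%fruit length%';
"""
EXAMPLE_3 = """
SELECT s.species_name, COUNT(*) as qtl_count
FROM qtls q
JOIN species s ON q.species_id = s.species_id
GROUP BY s.species_name
ORDER BY qtl_count DESC;
"""


def open_demo_db() -> sqlite3.Connection:
    """Create an empty in-memory database with the schema subset above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn
```

After inserting a few placeholder rows, conn.execute(EXAMPLE_3).fetchall() returns (species_name, qtl_count) pairs, most QTLs first.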

6. Available Cucurbit Species

CucurQTL currently includes data for 10 economically important cucurbit species:

Common Name | Scientific Name | Genome Version
🥒 Cucumber | Cucumis sativus | 9930_V3
🍉 Watermelon | Citrullus lanatus | 97103_V2.5
🍈 Melon | Cucumis melo | AY_V1
🎃 Pumpkin | Cucurbita moschata | Rifu
🎃 Winter Squash | Cucurbita maxima | Rimu
🥒 Zucchini | Cucurbita pepo | mu-cu-16_V2
🫛 Bottle Gourd | Lagenaria siceraria | Hangzhou Gourd_V1
🥬 Bitter Gourd | Momordica charantia | OHB3-1_V2
🥒 Ridged Gourd | Luffa acutangula | AG-4
🧽 Sponge Gourd | Luffa aegyptiaca | P93075

7. Glossary of Terms

Term | Definition
QTL | Quantitative Trait Locus: a genomic region associated with variation in a quantitative trait
LOD Score | Logarithm of Odds: the base-10 logarithm of the ratio between the likelihood of the data with a QTL (or linkage) and without it; LOD ≥ 3 (1000:1 odds) is a common significance threshold, although many studies set thresholds by permutation testing
PVE / R² | Phenotypic Variance Explained: percentage of trait variation explained by a QTL
Linkage Group | A group of genes or markers that tend to be inherited together; in a complete map, each linkage group typically corresponds to a chromosome
Marker | A DNA sequence with known genomic location used to track inheritance patterns
CIM | Composite Interval Mapping: a statistical method for QTL detection
RIL | Recombinant Inbred Line: an inbred line derived by repeated selfing of an F2 plant; a RIL population is a set of such lines
F2 | Second filial generation: offspring from self-fertilization of F1 hybrids
MAS | Marker-Assisted Selection: using molecular markers to select for desired traits
GFF | General Feature Format: standard file format for genomic annotations
GO | Gene Ontology: standardized vocabulary describing gene functions
KEGG | Kyoto Encyclopedia of Genes and Genomes: pathway and function database
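
For the common single-QTL regression model with normally distributed residuals, LOD and PVE are linked through the residual sums of squares without and with the QTL (RSS0, RSS1) and the number of individuals n: LOD = (n/2) * log10(RSS0/RSS1), and PVE = 1 - RSS1/RSS0 = 1 - 10^(-2*LOD/n). The sketch below encodes these textbook relations. Values stored in CucurQTL come from the source publications, which may use other models (for example, CIM with cofactors), so they need not satisfy this identity exactly.

```python
import math


def lod_from_likelihoods(l_qtl: float, l_null: float) -> float:
    """LOD as the log10 likelihood ratio of the QTL model to the null model."""
    return math.log10(l_qtl / l_null)


def lod_from_rss(rss_null: float, rss_qtl: float, n: int) -> float:
    """LOD for a normal-error regression, from residual sums of squares."""
    return (n / 2) * math.log10(rss_null / rss_qtl)


def pve_from_lod(lod: float, n: int) -> float:
    """Fraction of phenotypic variance explained implied by a LOD score."""
    return 1 - 10 ** (-2 * lod / n)
```

For example, with 200 individuals a LOD of 3 corresponds to a PVE of about 6.7% (pve_from_lod(3.0, 200) is approximately 0.067), which shows why LOD and PVE should be read together with population size.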

8. Future Development

Planned enhancements for upcoming releases, along with recently completed ones:

🔗 External Database Links

Integration with NCBI, UniProt, and expression databases

📊 Interactive Visualizations

QTL maps, comparative views, and chromosome plots

✅ Data Submission (Implemented)

Contributor system for researchers to submit QTL data with admin approval workflow

🔌 REST API

Programmatic access for bioinformatics pipelines

Recent Updates

  • Contributor Portal: Full registration and data submission system
  • Admin Dashboard: Comprehensive management interface for approvals
  • CSV Upload: Flexible header mapping with auto-extraction of authors/year
  • Email Notifications: Automated approval notifications via EmailJS
  • Direct Admin Upload: Admins can upload QTL data directly without staging

9. Contact Information

For questions, feedback, or data submission inquiries:

📧 Contact Us

Institution: IASRI - Indian Agricultural Statistics Research Institute (ICAR)
Address: Library Avenue, Pusa, New Delhi-110012, India
Email: admin@cucurqtl.org