CucurQTL Documentation
Welcome to the CucurQTL documentation. CucurQTL is a specialized database of Quantitative Trait Loci (QTL) data for cucurbit species. This documentation walks through the database's features and how to use them.
Quick Start: Navigate to QTL Search to start exploring QTL data, or visit Genomes to download reference sequences.
1.1 What are QTLs?
A Quantitative Trait Locus (QTL) is a region of DNA associated with a particular phenotypic trait that varies in degree. Unlike simple Mendelian traits controlled by single genes, quantitative traits (like fruit size, yield, or disease resistance) are influenced by multiple genes and environmental factors.
QTL analysis helps researchers:
- Identify genomic regions controlling important agricultural traits
- Understand the genetic architecture of complex traits
- Develop markers for marker-assisted selection (MAS)
- Accelerate crop breeding programs
1.2 Database Purpose
CucurQTL serves as a comprehensive repository for QTL data across multiple cucurbit species, including cucumber, watermelon, melon, pumpkin, and other economically important gourds. The database aims to:
📊 Centralize Data
Aggregate QTL information from diverse genetic studies into a single, searchable resource
🔬 Standardize Format
Provide consistent formatting for QTL data across species and studies
🧬 Enable Research
Facilitate comparative genomics and support breeding applications
📥 Share Resources
Provide downloadable genome sequences and annotation files
1.3 Key Features
- QTL Search: Filter and search QTLs by species, trait, sub-trait, and parameters
- JBrowse2 Integration: Interactive genome browser for visualizing genomic features
- Functional Annotation: Search gene annotations within specific genomic regions
- Genome Downloads: Access reference genomes, GFF annotations, and protein sequences
- Data Export: Download search results in CSV and Excel formats
- Contributor System: Submit QTL data for review and approval by administrators
- Admin Dashboard: Comprehensive admin panel for data management and user approvals
2. Application Features
2.1 QTL Database Search
The QTL search interface provides powerful filtering capabilities to find relevant QTL data:
Search Filters
- Species: Select from 10 cucurbit species (Cucumber, Watermelon, Melon, etc.)
- Trait Category: Major trait categories (Fruit Quality, Disease Resistance, Vegetative, etc.)
- Sub-trait: Specific traits within categories (Fruit Length, Powdery Mildew Resistance)
- Parameter: Measured parameters (Brix content, Length in cm, etc.)
Results Display
Search results include:
- QTL name and associated linkage group
- Position interval on the chromosome
- LOD score and phenotypic variance explained (PVE/R²)
- Associated markers and mapping method
- Reference publication with DOI links
Tip: Use the "Download CSV" or "Download Excel" button to export your search results for further analysis.
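An exported CSV can be post-processed with a few lines of Python. The sketch below assumes the export contains columns named `Qtl Name` and `LOD` (borrowed from the upload template; the actual export headers may differ) and keeps only rows at or above a LOD threshold. The sample rows are invented for illustration.

```python
import csv
import io

def filter_by_lod(csv_text, min_lod=3.0, lod_column="LOD"):
    """Return rows whose LOD value parses as a number >= min_lod."""
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            lod = float(row[lod_column])
        except (KeyError, ValueError, TypeError):
            continue  # skip rows with a missing or non-numeric LOD
        if lod >= min_lod:
            kept.append(row)
    return kept

# Invented sample rows standing in for a downloaded export.
SAMPLE = """Qtl Name,LOD
qtlA,4.2
qtlB,2.1
qtlC,NA
"""

strong = filter_by_lod(SAMPLE)
print([row["Qtl Name"] for row in strong])
```

LOD is parsed defensively because the schema stores it as text, so non-numeric entries are possible.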
2.2 JBrowse2 Genome Browser
CucurQTL integrates JBrowse2, a modern genome browser that allows interactive visualization of genomic data:
Available Tracks
- Reference Sequence: View nucleotide sequences at any zoom level
- Gene Annotations: Browse gene models, exons, and UTRs
- GFF Tracks: Visualize genomic features from annotation files
Navigation Features
- Search by gene name or genomic coordinates
- Zoom in/out with mouse scroll or controls
- Pan by clicking and dragging
- Switch between species assemblies
2.3 Functional Annotation Search
The Functional Annotation tool allows you to search for genes and genomic features within specific regions:
Search Parameters
- Species: Select the species of interest
- Chromosome: Specify the chromosome number
- Start Position: Beginning of the genomic region (bp)
- End Position: End of the genomic region (bp)
Available Annotation Fields
| Category | Fields |
|---|---|
| Basic Info | Sequence ID, Feature type, Gene name, Symbol |
| Position | Chromosome, Start position, End position, Strand |
| Gene Ontology | GO IDs, GO names, GO Cellular Component, GO Molecular Function |
| Functional | Enzyme codes, KEGG pathways, InterPro IDs |
| BLAST Results | Hit description, E-value, Similarity, Bit score |
2.4 Reference Genome Resources
Download reference genomes and annotation files for all 10 cucurbit species:
Available Downloads
- Genome FASTA: Complete reference genome sequences (.fa.gz)
- GFF Annotations: Gene and feature annotations (.gff3.gz)
- Protein Sequences: Predicted protein sequences (.fa.gz)
Note: All genome files are compressed with gzip. Use tools like gunzip or 7-Zip to decompress.
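The compressed files can also be read directly, without decompressing them first. As a minimal sketch, the Python below streams a gzip-compressed FASTA and reports the length of each sequence. The in-memory sample and the file name `genome.fa.gz` in the comment are placeholders, not actual CucurQTL file names.

```python
import gzip
import io

def fasta_lengths(handle):
    """Map each FASTA record ID to its sequence length, from a text handle."""
    lengths = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]  # ID is the first token after '>'
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths

# Build a tiny gzip-compressed FASTA in memory (invented sequences).
raw = b">Chr1 demo\nACGTAC\nGT\n>Chr2\nTTAA\n"
compressed = gzip.compress(raw)

# For a real download, use gzip.open("genome.fa.gz", "rt") instead.
with gzip.open(io.BytesIO(compressed), "rt") as fh:
    lengths = fasta_lengths(fh)
print(lengths)
```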
2.5 Contributor System
CucurQTL features a comprehensive data submission system allowing researchers to contribute QTL data:
For Contributors
- Registration: Create an account with your institutional details and ORCID ID
- CSV Upload: Submit QTL data in standardized CSV format
- Upload Tracking: Monitor the status of your submissions
- Email Notifications: Receive updates when your data is approved
CSV Format Requirements
Upload files must include these columns:
Species, Trait, Sub-Trait, Parameter, Cross, Population type,
Method/Model, Qtl Name, Linkage Group (LG), Position/interval (cM/Mb),
Associated Marker, LOD, PVE/R2, Reference, Doi
Tip: Required fields are Species, Trait, and Qtl Name. Authors and publication year are auto-extracted from the Reference column.
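Before uploading, a file can be checked locally against these requirements. The Python below is a rough sketch, not the server's actual validation: it requires exact header matches (the real uploader may be more lenient), flags empty required fields, and guesses the publication year as the first four-digit number between 1900 and 2099 in the Reference text. The function names and the sample reference are invented for illustration.

```python
import csv
import io
import re

EXPECTED_COLUMNS = [
    "Species", "Trait", "Sub-Trait", "Parameter", "Cross", "Population type",
    "Method/Model", "Qtl Name", "Linkage Group (LG)",
    "Position/interval (cM/Mb)", "Associated Marker", "LOD", "PVE/R2",
    "Reference", "Doi",
]
REQUIRED_FIELDS = ["Species", "Trait", "Qtl Name"]

def validate_upload(csv_text):
    """Return a list of human-readable problems; an empty list means OK."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in EXPECTED_COLUMNS if c not in headers]
    for line_no, row in enumerate(reader, start=2):  # row 1 is the header
        for field in REQUIRED_FIELDS:
            if not (row.get(field) or "").strip():
                problems.append(f"row {line_no}: empty required field {field}")
    return problems

def extract_year(reference):
    """Guess a publication year: the first 4-digit number in 1900-2099."""
    match = re.search(r"\b(19|20)\d{2}\b", reference)
    return int(match.group(0)) if match else None

def make_csv(rows):
    """Serialize dict rows using the expected header order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=EXPECTED_COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

sample = make_csv([
    {"Species": "Cucumber", "Trait": "Fruit Quality", "Qtl Name": "fl3.1",
     "Reference": "Doe et al. (2019)"},  # invented reference
    {"Species": "Cucumber", "Trait": "Fruit Quality"},  # Qtl Name missing
])
print(validate_upload(sample))
print(extract_year("Doe et al. (2019)"))
```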
Admin Review Process
- New registrations require admin approval before access is granted
- Uploaded data is staged for review before being added to the main database
- Admins can approve, reject, or request revisions
- Approved data is automatically imported with proper foreign key relationships
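One common way to turn raw staged text into foreign keys is a "get or create" lookup: reuse the ID of an existing row, or insert a new lookup row and use its ID. The sketch below illustrates the idea with Python's built-in `sqlite3` and cut-down `species` and `qtls` tables. It is not CucurQTL's import code; the helper name `get_or_create` and the QTL name `fl6.1` are hypothetical.

```python
import sqlite3

def get_or_create(conn, table, id_col, name_col, name):
    """Return the ID for `name`, inserting a new lookup row if needed."""
    # Table and column names are interpolated, so they must come from
    # trusted code, never from user input. The value itself is bound with ?.
    row = conn.execute(
        f"SELECT {id_col} FROM {table} WHERE {name_col} = ?", (name,)
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(f"INSERT INTO {table} ({name_col}) VALUES (?)", (name,))
    return cur.lastrowid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE species (species_id INTEGER PRIMARY KEY, species_name TEXT UNIQUE);
CREATE TABLE qtls (qtl_id INTEGER PRIMARY KEY,
                   species_id INTEGER REFERENCES species(species_id),
                   qtl_name TEXT);
""")

# Two staged rows sharing one species: only one species row should be created.
staged = [("Cucumis sativus", "fl3.1"), ("Cucumis sativus", "fl6.1")]
for species_name, qtl_name in staged:
    sid = get_or_create(conn, "species", "species_id", "species_name", species_name)
    conn.execute(
        "INSERT INTO qtls (species_id, qtl_name) VALUES (?, ?)", (sid, qtl_name)
    )
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM species").fetchone()[0])
```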
3. User Guide
3.1 Searching QTLs
- Navigate to QTL from the main navigation menu
- Use the cascading dropdown filters:
  - First, select a Species (e.g., Cucumber)
  - Choose a Trait Category (e.g., Fruit Quality)
  - Select a Sub-trait (e.g., Fruit Length)
  - Optionally filter by Parameter
- Click "Search" to retrieve matching QTLs
- Browse results in the interactive data table
- Click "Download CSV" or "Download Excel" to export results
3.2 Using JBrowse2 Genome Browser
- Navigate to Tools → JBrowse2 from the menu
- Select a species/assembly from the dropdown
- Enter a genomic location in the search box (e.g., Chr1:1000000-2000000)
- Use the track selector to enable/disable annotation tracks
- Click on features to view detailed information
- Use mouse scroll to zoom, click-drag to pan
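When building location strings in a script, it helps to validate them first. The helper below is a small sketch that parses strings of the form `Chr1:1000000-2000000` into a reference name, start, and end. The function name `parse_locus` is hypothetical, and the tolerance for thousands separators and a `..` range separator is a choice made by this helper, not a statement about what the browser accepts.

```python
import re

LOCUS_RE = re.compile(
    r"^(?P<ref>[^:\s]+):(?P<start>[\d,]+)(?:-|\.\.)(?P<end>[\d,]+)$"
)

def parse_locus(text):
    """Parse 'Chr1:1000000-2000000' into ('Chr1', 1000000, 2000000)."""
    match = LOCUS_RE.match(text.strip())
    if not match:
        raise ValueError(f"not a valid location string: {text!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if start > end:
        raise ValueError(f"start is greater than end in {text!r}")
    return match["ref"], start, end

print(parse_locus("Chr1:1000000-2000000"))
```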
3.3 Searching Functional Annotations
- Navigate to Tools → Functional Annotation
- Select a Species from the dropdown
- Enter genomic coordinates:
  - Chromosome: Enter chromosome number (e.g., 3)
  - Start Position: Start of region in base pairs
  - End Position: End of region in base pairs
- Click "Add More" to search multiple regions simultaneously
- Select which annotation columns to display using checkboxes
- Click "Search" to retrieve annotations
- Export results using the download button
Tip: For better performance, limit your search region to 1-2 Mb at a time.
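A larger region can be broken into windows of at most 2 Mb and searched one window at a time. The generator below is a minimal sketch of that splitting, using inclusive coordinates; the name `split_region` is hypothetical. Features that straddle a window boundary may need separate handling, depending on how the search matches region edges.

```python
def split_region(start, end, max_size=2_000_000):
    """Yield inclusive (start, end) windows of at most max_size bp."""
    if start > end or max_size < 1:
        raise ValueError("need start <= end and max_size >= 1")
    pos = start
    while pos <= end:
        window_end = min(pos + max_size - 1, end)
        yield pos, window_end
        pos = window_end + 1

for window in split_region(10_000_000, 15_000_000):
    print(window)
```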
3.4 Downloading Reference Genomes
- Navigate to Tools → Genomes
- Browse the species cards to find your organism of interest
- Each species card provides three download options:
  - Download Genome: Reference FASTA sequence
  - Download GFF: Gene annotations
  - Download Protein: Predicted protein sequences
- Click the button to start download (files are gzip compressed)
4. Database Schema
4.1 Core QTL Tables
The database uses a normalized relational structure to organize QTL data efficiently:
Entity Relationship Overview
View the interactive schema diagram for a visual representation of all database tables and their relationships:
📊 Open Interactive Schema Diagram
species
Stores cucurbit species information.
| Field | Type | Description |
|---|---|---|
| species_id | INT (PK) | Primary key |
| species_name | VARCHAR(100) | Scientific name (e.g., "Cucumis sativus") |
traits
Major trait categories.
| Field | Type | Description |
|---|---|---|
| trait_id | INT (PK) | Primary key |
| trait_name | VARCHAR(100) | Category name (e.g., "Fruit Quality") |
sub_traits
Specific traits within categories.
| Field | Type | Description |
|---|---|---|
| sub_trait_id | INT (PK) | Primary key |
| trait_id | INT (FK) | Foreign key to traits |
| sub_trait_name | VARCHAR(100) | Specific trait (e.g., "Flowering time") |
parameters
Measurable parameters for each sub-trait.
| Field | Type | Description |
|---|---|---|
| parameter_id | INT (PK) | Primary key |
| sub_trait_id | INT (FK) | Foreign key to sub_traits |
| parameter_name | VARCHAR(255) | Parameter (e.g., "Days to anthesis") |
qtls
Main QTL data table containing all QTL records.
| Field | Type | Description |
|---|---|---|
| qtl_id | INT (PK) | Primary key |
| species_id | INT (FK) | Foreign key to species |
| trait_id | INT (FK) | Foreign key to traits |
| sub_trait_id | INT (FK) | Foreign key to sub_traits |
| parameter_id | INT (FK) | Foreign key to parameters |
| cross_id | INT (FK) | Foreign key to crosses |
| population_type_id | INT (FK) | Foreign key to population_types |
| reference_id | INT (FK) | Foreign key to reference_list |
| method | VARCHAR(50) | Mapping method (e.g., "CIM") |
| qtl_name | VARCHAR(100) | QTL identifier (e.g., "fl3.1") |
| linkage_group | VARCHAR(20) | Chromosome/linkage group |
| position_interval | VARCHAR(50) | Map position (cM or bp) |
| associated_marker | TEXT | Flanking or peak markers |
| lod | VARCHAR(50) | LOD score |
| pve_r2 | VARCHAR(50) | Phenotypic variance explained |
4.2 Contributor System Tables
Tables supporting the data submission and approval workflow:
contributors
Registered data contributors/curators.
| Field | Type | Description |
|---|---|---|
| contributor_id | INT (PK) | Primary key |
| email | VARCHAR(100) | Unique email address |
| password_hash | VARCHAR(255) | Hashed password |
| full_name | VARCHAR(100) | Contributor's full name |
| institution | VARCHAR(255) | Affiliated institution |
| orcid_id | VARCHAR(50) | ORCID identifier (optional) |
| status | ENUM | pending, approved, rejected, suspended |
| approved_by | INT (FK) | Admin who approved the account |
data_uploads
Tracks all data upload submissions.
| Field | Type | Description |
|---|---|---|
| upload_id | INT (PK) | Primary key |
| contributor_id | INT (FK) | Who uploaded the data |
| file_name | VARCHAR(255) | Stored file name |
| original_file_name | VARCHAR(255) | Original uploaded filename |
| upload_type | ENUM | new_data or update_data |
| row_count | INT | Number of data rows |
| status | ENUM | pending, approved, rejected, processing |
| reviewed_by | INT (FK) | Admin who reviewed |
staged_qtl_data
Temporary storage for uploaded QTL data awaiting approval.
| Field | Type | Description |
|---|---|---|
| staged_id | INT (PK) | Primary key |
| upload_id | INT (FK) | Related upload record |
| row_number | INT | Row number in original CSV |
| species_name | VARCHAR(100) | Species (raw text) |
| trait_name | VARCHAR(100) | Trait category (raw text) |
| qtl_name | VARCHAR(100) | QTL identifier |
| authors | TEXT | Extracted from Reference |
| publication_year | INT | Extracted from Reference |
| validation_status | ENUM | valid, warning, error |
admin_users
Administrative users with approval permissions.
| Field | Type | Description |
|---|---|---|
| admin_id | INT (PK) | Primary key |
| username | VARCHAR(50) | Login username |
| password_hash | VARCHAR(255) | Hashed password |
| email | VARCHAR(100) | Admin email |
| full_name | VARCHAR(100) | Display name |
| is_active | TINYINT | Account status |
4.3 Genomic Annotation Table
genomic_sequences
Stores functional annotation data for genes and genomic features.
| Field | Type | Description |
|---|---|---|
| id | INT (PK) | Primary key |
| genomic_species | VARCHAR(255) | Species name |
| sequence | VARCHAR(255) | Sequence identifier |
| chromosome | INT | Chromosome number |
| start_position | BIGINT | Start position (bp) |
| end_position | BIGINT | End position (bp) |
| strand | VARCHAR(10) | Strand (+/-) |
| feature | VARCHAR(255) | Feature type (gene, mRNA, etc.) |
| name | VARCHAR(255) | Gene/feature name |
| symbol | VARCHAR(255) | Gene symbol |
| go_ids | TEXT | Gene Ontology IDs |
| go_names | TEXT | Gene Ontology terms |
| hit_desc | TEXT | BLAST hit description |
| e_value_numeric | VARCHAR(50) | E-value |
| similarity | VARCHAR(50) | Sequence similarity % |
| pathways_ids | TEXT | KEGG pathway IDs |
5. Technical Implementation
5.1 Technologies Used
| Component | Technology | Purpose |
|---|---|---|
| Backend | PHP 8.x | Server-side logic and API endpoints |
| Database | MySQL 8.x | Data storage and queries |
| Frontend | HTML5, CSS3, JavaScript | User interface |
| CSS Framework | Bootstrap 5, Tailwind CSS | Responsive styling |
| Genome Browser | JBrowse2 | Interactive genome visualization |
| Data Export | PhpSpreadsheet | Excel file generation |
| Data Import | Python (pandas, SQLAlchemy) | CSV parsing and database loading |
Database Indexes
Optimized indexes for common query patterns:
-- QTL table indexes
CREATE INDEX idx_species_trait ON qtls(species_id, trait_id);
CREATE INDEX idx_qtl_name ON qtls(qtl_name);
CREATE INDEX idx_linkage_group ON qtls(linkage_group);
-- Reference table indexes
CREATE INDEX idx_reference_year ON reference_list(publication_year);
CREATE INDEX idx_reference_authors ON reference_list(authors(255));
-- Contributor system indexes
CREATE INDEX idx_contributor_status ON contributors(status);
CREATE INDEX idx_contributor_email ON contributors(email);
CREATE INDEX idx_upload_status ON data_uploads(status);
CREATE INDEX idx_upload_contributor ON data_uploads(contributor_id);
CREATE INDEX idx_staged_upload ON staged_qtl_data(upload_id);
-- Genomic sequences indexes
CREATE INDEX idx_genomic_species ON genomic_sequences(genomic_species);
CREATE INDEX idx_genomic_chromosome ON genomic_sequences(chromosome);
CREATE INDEX idx_genomic_position ON genomic_sequences(start_position, end_position);
5.2 Query Examples
Example 1: Find QTLs for Cucumber Fruit Length
SELECT q.qtl_name, q.linkage_group, q.position_interval,
       q.lod, q.pve_r2, r.authors, r.publication_year
FROM qtls q
JOIN species s ON q.species_id = s.species_id
JOIN sub_traits st ON q.sub_trait_id = st.sub_trait_id
JOIN reference_list r ON q.reference_id = r.reference_id
WHERE s.species_name = 'Cucumis sativus'
  AND st.sub_trait_name LIKE '%fruit length%';
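When a query like this is run from a script, the species and trait values are best passed as bound parameters rather than pasted into the SQL string. The sketch below runs the same query from Python against an in-memory SQLite database holding a cut-down version of the schema and invented rows. CucurQTL itself uses MySQL, where the placeholder syntax depends on the driver, so treat this only as an illustration of the pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE species (species_id INTEGER PRIMARY KEY, species_name TEXT);
CREATE TABLE sub_traits (sub_trait_id INTEGER PRIMARY KEY, sub_trait_name TEXT);
CREATE TABLE reference_list (reference_id INTEGER PRIMARY KEY,
                             authors TEXT, publication_year INTEGER);
CREATE TABLE qtls (qtl_id INTEGER PRIMARY KEY, species_id INTEGER,
                   sub_trait_id INTEGER, reference_id INTEGER,
                   qtl_name TEXT, linkage_group TEXT,
                   position_interval TEXT, lod TEXT, pve_r2 TEXT);
-- Invented demo rows, not real CucurQTL records.
INSERT INTO species VALUES (1, 'Cucumis sativus'), (2, 'Cucumis melo');
INSERT INTO sub_traits VALUES (1, 'Fruit Length');
INSERT INTO reference_list VALUES (1, 'Doe J', 2020);
INSERT INTO qtls VALUES
    (1, 1, 1, 1, 'fl3.1', '3', '10-12', '5.1', '12.0'),
    (2, 2, 1, 1, 'demo2', '1', '3-4', '3.3', '8.0');
""")

QUERY = """
SELECT q.qtl_name, q.linkage_group, q.position_interval,
       q.lod, q.pve_r2, r.authors, r.publication_year
FROM qtls q
JOIN species s ON q.species_id = s.species_id
JOIN sub_traits st ON q.sub_trait_id = st.sub_trait_id
JOIN reference_list r ON q.reference_id = r.reference_id
WHERE s.species_name = ?
  AND st.sub_trait_name LIKE ?
"""

def find_qtls(conn, species_name, sub_trait_fragment):
    """Run the query with bound parameters instead of string formatting."""
    pattern = f"%{sub_trait_fragment}%"
    return conn.execute(QUERY, (species_name, pattern)).fetchall()

rows = find_qtls(conn, "Cucumis sativus", "fruit length")
print(rows)
```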
Example 2: Get Annotations in a Genomic Region
SELECT sequence, feature, name, symbol,
       start_position, end_position, strand,
       go_ids, go_names, hit_desc
FROM genomic_sequences
WHERE genomic_species = 'Cucumis sativus'
  AND chromosome = 3
  AND start_position >= 10000000
  AND end_position <= 15000000
ORDER BY start_position;
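Note that this WHERE clause returns only features lying entirely inside the region. A feature that crosses either boundary is excluded. To include partially overlapping features, the condition would instead be `start_position <= 15000000 AND end_position >= 10000000`. The Python sketch below contrasts the two tests on invented coordinates.

```python
def is_contained(feat_start, feat_end, region_start, region_end):
    """True if the feature lies entirely inside the region."""
    return feat_start >= region_start and feat_end <= region_end

def overlaps(feat_start, feat_end, region_start, region_end):
    """True if the feature shares at least one base with the region."""
    return feat_start <= region_end and feat_end >= region_start

REGION = (10_000_000, 15_000_000)

# Invented feature coordinates for illustration.
features = {
    "inside": (11_000_000, 11_005_000),
    "spans_left_edge": (9_998_000, 10_002_000),
    "outside": (16_000_000, 16_001_000),
}

for name, (start, end) in features.items():
    print(name, is_contained(start, end, *REGION), overlaps(start, end, *REGION))
```

The feature spanning the left edge fails the containment test but passes the overlap test, which is the case to watch for when a gene of interest sits near a QTL boundary.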
Example 3: Count QTLs by Species
SELECT s.species_name, COUNT(*) as qtl_count
FROM qtls q
JOIN species s ON q.species_id = s.species_id
GROUP BY s.species_name
ORDER BY qtl_count DESC;
6. Available Cucurbit Species
CucurQTL currently includes data for 10 economically important cucurbit species:
| Common Name | Scientific Name | Genome Version |
|---|---|---|
| 🥒 Cucumber | Cucumis sativus | 9930_V3 |
| 🍉 Watermelon | Citrullus lanatus | 97103_V2.5 |
| 🍈 Melon | Cucumis melo | AY_V1 |
| 🎃 Pumpkin | Cucurbita moschata | Rifu |
| 🎃 Winter Squash | Cucurbita maxima | Rimu |
| 🥒 Zucchini | Cucurbita pepo | mu-cu-16_V2 |
| 🫛 Bottle Gourd | Lagenaria siceraria | Hangzhou Gourd_V1 |
| 🥬 Bitter Gourd | Momordica charantia | OHB3-1_V2 |
| 🥒 Ridge Gourd | Luffa acutangula | AG-4 |
| 🧽 Sponge Gourd | Luffa aegyptiaca | P93075 |
7. Glossary of Terms
| Term | Definition |
|---|---|
| QTL | Quantitative Trait Locus - a genomic region associated with variation in a quantitative trait |
| LOD Score | Logarithm of Odds - statistical measure of the evidence for a QTL at a given position; LOD ≥ 3 is a commonly used significance threshold |
| PVE / R² | Phenotypic Variance Explained - percentage of trait variation explained by a QTL |
| Linkage Group | A group of loci that tend to be inherited together; in a complete genetic map, each linkage group corresponds to one chromosome |
| Marker | A DNA sequence with known genomic location used to track inheritance patterns |
| CIM | Composite Interval Mapping - a statistical method for QTL detection |
| RIL | Recombinant Inbred Line - a nearly homozygous line derived from an F2 individual by repeated generations of selfing; a RIL population is a set of such lines |
| F2 | Second filial generation - offspring from F1 hybrid self-fertilization |
| MAS | Marker-Assisted Selection - using molecular markers to select for desired traits |
| GFF | General Feature Format - standard file format for genomic annotations |
| GO | Gene Ontology - standardized vocabulary describing gene functions |
| KEGG | Kyoto Encyclopedia of Genes and Genomes - pathway and function database |
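For reference, the LOD score in the glossary is usually defined as the base-10 logarithm of a likelihood ratio. The notation below is a standard textbook form, not taken from CucurQTL itself:

```latex
\mathrm{LOD} = \log_{10}\!\left(
  \frac{L(\text{data} \mid \text{QTL at the tested position})}
       {L(\text{data} \mid \text{no QTL})}
\right)
```

Under this definition, a LOD of 3 means the observed data are $10^{3} = 1000$ times more likely under the model with a QTL at that position than under the model without one.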
8. Future Development
Planned enhancements for upcoming releases:
🔗 External Database Links
Integration with NCBI, UniProt, and expression databases
📊 Interactive Visualizations
QTL maps, comparative views, and chromosome plots
✅ Data Submission (Implemented)
Contributor system for researchers to submit QTL data with admin approval workflow
🔌 REST API
Programmatic access for bioinformatics pipelines
Recent Updates
- Contributor Portal: Full registration and data submission system
- Admin Dashboard: Comprehensive management interface for approvals
- CSV Upload: Flexible header mapping with auto-extraction of authors/year
- Email Notifications: Automated approval notifications via EmailJS
- Direct Admin Upload: Admins can upload QTL data directly without staging
9. Contact Information
For questions, feedback, or data submission inquiries:
📧 Contact Us
Institution: IASRI - Indian Agricultural Statistics Research Institute (ICAR)
Address: Library Avenue, Pusa, New Delhi-110012, India
Email: admin@cucurqtl.org