NA is coming up as a gene name #51

Open
cnobles opened this issue Jan 27, 2016 · 10 comments

Comments

@cnobles

cnobles commented Jan 27, 2016

No description provided.

@anatolydryga
Collaborator

@cnobles For which dataset (patient), and where in the report?

@cnobles
Author

cnobles commented Jan 27, 2016

If you want to test, check patient "p04409-10". NA appears in the barcharts, sharing heatmap, and wordclouds.

@anatolydryga
Collaborator

Indeed it is. I will work on it.

@anatolydryga
Collaborator

We have reads mapped to chrM, very often to position 12460, but the refSeq gene tables do not include chrM. So when getNearestFeature is called for sites on chrM, NA is produced.

Solutions:
filter chrM
???
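
A minimal sketch of the "filter chrM" option, written in Python for illustration only (the pipeline itself is in R). It sets aside sites whose chromosome is missing from the gene table before any nearest-gene lookup. The `sites` records, the `annotated_chroms` set, and the `split_annotatable` helper are all hypothetical names, and the chr1 site is made-up example data.

```python
def split_annotatable(sites, annotated_chroms):
    """Separate sites that can be annotated from those on unannotated chromosomes."""
    keep, dropped = [], []
    for site in sites:
        (keep if site["chr"] in annotated_chroms else dropped).append(site)
    return keep, dropped


# Hypothetical example data; only the chrM position 12460 comes from the thread.
sites = [
    {"chr": "chr1", "position": 1000},
    {"chr": "chrM", "position": 12460},
]
annotated_chroms = {"chr1", "chr2"}  # assumed: chromosomes present in the gene table

keep, dropped = split_annotatable(sites, annotated_chroms)
```

Returning the dropped sites rather than discarding them silently keeps them available for inspection, which matters while their origin is still unknown.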

@cnobles
Author

cnobles commented Feb 2, 2016

SELECT DISTINCT samples.miseqid FROM samples INNER JOIN sites ON samples.sampleID = sites.sampleID WHERE chr = "chrM"

This returns all the runs containing sites that map to chrM.

@cnobles
Author

cnobles commented Feb 3, 2016

SELECT samples.miseqid, COUNT(*) AS chrM_Hits FROM samples INNER JOIN sites ON samples.sampleID = sites.sampleID WHERE sites.chr = "chrM" GROUP BY samples.miseqid

This gives the number of chrM hits per run. We should focus on grabbing the reads from the February runs.
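
For anyone who wants to try the per-run count without access to the real database, here is a rough sketch against a toy in-memory SQLite database, in Python. The table layout is an assumption based only on the column names in the query above (`samples.sampleID`, `samples.miseqid`, `sites.sampleID`, `sites.chr`). The `siteID` column, the run names, and the rows are invented for the example.

```python
import sqlite3

# Toy schema: an assumption inferred from the column names used in the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE samples (sampleID INTEGER PRIMARY KEY, miseqid TEXT);
    CREATE TABLE sites (siteID INTEGER PRIMARY KEY, sampleID INTEGER, chr TEXT);
    INSERT INTO samples VALUES (1, 'runA'), (2, 'runA'), (3, 'runB');
    INSERT INTO sites (sampleID, chr) VALUES
        (1, 'chrM'), (2, 'chrM'), (2, 'chr1'), (3, 'chrM');
""")

# Count chrM sites per run; the chromosome is passed as a bound parameter.
query = """
    SELECT samples.miseqid, COUNT(*) AS chrM_Hits
    FROM samples
    INNER JOIN sites ON samples.sampleID = sites.sampleID
    WHERE sites.chr = ?
    GROUP BY samples.miseqid
    ORDER BY samples.miseqid
"""
hits_per_run = conn.execute(query, ("chrM",)).fetchall()
```

With `GROUP BY` on the run identifier, each run already appears once, so no `DISTINCT` is needed.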

@chasberry

Is it biologically possible to have integration events in the mitochondria?

If not, where do these "sites" come from?

If so, NA or Inf seem like plausible choices for constructing a dataset with filtering done downstream.
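
A small sketch of that idea, in Python for illustration only: the annotation step returns a missing gene name and an infinite distance for chromosomes with no genes, and the report filters those rows later. The `nearest_gene` helper, the `gene_table` layout, and the `GENE_A` entry are hypothetical. This is not how getNearestFeature is implemented.

```python
import math


def nearest_gene(site, gene_table):
    """Return (gene name, distance), or (None, inf) if the chromosome has no genes."""
    genes = gene_table.get(site["chr"], [])
    if not genes:
        return None, math.inf
    gene = min(genes, key=lambda g: abs(g["start"] - site["position"]))
    return gene["name"], abs(gene["start"] - site["position"])


# Hypothetical gene table and sites; chrM is deliberately absent from the table.
gene_table = {"chr1": [{"name": "GENE_A", "start": 900}]}
sites = [{"chr": "chr1", "position": 1000}, {"chr": "chrM", "position": 12460}]

# The dataset keeps every site, including the unannotated ones.
annotated = [(s, *nearest_gene(s, gene_table)) for s in sites]

# Downstream (report) filtering: drop rows with no gene name.
reportable = [row for row in annotated if row[1] is not None]
```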

@cnobles
Author

cnobles commented Feb 3, 2016

There is currently little support, if any, for mitochondrial integration. I'm looking into where these reads are coming from and what their origins may be. I'll be able to update everyone soon on whether they are legitimate or are contaminants / artifacts.

@cnobles
Author

cnobles commented Feb 4, 2016

Update: The data suggest that a large part of the problem is that the LTRbit filtering is too permissive. We looked back at the intSiteLogic.R code and found the line that should be changed. We are currently making the parameters more stringent and rerunning some of the runs to check performance.

@cnobles
Author

cnobles commented Feb 4, 2016

Another part of the problem is reads that map to chrM but also map (at slightly lower quality) to chr#. This can occur because mitochondrial DNA has been slowly assimilated into the nuclear genome over time, so there are regions of the human nuclear genome that closely resemble regions of the mitochondrial genome.
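
One way such reads could be flagged, sketched in Python for illustration only: treat a read as ambiguous when its best hit is on chrM but a nuclear hit scores nearly as well. The alignment records, the `score` field, the `max_score_gap` threshold, and the `is_ambiguous_chrM` helper are all hypothetical. None of this is taken from intSiteLogic.R.

```python
def is_ambiguous_chrM(alignments, max_score_gap=5):
    """True if the best hit is on chrM and a nuclear hit scores almost as well."""
    if not alignments:
        return False
    best = max(alignments, key=lambda a: a["score"])
    if best["chr"] != "chrM":
        return False
    # Any non-chrM hit within max_score_gap of the best score makes the read ambiguous.
    return any(
        a["chr"] != "chrM" and best["score"] - a["score"] <= max_score_gap
        for a in alignments
    )


# Hypothetical alignment sets for two reads.
read_hits = [{"chr": "chrM", "score": 98}, {"chr": "chr5", "score": 95}]
unique_hits = [{"chr": "chrM", "score": 98}, {"chr": "chr5", "score": 60}]

flags = [is_ambiguous_chrM(read_hits), is_ambiguous_chrM(unique_hits)]
```

The score gap is the tunable part: a larger gap flags more reads as ambiguous, and a gap of zero flags only exact ties.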
